Transcription

WA2393 Applied Data Science and Big Data Analytics - Classroom Setup Guide

WA2393 Data Science for Solution Architects
Classroom Setup Guide
Web Age Solutions Inc.
Copyright Web Age Solutions Inc.

Table of Contents
Part 1 - Class Setup
Part 2 - Minimum Software Requirements for the Client Component
Part 3 - Software Provided
Part 4 - Instructions
Part 5 - Organize Windows Explorer Folder Views
Part 6 - Installing R Programming Language 3.3.1 on Windows
Part 7 - Verify Installation
Part 8 - Installing RStudio Desktop v0.99 on Windows
Part 9 - Verify Installation
Part 10 - Minimum Hardware Requirements for the Lab Server
Part 11 - Minimum Software Requirements
Part 12 - Software Provided
Part 13 - Preparation
Part 14 - Installing the VMware image
Part 15 - Running the VM
Part 16 - Setting Up the IP Address of the Lab Server VM
Part 17 - Summary

Part 1 - Class Setup
This class requires two components to be installed:
1. The Client
2. The Lab Server
The Client and the Lab Server must be installed on different machines; the Lab Server must be accessible by the Client.
Both machines must have access to the Internet.
The minimum software requirements for the Client and the Lab Server machines are different and are listed in different sections of this document.
Also, depending on the class software packaging option, you may have either one or two ZIP files.

Part 2 - Minimum Software Requirements for the Client Component
- Windows OS: Windows Vista / 7.
- Latest Google Chrome browser
- Internet

Part 3 - Software Provided
List of ZIP files required for this course and used in the next steps of this document:
- WA2393 REL 1 2.zip
- VM WA2341 CDH5-REL 2 2-Sep-2016.zip
Send an email to [email protected] to obtain a copy of the software for this course if you haven't received it yet.
All other software listed under Minimum Software Requirements is either commercially licensed software that you must provide or software that is freely available on the Internet.

Part 4 - Instructions
1. Make sure the account that you are using to install the software has administrative privileges and that the student using this machine will have the same rights.
2. Extract the WA2393 REL 1 2.zip file to C:\
3. Verify that the following folders were created:
   - C:\LabFiles\
   - C:\Software\
4. Verify that the following files were created:
   - C:\Software\RStudio\R\R-3.3.1-win.exe
   - C:\Software\RStudio\R\RStudio-0.99.903.exe
5. Download and install the latest Google Chrome browser.
6. Create a shortcut to the Windows Command Prompt on the desktop.
7. Double-click the Command Prompt shortcut to open the Command Prompt window.
8. In the Command Prompt window, click the black icon in the top left-hand corner and select Properties from the context menu.

The Properties dialog opens.
9. In the Properties dialog, check the Quick Edit Mode check box.
Note: This option allows a user to copy and paste text in the command prompt using mouse actions instead of an edit menu.
10. Click the Layout tab.
11. In the Layout tab, enter 100 for the Width property (in both Width text boxes on the Layout tab), 9999 for the Height of the Screen Buffer Size property, and 45 for the Height of the Window Size property.
12. Click OK to close the Properties dialog.

13. If an Apply Properties to Shortcut dialog appears, select Modify shortcut that started this window and click OK.

Part 5 - Organize Windows Explorer Folder Views
1. Start Windows Explorer.
The steps below may vary slightly depending on the Windows version (they are shown for Windows 7). The main purpose of these steps is to enable the system-wide display of file extensions and hidden files in Windows Explorer.
2. In the menu bar of Windows Explorer, click the Organize drop-down menu and select Folder and search options.
3. In the Folder Options dialog that opens, click the View tab.
4. In the View tab:
   - Check the Display the full path in the title bar check box

   - Select the Show hidden files, folders, and drives radio button
   - Uncheck the Hide extensions for known file types check box
5. Click the OK button.
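The folder and file checks in Part 4 (steps 3 and 4) can also be scripted. The following is a minimal sketch, assuming Python is available on the Client machine (it is not part of the software provided for this course). The helper name find_missing is hypothetical; the paths are the ones listed in Part 4.

```python
import os

# Folders and files that Part 4 expects after extracting the ZIP file to C:\
EXPECTED_FOLDERS = [r"C:\LabFiles", r"C:\Software"]
EXPECTED_FILES = [
    r"C:\Software\RStudio\R\R-3.3.1-win.exe",
    r"C:\Software\RStudio\R\RStudio-0.99.903.exe",
]


def find_missing(paths, exists):
    """Return the paths for which the exists() callback reports False."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    # Folders are checked with isdir, installers with isfile.
    missing = find_missing(EXPECTED_FOLDERS, os.path.isdir)
    missing += find_missing(EXPECTED_FILES, os.path.isfile)
    if missing:
        for path in missing:
            print("Missing:", path)
    else:
        print("All expected folders and files are present.")
```

Passing the existence check in as a callback keeps the helper easy to try out on a machine where the files are not installed.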

Part 6 - Installing R Programming Language 3.3.1 on Windows
1. (Skip this step if not applicable.) If you are not yet logged in on the Student computer, log in as the user who will be using this software in the class.
2. From the C:\Software\RStudio\R directory, run R-3.3.1-win.exe
3. If prompted with the Windows Security Warning, click Run.
4. If prompted with the Windows system User Account Control dialog, click Yes.
5. Accept English as the Setup Language and click OK.
6. In the Welcome screen that opens, click Next.
7. In the License dialog, click Next.
8. In the Destination Location dialog, enter c:\Software\R for the folder location and click Next.
9. In the Select Components dialog, select the 32-bit User Installation option from the drop-down box.

10. Accept the default preselected options for the Core Files and 32-bit Files checkboxes.
11. Click Next.
12. In the Startup options dialog, select Yes (customized startup).
13. Click Next.
14. In the Display Mode dialog, accept the default MDI option and click Next.

15. In the Help Style dialog, accept the default HTML help option and click Next.
16. If prompted, in the Internet Access dialog, select the Internet2 option.
17. Click Next.
18. In the Select Start Menu Folder dialog, accept the default R name and click Next.
19. In the Select Additional Tasks dialog, accept the defaults and click Next.

The installation process begins.
Wait for the installation process to complete.
When the installation is complete, you will be presented with the confirmation dialog.
20. Click Finish.

Part 7 - Verify Installation
1. Find the R shortcut created on the desktop and double-click it.
The R GUI console should open.
2. Type q() in the console and press Enter.
3. In the Question dialog that opens, click No.
Installation and verification steps are complete.

Part 8 - Installing RStudio Desktop v0.99 on Windows
Note: The prerequisite for installing this package is the presence of the R package version 2.11.1 (or higher) on the target system (as per the R-3.3.1-win-32bit.odt document).
1. On the Student computer, log in as the user who will be using this software in the class.
2. From the C:\Software\RStudio\R directory, run RStudio-0.99.903.exe
3. If prompted with the Windows system User Account Control dialog, click Yes.
4. On the RStudio Setup Welcome Screen, click Next.
5. In the Choose Install Location dialog, enter c:\Software\R for the destination folder and click Next.
6. In the Choose Start Menu Folder dialog, accept the defaults and click Install.
The installation process begins.
Wait for the installation process to complete.
When the installation is complete, you will be presented with the confirmation dialog.
7. Click Finish.

Part 9 - Verify Installation
1. Create a Desktop shortcut pointing to the C:\Software\R\bin\rstudio.exe file.

2. Click the RStudio Desktop shortcut.
The RStudio IDE opens.
3. In the Console window on the left-hand side, type q() and press Enter.
RStudio closes.
Installation and verification steps are complete.

Part 10 - Minimum Hardware Requirements for the Lab Server
The Lab Server is a 64-bit VM that requires a 64-bit host OS and a virtualization product that can support a 64-bit guest OS. This VM uses 6 GB of total RAM and 2 vCPU. The total system memory required varies depending on the size of the data sets used in labs and on the other processes that are running in the VM.
- 8 GB RAM
- 80 GB Hard Disk
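The arithmetic behind these numbers can be sketched as follows. This is an illustration only, assuming Python is available on the host. The function names are hypothetical, and treating the 80 GB figure as required free space (rather than total disk size) is an assumption made for the sketch.

```python
import shutil

# Figures taken from Part 10 of this guide.
MIN_HOST_RAM_GB = 8
MIN_DISK_GB = 80
VM_RAM_GB = 6


def gb_to_mb(gb):
    """Convert gigabytes to megabytes (1 GB = 1024 MB)."""
    return gb * 1024


def host_ram_headroom_gb(host_ram_gb, vm_ram_gb=VM_RAM_GB):
    """RAM left for the host OS after the VM takes its share."""
    return host_ram_gb - vm_ram_gb


def has_enough_disk(free_bytes, required_gb=MIN_DISK_GB):
    """Assumption: the 80 GB requirement is read as free space."""
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    print("VM memory in MB:", gb_to_mb(VM_RAM_GB))
    print("Host headroom in GB:", host_ram_headroom_gb(MIN_HOST_RAM_GB))
    # Checks the drive that holds the current directory.
    free = shutil.disk_usage(".").free
    print("Enough free disk:", has_enough_disk(free))
```

With the minimum 8 GB of host RAM, the 6 GB VM leaves 2 GB for the host OS and the virtualization product, which is why other memory-hungry processes should not be running on the host.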

Part 11 - Minimum Software Requirements
- Windows XP / Vista / 7 - 64 bit
- VMware player 6.x or higher
- Chrome

Part 12 - Software Provided
You will receive the following file (further referred to as the VM ZIP file) containing the VMware player-compatible virtual machine:
- VM WA2341 CDH5-REL 2 2-Sep-2016.zip

Part 13 - Preparation
1. Extract the VM WA2341 CDH5-REL 2 2-Sep-2016.zip file to C:\
Note: Every student in the class will need a dedicated Lab Server, so the class setup will require as many Lab Servers as there are students in the class. In other words, you will need to perform this setup once per student.
It is recommended to have each Lab Server VM installed on a separate physical machine, although they can be collocated as long as their Network Connectivity is set up with the Bridged option (see details further in the document).

Part 14 - Installing the VMware image
1. Open a file browser and navigate inside the unzipped VM ZIP folder. Locate the VMware player executable file vmplayer.exe.
Note: If you don't find the VMware player executable file in this folder, download the VMware player 6.x or higher from the VMware website using the following link:
http://www.vmware.com
2. Install the VMware player, accepting all the defaults during the installation.

3. Restart the computer.

1. From the Start menu, select All Programs > VMware > VMware Player.
2. If prompted to download a new version of VMware player, decline the update.
3. Press Ctrl-O.
The Open Virtual Machine dialog opens.
4. Locate and select the cloudera-quickstart-vm-5.4.0-0-vmware.vmx file located under the unzipped VM ZIP folder and click Open.
The cloudera-quickstart-vm-5.4.0-0-vmware menu option will appear on the list of available virtual machines.
5. Click Edit virtual machine settings at the bottom of the VMware Player.

The Virtual Machine Settings screen opens with the Hardware tab opened by default.
6. Change the Memory VM size attribute to 6 G (6144 M).
7. Change the Number of Processors VM attribute from 1 to 2.
8. The Network Adapter VM attribute can be configured for the Bridged or NAT connection options.
   - As a rule of thumb, use NAT for a VM installed locally on the physical student machine and Bridged on remote machines. If these suggestions do not work, use the options that best suit your environment.
   - The NAT option is preselected by default; for the Bridged option, see the Setting Up the IP Address of the Lab Server VM lab step at the end of the document.
9. Click CD/DVD (IDE) Device.
10. Under Device status, uncheck Connect at power on (or keep it clear if it is already unchecked).

11. If Floppy is present, click Floppy (if this option is not present, skip to the next numbered step).
   - Uncheck Connect at power on (or keep it clear if it is already unchecked)
12. If Sound Card is present, click Sound Card (if this option is not present, skip to the next numbered step).
   - Uncheck Connect at power on (or keep it clear if it is already unchecked)
13. If Printer is present, click Printer (if this option is not present, skip to the next numbered step).
   - Uncheck Connect at power on (or keep it clear if it is already unchecked)
14. Click OK at the bottom of the Virtual Machine Settings screen to close it.

Part 15 - Running the VM
1. Select the cloudera-quickstart-vm-5.4.0-0-vmware virtual machine (it should already be preselected) and click Play virtual machine.

2. Click "I moved it", if prompted.
3. If you are prompted to download and install the VMware Tools for Linux, accept the option.
Accept reasonable options if and when they appear.
VM bootstrapping may take some time. When it completes, you should be automatically logged in to the Lab Server VM as the cloudera user and presented with the Cloudera Desktop.
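The memory, processor, and network settings chosen in Part 14 are stored in the VM's .vmx file as plain text, so they can be double-checked after the VM has been shut down. The following is a rough sketch, assuming Python is available on the host. The key names memsize, numvcpus, and ethernet0.connectionType are what VMware products are understood to write, but treat them, the adapter name ethernet0, and the sample text as assumptions and compare against your own .vmx file. The function names are hypothetical.

```python
def parse_vmx(text):
    """Parse key = "value" lines from .vmx text into a dictionary."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def check_lab_vm(settings):
    """Return a list of differences from the values used in this guide."""
    problems = []
    if settings.get("memsize") != "6144":
        problems.append("memory is not 6144 MB")
    if settings.get("numvcpus") != "2":
        problems.append("number of processors is not 2")
    if settings.get("ethernet0.connectionType") not in ("nat", "bridged"):
        problems.append("network adapter is neither NAT nor Bridged")
    return problems


# Hypothetical sample text, not copied from the course VM.
SAMPLE_VMX = '''
memsize = "6144"
numvcpus = "2"
ethernet0.connectionType = "nat"
'''

if __name__ == "__main__":
    print(check_lab_vm(parse_vmx(SAMPLE_VMX)) or "Settings match the guide.")
```

To use the sketch against a real VM, read the .vmx file from the unzipped VM ZIP folder into a string and pass it to parse_vmx.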

The installation of the Lab Server virtual machine is completed.
The last Lab setup step is required if you want to set up the student VMs with the Bridged network configuration option.
Note: Remote (SSH) access to the Lab Server VM uses the cloudera username with the cloudera password.
The cloudera account has sudo privileges in the Lab Server. The root account password is cloudera.

Part 16 - Setting Up the IP Address of the Lab Server VM
If you set up your VM Network Adapter with the Bridged option as shown in the screenshot below, the Lab Server will, by default, be assigned a DHCP-leased IP address. This may be convenient from the administration perspective, but it will affect student SSH connections during the class: students will have to change the Lab Server IP address in their connections whenever the IP address of the Lab Server changes (the IP lease may be configured to expire every day, and the class runs for four days).

Considering the inconvenience to the students, it may be worthwhile to assign each Lab Server a unique IP address.
1. From the Lab Server toolbar, select System > Preferences > Network Connections.
The Network Connections dialog opens.
2. Select Wired / Auto eth1 and click Edit.
The Editing Auto eth1 dialog opens.
3. Select the IPv4 Settings tab.

In the screenshot above, the network adapter is configured to receive an IP address from the DHCP server.
4. To set up a static IP address, select Manual from the Method: drop-down.
5. Click Add to add an IP Address, Netmask, Gateway and DNS as per your network settings.

A sample input screen is shown below.
6. Click Apply.
The Authenticate dialog opens.
7. In the Password for root: text box, enter cloudera and click Authenticate.
You should be returned to the Network Connections dialog.
8. Click Close.
This is the final step of the Lab Server setup.
You have successfully installed the software for this course.
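Before typing static addresses into each Lab Server, it can help to sanity-check the planned values. The following is a minimal sketch using Python's standard ipaddress module, assuming Python is available on the machine used for planning. The function names and all addresses are hypothetical examples; use the values from your own network.

```python
import ipaddress
from collections import Counter


def check_static_config(address, netmask, gateway):
    """Return a list of problems with a planned static IPv4 configuration."""
    # strict=False lets us derive the network from a host address.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    host = ipaddress.ip_address(address)
    gw = ipaddress.ip_address(gateway)
    problems = []
    if host in (network.network_address, network.broadcast_address):
        problems.append("address is the network or broadcast address")
    if gw not in network:
        problems.append("gateway is outside the Lab Server subnet")
    if gw == host:
        problems.append("address is the same as the gateway")
    return problems


def find_duplicates(addresses):
    """Return addresses planned for more than one Lab Server."""
    return [a for a, n in Counter(addresses).items() if n > 1]


if __name__ == "__main__":
    # Hypothetical plan: one address per student Lab Server.
    plan = ["192.168.1.50", "192.168.1.51", "192.168.1.52"]
    for ip in plan:
        print(ip, check_static_config(ip, "255.255.255.0", "192.168.1.1"))
    print("Duplicates:", find_duplicates(plan))
```

Checking for duplicates matters here because every student needs a dedicated Lab Server with its own unique address.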

Part 17 - Summary
You have successfully installed the software for this course!
If you have any questions, please contact us by email at [email protected]
US and Canada call: 1-877-812-8887 ext. 26
International call: 416-406-3994 ext. 26
