Transcription

Web Interface to R for High-Performance Computing
Junji NAKANO† and Ei-ji NAKAMA‡
† The Institute of Statistical Mathematics, Japan
‡ COM-ONE Ltd., Japan
The R User Conference 2009
July 8-10, Agrocampus-Ouest, Rennes, France

Outline
1. Introduction
2. Rdweb system
3. Examples of execution
4. Installing Rdweb
5. Concluding remarks

R and the requirement for huge calculations
R: a free software environment for statistical computing and graphics, for
- statisticians to implement new statistical methods
- practitioners to analyze real data sets in various fields
Recently, both kinds of users require huge amounts of calculation for their own purposes.
Parallel computing is a practical method for realizing huge calculations by executing them on several computers and/or many CPU cores at the same time.

Parallel computing techniques on R
Parallel BLAS (Basic Linear Algebra Subprograms) using threads
- ATLAS: free parallel and optimized BLAS
- GotoBLAS: fastest parallel and optimized BLAS
- Intel MKL, AMD ACML: parallel and optimized BLAS provided by vendors
MPI-type libraries for R using clustered computers
- Rpvm: an R interface to PVM (Parallel Virtual Machine)
- Rmpi: an R interface to MPI (Message Passing Interface)
snow (Simple Network of Workstations)
- A package for realizing parallel computing by parallel apply functions
- Uses lower-level parallel libraries such as Socket, MPI, PVM and nws for transferring data among processes
- Because it conceals the differences among the lower-level libraries, it is easy to use for parallel computing.
multicore
- Running parallel computations in R on machines with multiple cores or CPUs.

Existing Web environments for R
- Rweb: a Web-based interface to R for submitting code
- Rpad: a workbook-style user interface to R through a Web browser
- rapache: embedding R in the Apache Web server
- Rserve: a TCP/IP server that allows other programs to use the facilities of R
- RWebServices: exposing R functions as Web services through Java/Axis/Apache
Parallel computing is not the main concern of these programs.

Supercomputers in ISM
We have three supercomputer systems in the Institute of Statistical Mathematics (ISM), Japan. (We will replace them next year.)
The present supercomputers provide parallel computing facilities.
We use R on our supercomputers.

Our problems
Troubles
- Each supercomputer uses a different (Unix-like) environment.
- Unix-like environments are not easy for novices to use.
- Several parameters for parallel computing need to be specified differently for each supercomputer.

Our solution
Approach: Web interface
We have made “Rdweb”, a Web interface to R for using the parallel computing functions in R. It covers:
- R script editing
- file transfer
- job and resource management

Structure of Rdweb
The Rdweb (R daemon for Web) system consists of three components:
- Web interface (via the Web browser on the user’s computer)
  It is rather simple and is written in HTML and JavaScript. JavaScript is used only for light assistance with user input.
- Web server (on the Rdweb gateway computer)
  It is a CGI program for authentication, file transfer, job control (start, stop and check), creation of the JCL (Job Control Language) script, and distribution of the program to remote computers, acting as a client of Rdaemon.
- Rdaemon (on the front-end computer of the cluster system)
  It checks authentication, transfers the required files, starts and ends jobs, and shows their status.
[Diagram: client browser (data file, R program, authorization, number of snow slaves, number of parallel BLAS threads) -> HTTP -> Web server (HTTP server; CGI written in Perl: job control, user interface) -> TCP/10024 -> Rdaemon (NIS, PAM or CRYPT authentication) -> batch system -> R master and R slaves on the R machines]

Characteristics of Rdweb
- Rdweb is designed for supercomputers and personal PC cluster systems.
- The three components of Rdweb described above and the R slaves can reside on different computers or on the same computer.
- Text-based Web browsers can be used (with minor limitations).

Rdweb on supercomputers in ISM
[Diagram: two installations, each with an Apache Web server and CGI connected over TCP:10024 to Rdaemon on the front end, and each with a physical random number server.
Shared-memory system: SGI ALTIX. Front end (ALTIX 350): Itanium2, 8 CPUs, 32GB memory. Back end (ALTIX 3700): Itanium2, 64 CPUs / node, 512GB memory / node, total 4 nodes. LAM MPI, OpenPBS; R Master and R Slaves 01 to 63.
Distributed-memory system: HP XC4000 Cluster. Opteron 252, 2 CPUs / node, 2GB or 4GB memory / node, total 128 nodes. HP MPI, LSF SLURM; R Master on node 1 and R Slaves on nodes 2 to 128.]

Differences between Rweb and Rdweb
From the user side, Rdweb is similar to Rweb.
Rdweb can control system resources such as user, CPU, memory and queue. Although Rweb does not allow the use of the “system” command for security reasons, Rdweb has no such limitation because it has a strict authentication mechanism.
Rweb and Rdweb
                            Rweb          Rdweb
  Authentication            none          PAM, NIS or Unix password
  File upload               one file      many files
  Control of parallel BLAS  impossible    each session
  Control of snow           impossible    each session

Authentication of Rdweb (1) - Web server
Rdweb adopts two authentication stages. The first stage uses the authentication mechanism of the Web server when the user connects to the Web server on the gateway computer. The mechanism is realized by mod_auth_pam of Apache.
sites-enabled
  <Directory "/www/">
    Options .
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthPAM_Enabled on
    AuthType Basic
    AuthName "Rdweb User Login"
    Require valid-user
  </Directory>

Authentication of Rdweb (2) - Rdaemon
As the second stage of Rdweb authentication, Rdaemon uses an authentication method such as PAM (recommended), NIS or Unix password. One of them is selected when the Rdweb system is compiled.
Cookies must be enabled in the Web browser for the Web interface of Rdweb.

PAM authentication
- PAM (Pluggable Authentication Modules) is the API for authentication used in Linux, Solaris, MacOSX and AIX (5.3 ...).
- PAM uses NIS, LDAP or Unix password.
- If PAM is not available, NIS or Unix password can be used directly for authentication in Rdaemon.
[Diagram: browser -> HTTPS -> Web server -> PAM API -> PAM library (pam.conf) -> PAM service modules -> LDAP, NIS or Unix password on the cluster system]

Location of files
- An “Rdweb” directory is created in the user's home directory on the front-end.
- The directory for execution is ~/Rdweb/
- Uploaded files are also stored in ~/Rdweb/
- Logs and scripts are stored in ~/Rdweb/YYYYMMDD hhmmss/, where YYYYMMDD hhmmss shows year, month, day, hour, minute and second, according to the ISO-8601 date format.

Uploading files
- To upload data and/or program files, we click the “Choose” button, select a file, and click the “upload” button.
- These operations can be repeated without affecting the edited script or other functions.
- SCP or SFTP clients such as the FileZilla client are recommended for uploading large files, because an HTTP upload sometimes times out and stops.

Preparing data and program
Using a text editor, we prepare the following data file and save it as “HW.csv”.
HW.csv
  height,weight
  1.70,65
  1.85,80
  1.75,86
We also prepare the following R program and save it as “BMI.R”.
BMI.R
  BMI <- function(H,W){W/H^2}

Input
Upload the two files “HW.csv” and “BMI.R”. Then input the following R program in the editor area of the Web interface, which is connected to the Rdweb gateway.
input text area
  HW <- read.csv("HW.csv")
  source("BMI.R")
  HWB <- cbind(HW, BMI=BMI(HW$height, HW$weight))
  HWB
  plot(HWB)
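The input program can be tried on a local R installation before it is submitted through Rdweb. The following is a minimal sketch under that assumption: it recreates “HW.csv” and “BMI.R” in a temporary directory (the name work_dir is hypothetical and not part of Rdweb) and runs the same statements, leaving out plot(HWB) so that no graphics file is written.

```r
# Local dry run of the slide example; nothing here talks to Rdweb.
# work_dir is a hypothetical scratch directory used only for this sketch.
work_dir <- tempfile("rdweb_example_")
dir.create(work_dir)

# Recreate the data file and the program file shown on the slides.
writeLines(c("height,weight", "1.70,65", "1.85,80", "1.75,86"),
           file.path(work_dir, "HW.csv"))
writeLines("BMI <- function(H, W) { W / H^2 }",
           file.path(work_dir, "BMI.R"))

# Same steps as the input text area, with explicit paths.
HW <- read.csv(file.path(work_dir, "HW.csv"))
source(file.path(work_dir, "BMI.R"))
HWB <- cbind(HW, BMI = BMI(HW$height, HW$weight))
print(HWB)
```

On Rdweb itself the files are referred to by their bare names, as on the slide, because uploaded files are stored in the execution directory.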

Execution
- The job is started by clicking the “Execute” button. The job status is shown in “JOB Information”.
- The job information is refreshed by clicking the “Refresh” button or the top title.
- Results of the calculation are stored as files with the extensions .Rout (text format) and .pdf (PDF graphics).

Use of snow
Usually in R, we have to specify the number of processes differently according to the cluster type.
makeCluster (normal)
  # SOCK cluster
  cl <- makeCluster(c("hostname1","hostname2"))
  # MPI cluster with 2 slave processes
  cl <- makeCluster(2)
We add a new function “setDefaultClusterOptions” so that the parameters given in the Web interface are used in the same way for all cluster types.
makeCluster (Rdweb)
  cl <- makeCluster(getClusterOption("spec"))
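To show what happens once a cluster object exists, the following sketch distributes the BMI calculation over two local worker processes. It is not the Rdweb setup: it assumes the parallel package bundled with R (2.14 and later), whose makeCluster and clusterMap derive from snow, and the worker count n_workers is a hypothetical stand-in for the value that Rdweb takes from the Web interface.

```r
# Sketch only: a local socket cluster from the bundled 'parallel' package,
# used here in place of snow on an Rdweb-managed cluster.
library(parallel)

n_workers <- 2                # hypothetical; chosen in the Web interface on Rdweb
cl <- makeCluster(n_workers)  # starts n_workers R processes on this machine

heights <- c(1.70, 1.85, 1.75)
weights <- c(65, 80, 86)

# clusterMap calls the function once per (height, weight) pair,
# spreading the calls over the worker processes.
bmi <- unlist(clusterMap(cl, function(H, W) W / H^2, heights, weights))

stopCluster(cl)               # always release the worker processes
print(round(bmi, 2))
```

With the Rdweb version of makeCluster shown on the slide, only the line that creates cl would change; the parallel apply call and stopCluster stay the same for every cluster type.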

Selection of parameters for parallel computing
Using the pull-down menus, we select the queue, the number of slave processes, the number of threads of parallel BLAS, and the cluster type, in this order.

Execution
- The job is started by clicking the “Execute” button.
- Newly created result files are shown by clicking the “Refresh” button.

Batch system
Rdweb requires a batch system. Several batch systems are available.
- at, batch: standard batch system of Unix specified in XPG4 (X/Open Portability Guide Ver. 4). It has a simple queue mechanism.
- OpenPBS (NASA etc.): queuing and scheduling control system for cluster systems. Development stopped in 1998.
- Torque (Cluster Resource Inc.): free system based on OpenPBS
- LoadLeveler (IBM): batch system provided by a vendor
- LSF (Platform Computing Inc.): commercial job control tool
- SLURM: free resource control utility

Platforms
Rdweb should work on almost all Unix-like OSs.
We have checked the following systems in ISM and elsewhere.
  MPI        OS       BATCH SYSTEM
  HP-MPI     Linux    LSF slurm
  LAM-MPI    Linux    Torque
  OpenMPI    Linux    Torque
[Table continues: further BATCH SYSTEM entries are OpenPBS, at, at, LoadLeveler and at; further OS entries include AIX and MacOSX.]
Note: Installation of these batch systems is sometimes complicated.

Installation
We keep the source code of Rdweb at http://prs.ism.ac.jp/~nakama/rdweb/
Required installation procedure
- Prepare the skeleton of the shell file for the front-end
- Define the system information on the Web server
Both depend heavily on the cluster system. Details of the settings can be found in the “README” file in the Rdweb archive.
We put the required packages for Debian GNU/Linux (Lenny) at http://prs.ism.ac.jp/~nakama/debian/lenny-ism/. They include helper packages for GotoBLAS and Torque, and packages of lam-mpi and openmpi for Torque. (Unfortunately, these are still buggy.)

Examples in ISM (1)
[Diagram: two installations, each with an Apache Web server and CGI connected over TCP:10024 to Rdaemon, and each with a physical random number server.
SGI ALTIX: front end (ALTIX 350) Itanium2, 8 CPUs, 32GB memory; back end (ALTIX 3700) Itanium2, 64 CPUs / node, 512GB memory / node, total 4 nodes. LAM MPI, OpenPBS; R Master and R Slaves 01 to 63.
Hitachi SR11000: Power4, 16 CPUs / node, 32GB memory / node, total 4 nodes. LAM MPI, LoadLeveler; R Master and R Slave on each of nodes 1 to 4.]

Examples in ISM (2)
[Diagram: two installations, each with an Apache Web server and CGI connected over TCP:10024 to Rdaemon on the front end, and each with a physical random number server.
HP XC4000 Cluster: Opteron 252, 2 CPUs / node, 2GB or 4GB memory / node, total 128 nodes. HP MPI, LSF SLURM; R Master on node 1 and R Slaves on nodes 2 to 128.
VXPRO R1400: XEON E5430, 8 CPUs / node, 16GB memory on 2 nodes and 32GB memory on 2 nodes. LAM MPI, Torque; R Master and R Slaves 01 to 32 on nodes 1 to 4.]

Concluding remarks
Advantages of Rdweb
- Novices can use the parallel execution functions of R with less effort.
- The number of parallel processes or threads can be specified easily for parallel BLAS and snow.
- Secure authentication is available through PAM, which can use LDAP or NIS.
Disadvantages of present Rdweb
- System installation is complicated and completely platform dependent.
Future work
- Encrypting communication between the Web server and Rdaemon
- Porting to various R environments:
  R with many BLASs
  R compiled by several compilers
  R on many OSs
