=====Welcome to the Data Science Lab Cluster=====
  
===== DSL LAB IS OPEN =====
  
You will need an account to get started (contact the Lab Admin). This wiki will be updated as new capabilities are added.
  * [[how_do_i#using_python|Python Anaconda]] is installed
  * [[how_do_i#r_studio|RStudio]] is installed
  * [[how_do_i#using_the_zeppelin_web_notebook|Zeppelin Notebooks]] are now fully configured for Python, PySpark, Spark, Hive, and shell programming. A notebook called **Basic Tests (Python, PySpark, sh, and Hive)** is available for learning more about Zeppelin (clone it first).
  * [[how_do_i#transfer_files_to_from_the_cluster|Transferring Files from the Cloud]] has been added. The [[using_rclone|rclone]] package (a command-line tool) has been installed on all workstations.
  * [[how_do_i#use_tensorflow|Python TensorFlow]] (CPU and GPU) and Keras are installed
**Watch this space for updates.**
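Before rclone can talk to a cloud provider, a remote must be defined, normally by running the interactive ''rclone config'' command. As a sketch, the resulting entry in ''~/.config/rclone/rclone.conf'' might look like the fragment below; the remote name ''gdrive'' and the read-only scope are illustrative assumptions, so see the [[using_rclone|rclone]] page for the options your provider needs:

```ini
# ~/.config/rclone/rclone.conf -- a hypothetical Google Drive remote
# (normally written for you by the interactive "rclone config" command)
[gdrive]
type = drive
scope = drive.readonly
```

Once a remote exists, ''rclone listremotes'' shows what is configured, and a folder can be pulled to a workstation with, e.g., ''rclone copy gdrive:mydata ./mydata'' (the folder name ''mydata'' is hypothetical).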
 +====About The System==== 
 + 
This computational resource is a collection of nine individual workstations that can work together as a scalable data science cluster for Big Data processing. The system can run large Hadoop and Spark jobs using the 10 TByte Hadoop Distributed File System (HDFS) on up to 120 cores. There are also three GPU-equipped nodes that are configured to run TensorFlow. Total system memory is 600 GBytes spread across 30 separate motherboards.

Each workstation provides a Linux desktop environment that supports Anaconda Navigator (Python), RStudio, and the Zeppelin web notebook (Spark, PySpark, Hadoop Hive, HBase, Python).
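As an illustration of the kind of Hadoop job the cluster can run, here is a minimal word count sketched in the Hadoop Streaming style, where a mapper emits tab-separated key/value lines and a reducer sums the values for each sorted key. The mapper/reducer pattern is standard, but the streaming jar path and HDFS paths mentioned in the comments are install-specific assumptions:

```python
#!/usr/bin/env python3
# Hadoop Streaming-style word count. On the cluster the two functions
# would live in separate mapper/reducer scripts submitted with something
# like "hadoop jar hadoop-streaming.jar -mapper ... -reducer ..."
# (jar location and HDFS input/output paths depend on the install).
from itertools import groupby

def mapper(lines):
    """Emit one "word<TAB>1" line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(pairs):
    """Sum counts per word; input must already be sorted by key."""
    keyed = (p.split("\t") for p in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # A local dry run; sorted() stands in for Hadoop's shuffle phase.
    for result in reducer(sorted(mapper(["to be or not to be"]))):
        print(result)
```

Running the dry run locally prints each distinct word with its count (''be 2'', ''not 1'', ''or 1'', ''to 2''), which is exactly what the reducer output files on HDFS would contain.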
  
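In a Zeppelin notebook, each paragraph begins with a ''%'' directive that selects the interpreter for that paragraph. A sketch of three paragraphs, one per interpreter, is shown below; the HDFS path and the Hive query are hypothetical, and ''sc'' is the SparkContext that Zeppelin's Spark interpreter predefines:

```
%sh
hdfs dfs -ls /user

%pyspark
rdd = sc.textFile("/user/me/books")
print(rdd.count())

%hive
SHOW TABLES;
```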
====FOR HELP CLICK ON THE "How Do I" LINK BELOW====
  
**System News:**
  Feb-18-2022  Python TensorFlow (CPU and GPU) and Keras installed
  Feb-14-2022  Zeppelin Notebooks are configured and rclone installed
  Feb-07-2022  Anaconda Navigator installed
  Jan-21-2022  Anaconda Python and RStudio installed
  Jan-20-2022  System is ready for users
  Nov-11-2021  Upgrade to CentOS 7 in progress
  ---- OLD SYSTEM ----
  Aug-30-2019  Python options are now (default, V-2.6.6) or Anaconda (V-3.7.1). Zeppelin now supports Python3,
               PySpark, Spark1, Spark2, and SparkR. See the "How Do I" page for information.
               TextBlob, scipy, Tensorflow, scikit-learn, gensim, pillow, h5py
               !!! Be sure to run "scl enable devtoolset-6 python27 bash" to use Python 2.7
  
start.1637185134.txt.gz · Last modified: 2021/11/17 21:38 by deadline
