Welcome to the Data Science Lab Cluster

DSL LAB IS OPEN

You will need an account to get started (contact the Lab Admin). This wiki will be updated as new capabilities are added.

  • Python Anaconda is installed
  • RStudio is installed
  • Zeppelin Notebooks are now fully configured for Python, PySpark, Spark, Hive, and shell programming. A notebook called Basic Tests (Python, PySpark, sh, and Hive) is available for learning more about Zeppelin (clone it first).
  • Transferring Files from the Cloud has been added. The rclone package (a command-line tool) is installed on all workstations.
  • Python TensorFlow (CPU and GPU) and Keras are installed (the GPU version is on the TensorFlow systems)
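
To see which of the packages listed above are visible from the Python environment you are currently using, a minimal sketch like the following may help. The import names in PACKAGES are assumptions (they are the usual import names, not taken from this wiki), and the result depends on which Python or Anaconda environment is active.

```python
import importlib.util

# Assumed import names for some of the packages mentioned above.
PACKAGES = ["numpy", "tensorflow", "keras", "pyspark"]


def installed(names):
    """Map each import name to True if Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in installed(PACKAGES).items():
        print(f"{name:12s} {'found' if found else 'missing'}")
```

A "missing" result usually means the package is not installed in the active environment, not necessarily that it is absent from the workstation.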

Watch this space for updates.

About The System

This computational resource is a cluster of workstations that can work together as one big system. The system can run large Hadoop and Spark jobs using the 10 TB Hadoop Distributed File System (HDFS) and up to 120 cores. There are also three GPU-equipped nodes that will be configured to run TensorFlow.
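
To check whether the node you are logged in to is one of the GPU-equipped ones, something like the sketch below could be used. It assumes a TensorFlow 2.x install (where tf.config.list_physical_devices is available); the helper name visible_gpus is hypothetical, and it returns an empty list if TensorFlow cannot be imported at all.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if TensorFlow is not importable."""
    try:
        import tensorflow as tf  # assumed to be TensorFlow 2.x
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(visible_gpus() or "no GPU visible from this node")
```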

HINT: To get back to this main page from any page in the wiki, click "Data Science Lab" in the upper-left corner.

System News:

Feb-18-2022  Python Tensorflow (CPU and GPU) and Keras installed
Feb-14-2022  Zeppelin Notebooks are configured and rclone installed
Feb-07-2022  Anaconda Navigator
Jan-21-2022  Anaconda Python and RStudio installed
Jan-20-2022  System is ready for users
Nov-11-2012  Upgrade to CentOS 7 in progress
---- OLD SYSTEM ----
Aug-30-2019  Python options are now (default, V-2.6.6) or Anaconda (V-3.7.1). Zeppelin now supports Python3,
             PySpark, Spark1, Spark2, and SparkR. See the "How Do I" page for information.
Feb-20-2019  Python Anaconda is now available, see the "How Do I" page for information on how to access it.
Nov-27-2018  Default Spark version is now 2.1.0, default Pyspark uses Python 3.6.3
Nov-07-2018: The Zeppelin Notebook is now available, see System Access above
Nov-01-2018: Python 3.6 updated on all systems with modules: numpy matplotlib TextBlob scipy
             scikit-learn gensim pillow h5py xgboost mysqlclient happybase
             (See "How Do I" for usage information)
Jul-26-2018  RStudio Server is installed. Enter "http://localhost:8787" in a browser to access it.
May-02-2018  R Libraries: See /opt/share/doc/Installing-R-Libraries for how to install your own R libraries.
Apr-20-2018: Python HBase lib HappyBase installed. Tensorflow now running on Limulus8-TF and Limulus9-TF
Feb-21-2018: A current Wikipedia snapshot is in HDFS at /data/Wikipedia
Feb-19-2018: HDFS is now available on all limulus machines as /mnt/hdfs
             Annotated examples from the purple Hadoop book are in /opt/share/doc/Hadoop2_Quick_Start_V1
Feb-14-2018: The following Python 2.7 modules are installed: nltk, keras, numpy, pandas, matplotlib
             TextBlob, scipy, Tensorflow, scikit-learn, gensim, pillow, h5py 
             !!! Be sure to run "scl enable devtoolset-6 python27 bash" to use Python 2.7             
start.1645218710.txt.gz · Last modified: 2022/02/18 21:11 by deadline
