How Do I ?
This page includes answers to many commonly asked questions about the cluster. Please send any questions you may have to deadline@basement-supercomputing.com. For a faster response, use DSL in the subject line.
Transfer Files To/From the Cluster
(14-Feb-2022)
There are several ways to transfer files to and from your account on the cluster. If the files are large (truly Big Data), contact the system administrator.
There are two basic ways; both require working from the command line.
Transfer From a Cloud Account
A personal cloud account (e.g. Google Drive or Microsoft OneDrive) can be accessed from your cluster account using the rclone command. This command transfers files between the cloud and your local cluster account. Configuration and use are done in text mode using a terminal window on any of the DSL Lab workstations.
See the Using rclone page for more information.
Transfer From Your PC or Laptop
The easiest way to transfer files is the scp (secure copy) command. If you are using Windows, you will need an “ssh/scp client.” Any of the following clients will work, and all are available at no cost (MobaXterm is recommended):
- PuTTY (provides a terminal for ssh sessions)
- MobaXterm (provides a terminal for ssh sessions and allows remote X Windows sessions)
- Windows PowerShell
These tools will allow you to transfer data files using the ``scp`` command.
NOTE: You must be inside the Saint Peter's network to use scp (access from outside requires logging into the campus VPN).
Mac and Linux systems have scp already installed. 
If you have an account on the cluster, you will have a directory called /home/YOUR_USER_NAME, where YOUR_USER_NAME is your login name. To transfer files from your local system (laptop) to the cluster, use the following command from a PuTTY, PowerShell, or MobaXterm terminal window. The address ``172.17.105.10`` is the local IP address of the cluster.
scp LOCAL_FILE_NAME YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME
You will be asked for your password (scp makes a secure, encrypted copy). Note: The Windows-based PuTTY package uses the pscp command, which works the same as scp.
Transferring files from the cluster to your local system works the same way, only the source and destination are swapped. Again your password is required.
scp YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME/LOCAL_FILE_NAME LOCAL_FILE_NAME
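For scripted transfers, the two scp commands above can be assembled in Python. The following is a minimal sketch, not part of the cluster setup: the helper names `upload_command` and `download_command` and the example file name are hypothetical, and the host defaults to the cluster IP address given above. The functions only build the argument lists; nothing is copied until the list is passed to `subprocess.run`, at which point scp will still prompt for your password.

```python
CLUSTER_HOST = "172.17.105.10"  # local IP address of the cluster, from the text above


def upload_command(local_file, user, host=CLUSTER_HOST):
    """Build the scp argument list that copies local_file into /home/<user> on the cluster."""
    return ["scp", local_file, "{0}@{1}:/home/{0}".format(user, host)]


def download_command(remote_name, user, local_file, host=CLUSTER_HOST):
    """Build the scp argument list that copies /home/<user>/<remote_name> to local_file."""
    return ["scp", "{0}@{1}:/home/{0}/{2}".format(user, host, remote_name), local_file]


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    cmd = upload_command("results.csv", "YOUR_USER_NAME")
    print(" ".join(cmd))
    # To actually run the transfer (scp prompts for the password):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when a file name contains spaces.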
Using Python
03-Mar-2022
There are two versions of Python installed on the cluster.
- Python 2.7.5
 This is the default version that is part of the operating system distribution. No setup is needed to use it. To confirm which version you are using, enter:
$ python -V
Python 2.7.5
- Python 3.7.4
 This version is part of the Anaconda distribution (version 4.7.12). Anaconda provides a complete Python environment that is separate from the default version mentioned above. To use Anaconda, enter:
use_conda
To stop using the Anaconda environment enter:
stop_conda
The following example demonstrates this process. Note how the version of Python changes from the default to the version provided by Anaconda, and back again.
$ use_conda
$ python -V
Python 3.7.4
$ stop_conda
$ python -V
Python 2.7.5
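The same check can be made from inside a script, which helps when a program must not run under the wrong interpreter. This is a minimal sketch: the function name `interpreter_label` is hypothetical, and mapping major version 2 to the system Python and major version 3 to Anaconda is an assumption based on the two installations described above.

```python
import sys


def interpreter_label(version_info):
    """Map a version tuple to the environment it most likely belongs to on this cluster."""
    major = version_info[0]
    if major == 2:
        return "system Python (default)"
    if major == 3:
        return "Anaconda Python (after use_conda)"
    return "unknown Python"


if __name__ == "__main__":
    # Works under both Python 2.7 and Python 3.
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    print(version + ": " + interpreter_label(sys.version_info))
```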
Anaconda Navigator
To use Anaconda Navigator, open a terminal window and type:
$ use_conda
$ anaconda-navigator
Contact the system administrator for options.
Run Anaconda Navigator on Your Laptop
You will need to be on the campus wireless for this to work. It will not work from the Internet.
1. Log into your account on the cluster using an ssh session in MobaXterm.
2. Run the following:
jupyter-notebook --generate-config
3. Edit the file (use vi or nano):
nano .jupyter/jupyter_notebook_config.py
4. Change this line:
#c.NotebookApp.ip = 'localhost'
5. To the following (note: remove the “#”), then save and exit:
c.NotebookApp.ip = '127.0.0.1'
6. Now run the following from MobaXterm:
anaconda-navigator
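Steps 4 and 5 can also be scripted rather than edited by hand. The sketch below is illustrative only: the function name `set_notebook_ip` is hypothetical, and it assumes the generated file contains the commented `c.NotebookApp.ip` line in the form shown in step 4 (spacing may differ). It works on the text of the file, so it can be tried safely before touching the real configuration.

```python
import re


def set_notebook_ip(config_text, ip="127.0.0.1"):
    """Uncomment the c.NotebookApp.ip line and set it to ip; other lines are unchanged."""
    pattern = re.compile(r"^#\s*c\.NotebookApp\.ip\s*=.*$", re.MULTILINE)
    replacement = "c.NotebookApp.ip = '{}'".format(ip)
    new_text, count = pattern.subn(replacement, config_text, count=1)
    if count == 0:
        raise ValueError("commented c.NotebookApp.ip line not found")
    return new_text


if __name__ == "__main__":
    # A made-up fragment standing in for the generated configuration file.
    sample = "# Configuration file\n#c.NotebookApp.ip = 'localhost'\n#c.NotebookApp.port = 8888\n"
    print(set_notebook_ip(sample))
```

To apply it to the real file, read `.jupyter/jupyter_notebook_config.py` from your home directory, pass its contents through the function, and write the result back.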
Python Libraries
To check the libraries for a given Python version/environment, use the following from either the default or Anaconda environment:
pip list
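To check for one specific library from inside a program, rather than scanning the full `pip list` output, the standard library can report whether a module is importable. A minimal sketch, assuming the Python 3 (Anaconda) environment; the function name `is_installed` and the module names in the demo are examples only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module_name can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names here are examples only; substitute the ones your program needs.
    for name in ("json", "numpy", "pandas"):
        print(name, "available" if is_installed(name) else "missing")
```

Note that this tests the import name, which sometimes differs from the package name shown by `pip list`.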
If you need a library installed, contact deadline@basement-supercomputing.com. For a faster response, please use DSL in the subject line.
Using Spark
There are two versions of Spark on the cluster. Each version includes PySpark and SparkR. Spark programs run on the cluster, not just on the machine you are using.
NOTE 1: If you are logged into limulus7-TF, limulus8-TF, or limulus9-TF, you will need to use one of the other machines for Spark. To do this from the *-TF machines, use ssh to log in to one of limulus2 through limulus6. For example, if user “basement” is logged into limulus7-TF, they can use limulus3 (or 2, 4, 5, 6) for Spark. All user files are available on all machines.
[basement@limulus7-TF ~]$ ssh limulus3
Last login: Fri Sep 28 15:14:31 2018 from n23
[basement@limulus3 ~]$
NOTE 2: Spark can be rather chatty with INFO messages (particularly version 1). To reduce the messages, enter the following at the Spark prompt:
sc.setLogLevel("WARN")
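In a PySpark script, the same call can be wrapped so that a mistyped level name is caught before it reaches Spark. This is a minimal sketch: the helper name `quiet_spark` is hypothetical, and `sc` is assumed to be the SparkContext that the pyspark shell provides. The level names listed are the ones Spark accepts for `setLogLevel`.

```python
VALID_LEVELS = ("ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN")


def quiet_spark(sc, level="WARN"):
    """Validate the level name, then set it on a SparkContext-like object."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError("unknown log level: " + level)
    sc.setLogLevel(level)
    return level


if __name__ == "__main__":
    # Stand-in object so the sketch runs without Spark; in the pyspark
    # shell, pass the real SparkContext instead: quiet_spark(sc)
    class StandInContext(object):
        def setLogLevel(self, level):
            print("log level set to " + level)

    quiet_spark(StandInContext(), "warn")
```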
- Spark 2.1.0
 This version of Spark is the default. If you enter spark-submit, spark-shell, pyspark, or sparkR, this version will be used. PySpark uses Python version 3.6.3.
- Spark 1.6.3
 To use version 1 of Spark, enter the following at the command line prompt:
 $ export SPARK_MAJOR_VERSION=1
 After this command is entered, if you enter spark-submit, spark-shell, pyspark, or sparkR, Spark version 1 will be used.
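When Spark is launched from a Python script rather than typed at the prompt, the version can be chosen per launch by setting the variable only in the child process environment. A minimal sketch: the function name `spark_env` and the job file name in the comment are hypothetical, and treating an unset variable as "use the default version" is an assumption based on the description above.

```python
import os


def spark_env(major_version=None, base_env=None):
    """Return a copy of the environment with SPARK_MAJOR_VERSION set, or removed if None."""
    env = dict(os.environ if base_env is None else base_env)
    if major_version is None:
        env.pop("SPARK_MAJOR_VERSION", None)  # assumed to select the default Spark
    else:
        env["SPARK_MAJOR_VERSION"] = str(major_version)
    return env


if __name__ == "__main__":
    print(spark_env(1).get("SPARK_MAJOR_VERSION"))
    # To launch a job with Spark version 1 (hypothetical script name):
    #   import subprocess
    #   subprocess.run(["spark-submit", "my_job.py"], env=spark_env(1), check=True)
```

Because the function returns a copy, the calling shell and other jobs are unaffected.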
Using the Zeppelin Web Notebook
28-Jan-2022
The Zeppelin Web Notebook is a convenient way to create, copy, and share data science projects. The DSL Zeppelin Notebook is available at https://dsl.saintpeters.edu:8443. The Zeppelin Web Notebook has the following features:
- Available from the web (you must have a valid account on the DSL cluster)
- A Python 3.7.4 Interpreter
- A Spark 2.3.2 Interpreter
- A PySpark 2.3.2/3.7.4 interpreter
- A Shell (bash) interpreter
- A Hadoop Hive (SQL database) interpreter
The best way to learn about Zeppelin is to open a session (https://dsl.saintpeters.edu:8443) and take a look at the notebook called Basic Tests (Python, PySpark, sh, and Hive).
Once you connect to Zeppelin, the Zeppelin start page is displayed. If you have a functional account, you can log in (upper right corner) and explore the Zeppelin Notebook. Once you are logged in, your browser shows the list of available notebooks (the notebooks listed for your account may differ).
Click on the Basic Tests (Python, PySpark, sh, and Hive) notebook and clone a copy so that you can change it. You can then try the paragraphs that illustrate the Zeppelin features.
For more information on using Zeppelin, see the Zeppelin documentation.
Please note: Users can control read and write access to their notebooks. Notebooks can be imported and exported and have multiple users.
R Studio
21-Jan-2022
To start RStudio on any of the lab machines, enter the following into your browser
Note: RStudio is only available locally on each of the machines in the lab. There is no internet access to RStudio.
R Libraries
Because the number and variety of R libraries is quite vast, each user can install and manage the libraries they need by using the simple method described below.
Libraries will be installed in /home/YOUR_USER_NAME/R-libs
- Log in to the cluster, then cut and paste this command into a terminal window:
 echo "R_LIBS_USER=${HOME}/R-libs" > ~/.Renviron
- When installing libraries, say “y” to the prompt about using a personal library. The path should be /home/YOUR_USER_NAME/R-libs
In some cases, a newer development environment may be needed for libraries to build. Enter this command before starting R:
scl enable devtoolset-6 bash
Use TensorFlow
TensorFlow is available as part of Anaconda Python. The easiest (and recommended) way to use TensorFlow with Python is through the high-level Keras library.
It is also possible to use low level Tensorflow directly.
Note: Python Keras and TensorFlow can be used on any workstation. Only three workstations have GPUs (limulus7-tf, limulus8-tf, limulus9-tf); however, both Keras and TensorFlow will run without a GPU (applications will run slower).
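To find out whether the workstation you are on will use a GPU, TensorFlow can be asked how many GPUs it sees. This is a hedged sketch: the function name `count_gpus` is hypothetical, and it assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the TensorFlow version installed on the cluster is not stated here and may differ. Run it after `use_conda`.

```python
def count_gpus():
    """Return the number of GPUs TensorFlow can see, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumes TensorFlow 2.1 or later, where this function exists.
    return len(tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    gpus = count_gpus()
    if gpus is None:
        print("TensorFlow is not available in this Python environment")
    elif gpus == 0:
        print("No GPU visible; Keras and TensorFlow will run on the CPU")
    else:
        print("GPUs visible: %d" % gpus)
```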
 
 
