How Do I?

This page includes answers to many of the commonly asked questions about the cluster. Please send any questions you may have to deadline@basement-supercomputing.com. For a faster response, use DSL in the subject line.

Request Library or Application

As a valid user with a login account, you may request that libraries (e.g., Python libraries) or an application be installed on the DSL cluster. Keep in mind that not all libraries or applications can be installed, due to compatibility issues.

Transfer Files To/From the Cluster

14-Feb-2022

There are several ways to transfer files to and from your account on the cluster. If the files are large (truly Big Data), contact the system administrator.

There are two basic ways; both require working from the command line.

Transfer From a Cloud Account

A personal cloud account (e.g., Google Drive or Microsoft OneDrive) can be accessed from your cluster account using the rclone command. This command allows transfers between the cloud and your local cluster account. Configuration and use are done in text mode using a terminal window from any of the DSL Lab workstations.

See the Using rclone page for more information.

Transfer From Your PC or Laptop

The easiest way to transfer files is the scp (secure copy) command. If you are using Windows, you will need an ssh/scp client. Any of the following clients will work, and all are available at no cost (MobaXterm is recommended):

PuTTY (provides a terminal for ssh sessions)
MobaXterm (provides terminal for ssh sessions and allows remote X Windows session)
Windows PowerShell

These tools will allow you to transfer data files using the ``scp`` command.

NOTE: You will need to be inside the Saint Peter's network to use scp (outside access requires logging into the campus VPN).

Mac and Linux systems have scp already installed.

If you have an account on the cluster, you will have a directory called /home/YOUR_USER_NAME, where YOUR_USER_NAME is your login name. To transfer files from your local system (laptop) to the cluster, use the following command from a PuTTY, PowerShell, or MobaXterm terminal window. ``172.17.105.10`` is the local IP address of the cluster.

scp LOCAL_FILE_NAME   YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME

You will be asked for your password (because it is a secure, encrypted copy). Note: The Windows-based PuTTY package uses the pscp command, which works the same as scp.

Transferring files from the cluster to your local system works the same way, only the source and destination are swapped. Again, your password is required.

scp YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME/REMOTE_FILE_NAME LOCAL_FILE_NAME
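
For repeated transfers, the two scp commands above can be assembled by a small script. The following is a minimal sketch, not an official tool: the function names, the user name "jdoe", and the file names are hypothetical, and only the IP address and the /home/YOUR_USER_NAME layout come from the text above.

```python
import subprocess

CLUSTER_IP = "172.17.105.10"  # local IP address of the cluster, from the text above


def scp_upload_command(user, local_path):
    """Build the scp command that copies local_path into the user's cluster home."""
    return ["scp", local_path, "{0}@{1}:/home/{0}".format(user, CLUSTER_IP)]


def scp_download_command(user, remote_name, local_path):
    """Build the scp command that copies a file from the cluster home to local_path."""
    remote = "{0}@{1}:/home/{0}/{2}".format(user, CLUSTER_IP, remote_name)
    return ["scp", remote, local_path]


def run(command):
    """Run the command; scp itself prompts for the password in the terminal."""
    return subprocess.call(command)


# Hypothetical user and file names, shown only to illustrate the argument order.
print(scp_upload_command("jdoe", "results.csv"))
print(scp_download_command("jdoe", "results.csv", "results_copy.csv"))
```

Printing the command before calling run() makes it easy to confirm the source and destination are in the intended order.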

Using Python

03-Mar-2022

There are two versions of Python installed on the cluster.

Python 2.7.5
This is the default version that is part of the operating system distribution. No setup is needed to use it. To confirm which version you are using, enter:

$ python -V
Python 2.7.5

Python 3.7.4
This version is part of the Anaconda distribution (version 4.7.12). Anaconda provides a complete Python environment that is separate from the default version mentioned above. To use Anaconda, enter:

use_conda

To stop using the Anaconda environment, enter:

stop_conda

The following example demonstrates this process. Note how the version of Python changes from the default to the version in Anaconda.

$ use_conda
$ python -V
Python 3.7.4
$ stop_conda
$ python -V
Python 2.7.5
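
A script can make the same check from the inside, which helps when it is unclear whether use_conda was run before the script started. This is a minimal sketch; the function name and the wording of the messages are illustrative only. It is written to run under both the default Python 2 and the Anaconda Python 3.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Report which of the two cluster Python environments is probably active."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return "Python {0}.{1}: system default (run use_conda for Anaconda)".format(major, minor)
    return "Python {0}.{1}: a Python 3 interpreter, such as the Anaconda one".format(major, minor)


print(describe_interpreter())
```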

Anaconda Navigator

To use Anaconda Navigator, open a terminal window and type:

$ use_conda
$ anaconda-navigator

Contact the system administrator for options.

Run Anaconda Navigator on Your Laptop

You will need to be on the campus wireless network for this procedure to work. General internet access is not available. The following procedure is run from the command line.

1. Log into your account on the cluster using MobaXterm (see above):

$ ssh dsl.saintpeters.edu

2. Once logged in, run the following:

$ jupyter-notebook --generate-config

3. Edit the config file (use vi or nano):

$ nano .jupyter/jupyter_notebook_config.py

4. Change this line:

 #c.NotebookApp.ip = 'localhost'

5. To the following (note: remove the “#”), then save and exit:

 c.NotebookApp.ip = '127.0.0.1'

6. Now run from the MobaXterm command line:

$ anaconda-navigator
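
The edit in steps 4 and 5 can also be scripted instead of done by hand in an editor. The sketch below is one possible approach, not part of the official procedure: the function name is hypothetical, and it operates on the text of the config file rather than on the file itself so that it can be tried safely on a sample first.

```python
import re


def set_notebook_ip(config_text, ip="127.0.0.1"):
    """Uncomment the c.NotebookApp.ip line and set it to the given address."""
    # Match a commented-out assignment on a single line (no newlines consumed).
    pattern = re.compile(r"^[ \t]*#[ \t]*c\.NotebookApp\.ip[ \t]*=.*$", re.MULTILINE)
    replacement = "c.NotebookApp.ip = '{0}'".format(ip)
    new_text, count = pattern.subn(replacement, config_text, count=1)
    if count == 0:
        # Leave the contents unchanged if the expected line is missing.
        return config_text
    return new_text


sample = "# Some comment\n#c.NotebookApp.ip = 'localhost'\n#c.NotebookApp.port = 8888\n"
print(set_notebook_ip(sample))
```

To apply it, read .jupyter/jupyter_notebook_config.py into a string, pass it through the function, and write the result back. Keeping a backup copy of the original file first is a sensible precaution.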

Python Libraries

To check the libraries for a given Python version/environment, use the following from either the default or Anaconda environment:

pip list

If you need a library installed, contact deadline@basement-supercomputing.com. For a faster response, please use DSL in the subject line.
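
Before sending a request, a short script can confirm which modules are actually missing from the current environment. This is a sketch under one assumption: it uses importlib.util.find_spec, which exists only in Python 3.4 and later, so it should be run after use_conda rather than with the default Python 2. The function name and the second module name are made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a made-up module used only
# to show what a missing library looks like.
print(missing_modules(["json", "surely_not_installed_xyz"]))
```

Note that the import name of a library is not always the same as the package name shown by pip list.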

Using Spark

There are two versions of Spark on the cluster. Each version includes PySpark and SparkR. Spark programs run on the cluster as a whole, not just on the machine you are using.

NOTE 1: If you are logged into limulus7-TF, limulus8-TF, or limulus9-TF, you will need to use one of the other machines for Spark. To do this from the *-TF machines, simply use ssh to log in to one of limulus2 through limulus6. For example, if user “basement” is logged into limulus7-TF, they can use limulus3 (or 2, 4, 5, 6) for Spark. All user files are available on all machines.

[basement@limulus7-TF ~]$ ssh limulus3
Last login: Fri Sep 28 15:14:31 2018 from n23
[basement@limulus3 ~]$

NOTE 2: Spark can be rather chatty with INFO messages (particularly version 1). To turn down the messages, enter the following at the Spark prompt:

sc.setLogLevel("WARN")
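
In a PySpark script, the same call can be wrapped so that a mistyped level name is caught early. The following is a minimal sketch: the helper name is hypothetical, and the list of level names is the set accepted by Spark's setLogLevel. In a script run with spark-submit the variable sc is not predefined, so the comment at the end shows one way to obtain it.

```python
VALID_LEVELS = ("ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN")


def quiet_spark(sc, level="WARN"):
    """Set the Spark log level, rejecting names Spark does not recognize."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError("unknown log level: {0}".format(level))
    sc.setLogLevel(level)
    return level


# Usage inside a PySpark script (requires Spark, so it is left as a comment):
#   from pyspark import SparkContext
#   sc = SparkContext.getOrCreate()
#   quiet_spark(sc)  # same effect as sc.setLogLevel("WARN")
```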

Spark 2.1.0
This version of Spark is the default. If you enter spark-submit, spark-shell, pyspark, or sparkR, this version will be used. PySpark uses Python version 3.6.3.

Spark 1.6.3
To use version 1 of Spark, enter the following at the command line prompt:
```
$ export SPARK_MAJOR_VERSION=1
```
After this command is entered, if you enter spark-submit, spark-shell, pyspark, or sparkR, Spark version 1 will be used.
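
When Spark is launched from a Python script rather than typed at the prompt, the same variable can be set for just that one launch. This is a sketch only: the function name and the script name my_job.py are hypothetical, and it assumes that leaving SPARK_MAJOR_VERSION unset selects the default version described above.

```python
import os


def spark_env(major_version=None, base_env=None):
    """Return a copy of the environment that selects a Spark major version."""
    env = dict(os.environ if base_env is None else base_env)
    if major_version is None:
        # Assumption: with the variable unset, the default Spark is used.
        env.pop("SPARK_MAJOR_VERSION", None)
    else:
        env["SPARK_MAJOR_VERSION"] = str(major_version)
    return env


env_v1 = spark_env(1)
print(env_v1["SPARK_MAJOR_VERSION"])
# The environment can then be passed to a launcher, for example:
#   subprocess.call(["spark-submit", "my_job.py"], env=env_v1)
# where my_job.py is a hypothetical script name.
```

Because the setting lives in a copied environment, other commands in the same terminal session keep using the default version.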

Using the Zeppelin Web Notebook

28-Jan-2022

The Zeppelin Web Notebook is a great way to create, copy, and share data science projects. The DSL Zeppelin Notebook is available at https://dsl.saintpeters.edu:8443. The Zeppelin Web Notebook has the following features:

Available from the web (you must have a valid account on the DSL cluster)
A Python 3.7.4 interpreter
A Spark 2.3.2 interpreter
A PySpark 2.3.2/3.7.4 interpreter
A Shell (bash) interpreter
A Hadoop Hive (SQL database) interpreter

The best way to learn about Zeppelin is to open a session (https://dsl.saintpeters.edu:8443) and take a look at the notebook called Basic Tests (Python, PySpark, sh, and Hive).

Once you connect to Zeppelin, you will see the Zeppelin start page.

If you have a functional account, you can log in (upper right corner) and explore the Zeppelin Notebook. Once you are logged in, your browser will show the list of available notebooks (the notebooks listed will vary).

Click on the Basic Tests (Python, PySpark, sh, and Hive) notebook and clone a copy so that you can change it. You can then try the paragraphs that illustrate the Zeppelin features.

For more information on using Zeppelin, see the Zeppelin documentation.

Please note: Users can control read and write access to their notebooks. Notebooks can be imported and exported and have multiple users.

RStudio

21-Jan-2022

To start RStudio on any of the lab machines, enter the following into your browser:

http://localhost:8787

Note: RStudio is only available locally on each of the machines in the lab. There is no internet access to RStudio.

R Libraries

Because the number and variety of R libraries is quite vast, each user can install and manage the libraries they need by using the simple method described below.

Libraries will be installed in /home/YOUR_USER_NAME/R-libs

Log in to the cluster, then copy and paste this command into a terminal window:
```
   echo "R_LIBS_USER=${HOME}/R-libs" > ~/.Renviron
```
When installing libraries, answer “y” when asked whether to install into a personal library. The path should be
```
   /home/YOUR_USER_NAME/R-libs
```

In some cases, the newer development environment may be needed for libraries to build. Enter this command before starting R:

scl enable devtoolset-6 bash

Use TensorFlow

TensorFlow is available as part of Anaconda Python. The easiest (and recommended) way to use TensorFlow with Python is through the high-level Keras library.

It is also possible to use low-level TensorFlow directly.
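
As an illustration of the Keras route, the sketch below builds and compiles a very small classifier. It rests on assumptions that are not stated on this page: that the installed TensorFlow is a 2.x release exposing Keras as tf.keras, and that the layer sizes and function name (chosen here only for illustration) suit your data. The import is guarded so the file still loads where TensorFlow is not installed.

```python
try:
    import tensorflow as tf  # expected to be available after use_conda
except ImportError:
    tf = None  # lets the file load where TensorFlow is not installed


def build_classifier(n_features=4, n_classes=3):
    """Build and compile a small fully connected classifier with Keras."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


if tf is not None:
    # Print the layer structure and parameter counts.
    build_classifier().summary()
```

Training would follow with model.fit on your own arrays of features and integer class labels.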

Note: Python Keras and TensorFlow can be used on any workstation. Only three workstations have GPUs (limulus7-tf, limulus8-tf, limulus9-tf); however, both Keras and TensorFlow will run without a GPU (applications will run slower).

Testing: You can test TensorFlow using the following (on limulus7-tf, limulus8-tf, or limulus9-tf):

$ use_conda
$ python   /opt/cluster/public/apps/testing/test-cuda.py 

Num CPUs Available:  1
Num GPUs Available:  1

GPUs will be present on limulus7-tf, limulus8-tf, and limulus9-tf; all other nodes will show just one CPU.
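
The contents of test-cuda.py are not shown on this page. A similar check in your own code might look like the sketch below, which assumes TensorFlow 2.1 or later (where tf.config.list_physical_devices is available). The function name is hypothetical, and the import is guarded so the script does not fail where TensorFlow is missing.

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None


def count_devices():
    """Return the CPU and GPU counts TensorFlow can see, or None without TensorFlow."""
    if tf is None:
        return None
    return {
        "CPU": len(tf.config.list_physical_devices("CPU")),
        "GPU": len(tf.config.list_physical_devices("GPU")),
    }


counts = count_devices()
if counts is not None:
    print("Num CPUs Available: ", counts["CPU"])
    print("Num GPUs Available: ", counts["GPU"])
```

A GPU count of zero means TensorFlow will fall back to the CPU, as described in the note above.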

GPU Monitoring: You can use nvidia-smi to monitor the GPU. You must be logged into the GPU system.

$ nvidia-smi
Thu Oct  9 12:12:01 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 28%   29C    P8     5W / 180W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
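
For logging GPU use from a script, nvidia-smi can also print selected fields as CSV using its --query-gpu and --format options. The following is a minimal sketch: the function names and dictionary keys are hypothetical, and read_gpus() only works on a machine where nvidia-smi is installed, so the example call uses a sample line instead.

```python
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi query output into a dict of integers."""
    util, used, total = [int(field.strip()) for field in line.split(",")]
    return {"util_percent": util, "used_mib": used, "total_mib": total}


def read_gpus():
    """Run nvidia-smi and return one dict per GPU (requires a GPU node)."""
    output = subprocess.check_output(QUERY).decode()
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]


# Sample line matching the idle GPU in the output above (0% use, 0 of 8192 MiB).
print(parse_gpu_line("0, 0, 8192"))
```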
