====== How Do I? ======

This page includes answers to many of the commonly asked questions about the cluster. Please send any questions you may have to ''deadline@basement-supercomputing.com''. For a faster response, use DSL in the subject line.

==== Request Library or Application ====

As a valid user with a login account, you may request that libraries (e.g., Python libraries) or an application be installed on the DSL cluster. Keep in mind that not all libraries or applications can be installed, due to compatibility issues.
Action mail deadline@eadline.org,sjagannathan@saintpeters.edu
Thanks "Thanks for your submission."
Fieldset "Need Help with Library or Applications?"
Textbox "Name (first/ last)" "=Your Name"
Textbox "Your Login" "=Login"
email "Your E-Mail Address" "=Email Address"
textarea "Ask General Questions or Request Library/Application do you need?"
static "**Please do not submit bugs or operational issues here**"
submit "Submit Query"
==== Transfer Files To/From the Cluster ====

(14-Feb-2022)

There are several ways to transfer files to and from your account on the cluster. If the files are large (truly **Big Data**), contact the system administrator. There are two basic ways; **both require working from the command line**.

=== Transfer From a Cloud Account ===

A personal cloud account (e.g., Google Drive or Microsoft OneDrive) can be accessed from your cluster account using the ''rclone'' command. This command transfers files between the cloud account and your cluster account. Configuration and use are done in text mode, using a terminal window on any of the DSL Lab workstations. See the [[Using rclone|Using rclone]] page for more information.

=== Transfer From Your PC or Laptop ===

The easiest way to transfer files is the ''scp'' (secure copy) command. If you are using Windows, you will need an "ssh/scp client." Any of the following will work, and all are available at no cost (MobaXterm is recommended):

  * [[http://www.putty.org|Putty]] (provides a terminal for ssh sessions)
  * [[http://mobaxterm.mobatek.net|MobaXterm]] (provides a terminal for ssh sessions and allows remote X Windows sessions)
  * Windows PowerShell

These tools will allow you to transfer data files using the ''scp'' command. Mac and Linux systems have ''scp'' already installed.

NOTE: You will need to be inside the Saint Peters network to use ''scp'' (outside access requires logging into the campus VPN).

If you have an account on the cluster, you will have a directory called ''/home/YOUR_USER_NAME'', where YOUR_USER_NAME is your login name. To transfer files **from your local system** (laptop) to the cluster, use the following command (from a Putty, PowerShell, or MobaXterm terminal window). ''172.17.105.10'' is the local IP address of the cluster.
  scp LOCAL_FILE_NAME YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME

You will be **asked for your password** (because it is a secure, encrypted copy).

Note: The Windows-based Putty package uses the ''pscp'' command, which works the same as ''scp''.

Transferring files **from the cluster** to your local system works the same way, only the source and destination are swapped. Again, your password is required.

  scp YOUR_USER_NAME@172.17.105.10:/home/YOUR_USER_NAME/LOCAL_FILE_NAME LOCAL_FILE_NAME

==== Using Python ====

03-Mar-2022

There are two versions of Python installed on the cluster.

  * Python 2.7.5\\ This version is the default and is part of the operating system distribution. No setup is needed to use it. To confirm which version you are using, enter:

  $ python -V
  Python 2.7.5

  * Python 3.7.4\\ This version is part of the Anaconda distribution (Version 4.7.12). Anaconda provides a complete Python environment that is **separate** from the default version mentioned above.

To use Anaconda, enter:

  use_conda

To stop using the Anaconda environment, enter:

  stop_conda

The following example demonstrates this process. Note how the version of Python changes from the default to the Anaconda version and back.

  $ use_conda
  $ python -V
  Python 3.7.4
  $ stop_conda
  $ python -V
  Python 2.7.5

=== Anaconda Navigator ===

To use ''Anaconda Navigator'', open a terminal window and type:

  $ use_conda
  $ anaconda-navigator

Contact the system administrator for options.

=== Run Anaconda Navigator on Your Laptop ===

You will need to be on the campus wireless network for this procedure to work. Access from the general internet is not available. The following procedure is run from the command line.

1. Log into your account on the cluster using [[http://mobaxterm.mobatek.net|MobaXterm]] (see above):

  $ ssh dsl.saintpeters.edu

2. Once logged in, run the following:

  $ jupyter-notebook --generate-config

3. Edit the config file (use ''vi'' or ''nano''):

  $ nano .jupyter/jupyter_notebook_config.py

4.
Change this line:

  #c.NotebookApp.ip = 'localhost'

5. To the following (note that the "#" is removed), then save and exit:

  c.NotebookApp.ip = '127.0.0.1'

6. Now run from the MobaXterm command line:

  $ anaconda-navigator

=== Python Libraries ===

To check the libraries installed for a given Python version/environment, run the following from either the default or the Anaconda environment:

  pip list

If you need a library installed, contact ''deadline@basement-supercomputing.com''. For a faster response, please use DSL in the subject line.

==== Using Spark ====

There are two versions of Spark on the cluster. Each version includes PySpark and SparkR. Spark programs run on the cluster (not just on the machine you are using).

NOTE 1: If you are logged into limulus7-TF, limulus8-TF, or limulus9-TF, you will need to use one of the other machines for Spark. To do this from the *-TF machines, simply use ssh to log in to one of limulus2 through limulus6. For example, if user "basement" is logged into limulus7-TF, they can use limulus3 (or 2, 4, 5, 6) for Spark. All user files are available on all machines.

  [basement@limulus7-TF ~]$ ssh limulus3
  Last login: Fri Sep 28 15:14:31 2018 from n23
  [basement@limulus3 ~]$

NOTE 2: Spark can be rather chatty with INFO messages (particularly version 1). To turn down the messages, enter the following at the Spark prompt:

  sc.setLogLevel("WARN")

  * Spark 2.1.0\\ This version of Spark is the default. If you enter ''spark-submit'', ''spark-shell'', ''pyspark'', or ''sparkR'', this version will be used. PySpark uses Python version 3.6.3.
  * Spark 1.6.3\\ To use version 1 of Spark, enter the following at the command line prompt:

  $ export SPARK_MAJOR_VERSION=1

After this command is entered, ''spark-submit'', ''spark-shell'', ''pyspark'', and ''sparkR'' will use Spark version 1.

==== Using the Zeppelin Web Notebook ====

28-Jan-2022

The Zeppelin Web Notebook is a great way to create, copy, and share data science projects.
The DSL Zeppelin Notebook is available at https://dsl.saintpeters.edu:8443. The Zeppelin Web Notebook has the following features:

  * Available from the web (you must have a valid account on the DSL cluster)
  * A **Python 3.7.4** interpreter
  * A **Spark 2.3.2** interpreter
  * A **PySpark 2.3.2/3.7.4** interpreter
  * A **Shell (bash)** interpreter
  * A **Hadoop Hive (SQL database)** interpreter

The best way to learn about Zeppelin is to open a session (https://dsl.saintpeters.edu:8443) and take a look at the notebook called //Basic Tests (Python, PySpark, sh, and Hive)//.

Once you connect to Zeppelin, you will see a web page that looks similar to the following (click to enlarge):

{{ :wiki:zeppelin-1-dsl.png?600 |}}

If you have a functional account, you can log in (upper right corner) and explore the Zeppelin Notebook. Once you are logged in, your browser should look similar to the image below (the listed notebooks will be different). Click on the **//Basic Tests (Python, PySpark, sh, and Hive)//** notebook and clone a copy of it so that you can change it; you can then try the paragraphs that illustrate the Zeppelin features.

{{ :wiki:zeppelin-2-dsl.png?600 |}}

For more information on using Zeppelin, see the [[https://zeppelin.apache.org/docs/0.8.0/|Zeppelin Documentation]].

Please note: Users can control read and write access to their notebooks. Notebooks can be imported and exported, and can have multiple users.

==== R Studio ====

21-Jan-2022

To start RStudio on any of the lab machines, enter the following into your browser: ''http://localhost:8787''

Note: RStudio is only available **locally** on each of the machines in the lab. There is no internet access to RStudio.

==== R Libraries ====

Because the number and variety of R libraries is quite vast, each user can install and manage the libraries they need by using the simple method described below.
Libraries will be installed in ''/home/YOUR_USER_NAME/R-libs''.

  - Log in to the cluster, then cut and paste this command into a terminal window:

  echo "R_LIBS_USER=${HOME}/R-libs" > ~/.Renviron

  - When installing libraries, answer "y" when asked whether to install into a personal library. The path should be ''/home/YOUR_USER_NAME/R-libs''.

In some cases a newer development environment may be needed for libraries to build. Enter this command before starting R:

  scl enable devtoolset-6 bash

==== Use TensorFlow ====

TensorFlow is available as part of Anaconda Python. The easiest (and recommended) way to use TensorFlow with Python is through the high-level [[https://keras.io/guides/|Keras]] library. It is also possible to use [[https://pythonprogramming.net/tensorflow-introduction-machine-learning-tutorial/|low-level TensorFlow]] directly.

**Note:** Python Keras and TensorFlow can be used on any workstation. Only three workstations have GPUs (limulus7-tf, limulus8-tf, limulus9-tf); however, both Keras and TensorFlow will run without a GPU (applications will run slower).
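One way to see whether the workstation you are on will use a GPU is to ask TensorFlow which devices it can find. The following is a minimal sketch, not part of the cluster setup. The function name ''tensorflow_device_summary'' is hypothetical. The sketch assumes the Anaconda environment is active (''use_conda'') and that the installed TensorFlow is version 2.1 or later, where ''tf.config.list_physical_devices'' is available; the TensorFlow version on the cluster is not stated on this page.

```python
import importlib.util


def tensorflow_device_summary():
    """Return a one-line description of the devices TensorFlow can use."""
    # If TensorFlow cannot be found, the Anaconda environment is probably
    # not active in this shell (assumption: it is enabled with use_conda).
    if importlib.util.find_spec("tensorflow") is None:
        return "TensorFlow not found; try running use_conda first"

    import tensorflow as tf

    # Assumption: TensorFlow 2.1 or later. Older releases do not provide
    # tf.config.list_physical_devices, so report that instead of failing.
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is None:
        return "TensorFlow {} is too old for this check".format(tf.__version__)

    gpus = list_devices("GPU")
    if gpus:
        return "TensorFlow {} sees {} GPU(s)".format(tf.__version__, len(gpus))
    return "TensorFlow {} sees no GPU and will run on the CPU".format(tf.__version__)


if __name__ == "__main__":
    print(tensorflow_device_summary())
```

Save the sketch to a file and run it with ''python'' after ''use_conda''. On a workstation without a GPU it should report CPU-only operation; in most cases Keras code needs no changes either way and simply runs slower.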