AWS Configuration
This semester, we will be using an Amazon Web Services (AWS)-based
compute cluster for our homework and project work. This will give us
a uniform environment in which to work. Although we will start with a
single computer, once we start our more substantial work, the size and
capabilities of the cluster will be increased.
Compute Cluster Software
The compute cluster consists of a set of virtual Linux machines. You
will use ssh to connect to a command-line shell, through which you will
do some of your work. See this Linux command-line cheat sheet for
typical commands. Common tools available on this compute cluster
include:
- Python (versions 2 and 3)
- Editors:
- Development environments:
Other tools can be installed, as needed.
Compute Cluster Machines
Once up and running, the compute cluster machines will be
mlfdsX.cs.ou.edu, where X is a number (0, 1, ...). All of the
machines share the same accounts and user file system (all user
accounts are located in /home2). In general, user accounts are
configured so that only the user and the instructor have access to the
account files. Common materials are placed in the /home2/ubuntu
directory and are readable by all. In particular, the data sets that
we will be using for homework assignments are located in
/home2/ubuntu/datasets.
Master Node.
The master node of the cluster is mlfds0. New configurations will be
tested on this node before they are rolled out to the other cluster
machines.
Status: mlfds0 is up
- Hostname: mlfds0.cs.ou.edu
- Varies between 2 and 4 processors, depending on the day.
- Varies between 2 and 16 GB of memory.
- 32GB of swap.
Status: mlfds1 is up
- Hostname: mlfds1.cs.ou.edu
- Varies between 2 and 4 processors, depending on the day.
- Varies between 2 and 16 GB of memory.
- 32GB of swap.
SSH Access
We are using key-based authentication to the compute cluster. This
means that access is tied to the specific machines and accounts from
which you connect to the cluster. You will not use a password for
access (although if your local private key is encrypted, you will be
asked for its passphrase).
SSH Installation
- Linux: installed by default
- OSX: installed by default
- Windows: there are a number of ssh clients available. One is
Putty (I am open to others)
- Windows alternative: install the Windows subsystem for Linux. After this, follow the Linux instructions.
Create SSH Keys
If you already have a .ssh/id_rsa.pub in your home (user)
directory, then you are all set and can skip the generation step.
- Generate a public/private key pair on your local machine:
- Unix: use ssh-keygen.
It is okay to use an empty passphrase, but doing so
means that your private key is unencrypted (this is
often okay, since it is stored on your local machine only).
- Windows: use Putty Keygen
- Email your public key (.ssh/id_rsa.pub) to the
instructor. Do not email the private key or you will
compromise your key pair.
- After you have access to the cluster, you can add other
machines to the cluster access list. This is done by appending
the contents of other id_rsa.pub files to the
.ssh/authorized_keys file on your cluster account.
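The last step above (appending another machine's public key) can be sketched as a small shell function. This is only an illustration: the function name add_authorized_key and the file names in the usage note are hypothetical, and the permission settings assume sshd's usual default behavior of rejecting key files that other users can write to.

```shell
# Hypothetical helper: append a public key to an authorized_keys file,
# skipping it if that exact key line is already listed.
# (A new key pair can be generated on your local machine with, e.g.,
#  ssh-keygen -t rsa -f ~/.ssh/id_rsa)
add_authorized_key() {
    pubkey_file="$1"                               # copy of another machine's id_rsa.pub
    auth_file="${2:-$HOME/.ssh/authorized_keys}"   # defaults to your account's file

    # Create the directory privately if it does not exist yet, and keep
    # the key file readable and writable by you only.
    mkdir -p -m 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -F: fixed strings, -x: whole-line match, -f: read patterns from the key file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On the cluster, after copying a second machine's public key to a file such as laptop2_id_rsa.pub (a made-up name), you would run: add_authorized_key laptop2_id_rsa.pub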
Cluster Access
Once your account has been confirmed, you may open up an ssh
connection to one of the cluster nodes (one of the host names above).
- Unix: ssh USERNAME@HOSTNAME
- Putty: Create a profile for HOSTNAME, specifying your USERNAME (use
the default ssh port number, 22, for this).
Open the connection.
This gives you terminal access to the node, which allows you to
list/view/edit files within your home directory (and list/view files
in some other directories).
Jupyter
Jupyter is an interactive environment for writing and executing python
(and Julia and R) code. Here are a few references:
Jupyter in the Cluster
One of the benefits of Jupyter is that the user interface executes in
your browser. This means that the Jupyter server can be executing on
one of our
cluster machines and the interface itself can be on your laptop or
desktop. Here is the procedure:
Initial configuration (do this once!):
- Execute the following lines in the shell:
source activate python3
jupyter notebook --generate-config
- Edit .jupyter/jupyter_notebook_config.py (use gedit,
emacs or vi)
- Find the line containing c.NotebookApp.port
- Remove the comment symbol at the front of the line (the
#) and set the port to 90XX, where XX are two digits that
have been assigned to you.
- Save the file and exit the editor
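If you prefer not to open an editor, the same change can be scripted. The sketch below assumes that the generated file contains a line of the form "#c.NotebookApp.port = 8888" (possibly with a space after the #); the function name set_jupyter_port and the port 9042 in the usage note are made up for illustration.

```shell
# Hypothetical helper: uncomment the c.NotebookApp.port line and set its value.
set_jupyter_port() {
    port="$1"
    config_file="${2:-$HOME/.jupyter/jupyter_notebook_config.py}"

    # Accept only ports of the form 90XX, as assigned for this cluster.
    case "$port" in
        90[0-9][0-9]) ;;
        *) echo "port must look like 90XX" >&2; return 1 ;;
    esac

    # Strip any leading '#' and whitespace, then rewrite the assignment.
    # -i.bak edits the file in place and keeps the original as FILE.bak.
    sed -i.bak \
        "s/^#*[[:space:]]*c\.NotebookApp\.port[[:space:]]*=.*/c.NotebookApp.port = $port/" \
        "$config_file"
}
```

For example, with assigned digits 42: set_jupyter_port 9042. Afterwards, check the result with: grep NotebookApp.port ~/.jupyter/jupyter_notebook_config.py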
Every time you wish to use Jupyter:
- Unix: on your local machine, execute:
ssh -L 90XX:127.0.0.1:90XX USERNAME@HOSTNAME
where XX are your assigned digits, USERNAME is your cluster user name
and HOSTNAME is the name of the cluster machine that you will
be executing your Jupyter server on. This sets up an encrypted
tunnel from your local machine to the port that the Jupyter
server is listening on.
- Putty: on your local machine, add a tunnel to your profile:
Port: PORTNUMBER
Host: 127.0.0.1:PORTNUMBER
where PORTNUMBER is your assigned port (90XX).
Don't forget to click the "Add" button.
Open the connection to the cluster machine.
- On the cluster machine: execute the following lines in the shell:
source activate python3
jupyter notebook
- This will start the Jupyter server and
result in a URL being printed in your shell
- Point your browser to this URL
- When you are done using Jupyter, make sure that everything is
saved. Then, stop the Jupyter server (^C in the shell in which
the server was started, and answer "y").
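Since the tunnel command must use the same port number twice, it is easy to mistype. One possible sketch is a shell function, for your local machine, that builds the command from your two assigned digits. The function name jupyter_tunnel_cmd and the user name alice are hypothetical; the function only prints the command so that you can inspect it before running it.

```shell
# Hypothetical helper: print the ssh tunnel command for assigned digits XX.
jupyter_tunnel_cmd() {
    xx="$1"      # your two assigned digits, e.g. 07
    user="$2"    # your cluster user name
    host="$3"    # e.g. mlfds1.cs.ou.edu

    # Require exactly two digits so that the port always has the form 90XX.
    case "$xx" in
        [0-9][0-9]) ;;
        *) echo "XX must be exactly two digits" >&2; return 1 ;;
    esac

    port="90${xx}"
    # Forward local port 90XX to port 90XX on the cluster machine's loopback address.
    echo "ssh -L ${port}:127.0.0.1:${port} ${user}@${host}"
}
```

For example, jupyter_tunnel_cmd 07 alice mlfds1.cs.ou.edu prints
ssh -L 9007:127.0.0.1:9007 alice@mlfds1.cs.ou.edu
which you can then copy into your shell.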
Starting a New Jupyter Notebook
A notebook represents a single interactive session, an experiment or
an entire processing pipeline. When you first open the Jupyter URL,
you will be presented with a file browser (the default directory is
the directory in which the server was started).
andrewhfagg@gmail.com
Last modified: Tue Feb 13 12:07:25 2018