AWS Configuration

This semester, we will be using an Amazon Web Services (AWS)-based compute cluster for our homework and project work. This will give us a uniform environment in which to work. Although we will start with a single computer, the size and capabilities of the cluster will be increased once we begin our more substantial work.

Compute Cluster Software

The compute cluster consists of a set of virtual Linux machines. You will use ssh to connect to a command-line shell, through which you will do some of your work. See this Linux command-line cheat sheet for typical commands. Common tools are available on this compute cluster, and other tools can be installed as needed.

Compute Cluster Machines

Once up and running, the compute cluster machines will be, where X is a number (0, 1, ...). All of the machines share the same accounts and user file system (all user accounts are located in /home2). In general, user accounts are configured so that only the user and the instructor have access to the account files. Common materials are placed in the /home2/ubuntu directory and are readable by all. In particular, the data sets that we will be using for homework assignments are located in /home2/ubuntu/datasets.
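As a quick check that the shared data sets are visible from your account, the directory named above can be listed from Python. This is a minimal sketch: the function name list_datasets is made up for illustration, and the path is the one given in the text, which exists only on the cluster nodes.

```python
from pathlib import Path

# Path taken from the text above; it exists only on the cluster nodes.
DATASETS_DIR = Path("/home2/ubuntu/datasets")


def list_datasets(root=DATASETS_DIR):
    """Return the sorted names of the entries directly under root.

    Returns an empty list when the directory does not exist, for example
    when this is run on a machine that is not part of the cluster.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    for name in list_datasets():
        print(name)
```

Run on a cluster node, this prints one entry per line. Run anywhere else, it prints nothing rather than raising an error.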

Master Node. The master node of the cluster is mlfds0. This node will be used for testing new configurations before they are rolled out to the other cluster machines.

Status: mlfds0 is up

Status: mlfds1 is up

SSH Access

We are using key-based authentication for the compute cluster. This means that access is tied to the specific machines and accounts from which you connect to the cluster. Also, you will not use a password for access (unless your local private key is encrypted, in which case you will be asked for that key's passphrase).

SSH Installation

Create SSH Keys

If you already have a .ssh/ directory in your home (user) directory, then you are all set and can skip the generation step.
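A .ssh/ directory can exist without containing a key pair, so it may be worth checking what is actually in it. The sketch below assumes the usual OpenSSH convention that public keys are stored in files ending in .pub; the function name find_public_keys is made up for illustration.

```python
from pathlib import Path


def find_public_keys(ssh_dir=None):
    """Return the sorted names of the *.pub files in ssh_dir.

    ssh_dir defaults to the .ssh directory in the current user's home.
    An empty list means no public key was found there (or the directory
    does not exist), in which case the generation step is still needed.
    """
    ssh_dir = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    if not ssh_dir.is_dir():
        return []
    return sorted(p.name for p in ssh_dir.glob("*.pub"))


if __name__ == "__main__":
    keys = find_public_keys()
    if keys:
        print("Public keys found:", ", ".join(keys))
    else:
        print("No public keys found; generate a key pair first.")
```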

Cluster Access

Once your account has been confirmed, you may open up an ssh connection to one of the cluster nodes (one of the host names above).

This gives you terminal access to the node, which allows you to list/view/edit files within your home directory (and list/view files in some other directories).


Jupyter is an interactive environment for writing and executing Python (and Julia and R) code. Here are a few references:

Jupyter in the Cluster

One of the benefits of Jupyter is that the user interface executes in your browser. This means that the Jupyter server can be executing on one of our cluster machines and the interface itself can be on your laptop or desktop. Here is the procedure:

Initial configuration (do this once!):

Every time you wish to use Jupyter:
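One common way to reach a Jupyter server running on a remote machine is SSH local port forwarding, where a port on your laptop is forwarded to the port the server listens on. The sketch below builds such an ssh command under that assumption; it is not necessarily the exact procedure used for this cluster. The user name, host name, and the function tunnel_command are placeholders, and port 8888 is only Jupyter's usual default.

```python
import shlex


def tunnel_command(user, host, local_port=8888, remote_port=8888):
    """Build an ssh command that forwards local_port to remote_port on host."""
    return [
        "ssh",
        "-N",  # do not run a remote command; only forward the port
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # "student" and "cluster-node" are placeholders, not real account or host names.
    print(shlex.join(tunnel_command("student", "cluster-node")))
```

With a tunnel like this open, a browser on your own machine would reach the remote server through the forwarded local port.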

Starting a New Jupyter Notebook

A notebook represents a single interactive session, an experiment or an entire processing pipeline. When you first open the Jupyter URL, you will be presented with a file browser (the default directory is the directory in which the server was started).
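Once a notebook is open, a first cell along the following lines can confirm where it is running. This is a small sketch, assuming the default Jupyter behavior in which the kernel's working directory is the directory that contains the notebook; the function name working_directory_summary is made up for illustration.

```python
from pathlib import Path


def working_directory_summary():
    """Return the current working directory and the notebooks found in it."""
    cwd = Path.cwd()
    notebooks = sorted(p.name for p in cwd.glob("*.ipynb"))
    return str(cwd), notebooks


if __name__ == "__main__":
    cwd, notebooks = working_directory_summary()
    print("Working directory:", cwd)
    print("Notebooks here:", notebooks)
```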

Last modified: Tue Feb 13 12:07:25 2018