CS 5043: HW0
Executing DL Experiments on the Supercomputer
Objectives:
- Implement a shallow network that is capable of learning a
Boolean function.
- Implement experiment control code that executes a single
instance of a learning run and stores the results in a pickle
file.
- Use the supercomputer to execute a set of experiments.
- Implement a tool that brings the results together from the
different experiments so they can be represented in a common
set of figures.
Assignment notes:
- Deadline: Saturday, February 12th @11:59pm. Solutions may be
submitted up to 48 hours late, with a 10% penalty for each 24
hours. Note that HW1 will be assigned and due on the original
schedule.
- This work is to be done on your own. While general discussion
about Python, TensorFlow and Keras is okay, sharing
solution-specific code is inappropriate. Likewise, you may not
download code solutions to this problem from the network.
Data Set
All data sets for the class are available on the supercomputer in the
directory /home/fagg/aml_2022_datasets/. You do not need
to keep copies of these datasets in your own home directory on the
supercomputer.
For this homework assignment, we will be using hw0_dataset.pkl.
This file contains a single Python object: a dictionary with two
keys, ins and outs. The following code snippet will load the
data into your Python environment:
import pickle

with open("hw0_dataset.pkl", "rb") as fp:
    data = pickle.load(fp)
Notes:
- This snippet assumes that the file is in the same directory as
your notebook/Python code.
- There is only a training set (and no validation or test sets).
So, you will need to "fake" a validation set to use EarlyStopping.
- Take some time to examine the data.
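Since the file provides only a training set, one simple way to "fake" a validation set is to reuse a slice of the training data. Below is a minimal sketch; the function name, split fraction, and variable names are illustrative choices, not part of the assignment (passing validation_split=... to model.fit() is another option):

```python
def fake_validation_split(ins, outs, fraction=0.2):
    """Reuse a slice of the training data as a 'fake' validation set.

    This assignment ships no separate validation data, so this simply
    gives EarlyStopping a val_loss to monitor.  The 20% fraction is an
    arbitrary illustration.
    """
    n_val = max(1, int(len(ins) * fraction))
    # First n_val examples become the "validation" set; the rest train.
    return (ins[n_val:], outs[n_val:]), (ins[:n_val], outs[:n_val])
```

The returned pairs can then be passed as, e.g., model.fit(train_ins, train_outs, validation_data=(val_ins, val_outs), callbacks=[...]).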
Part 1: Network
- Write a function that constructs a relatively shallow neural network
that can regenerate the outputs from the corresponding inputs.
- The first thing that you try will likely not work. You will
need to think about the appropriate non-linearities to use and
to experiment with the number of layers and neurons.
- Once you have settled on an architecture that works well,
proceed to the next part.
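A model-construction function might be sketched as below, assuming the TensorFlow/Keras stack used in class. The hidden width and the activation functions are placeholders to tune, and the input/output sizes should come from the shapes of ins and outs:

```python
from tensorflow import keras

def build_model(n_inputs, n_outputs, n_hidden=10):
    """Build a shallow fully-connected network (illustrative sketch)."""
    inputs = keras.Input(shape=(n_inputs,))
    # A non-linear hidden layer is essential: a Boolean function is
    # generally not linearly separable, so a purely linear model fails.
    hidden = keras.layers.Dense(n_hidden, activation="tanh")(inputs)
    # Sigmoid output keeps predictions in [0, 1], matching Boolean targets.
    outputs = keras.layers.Dense(n_outputs, activation="sigmoid")(hidden)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```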
Part 2: Multiple Runs
- Use the supercomputer to perform 10 independent learning runs.
- Plot the learning curves (MSE as a function of epoch) for all
10 runs on the same figure.
- Compute the absolute prediction errors for all runs, combine
these data and generate a single histogram of the
absolute errors.
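Bringing the runs together might look like the sketch below. It assumes each run wrote a pickle containing a dictionary with 'loss' (per-epoch MSE) and 'abs_errors' (per-example absolute errors); the file-name pattern and dictionary keys are illustrative assumptions, not requirements:

```python
import glob
import pickle

import matplotlib
matplotlib.use("Agg")  # headless backend: compute nodes have no display
import matplotlib.pyplot as plt

def load_results(pattern="hw0_results_*.pkl"):
    """Load every per-run results pickle matching the (assumed) pattern."""
    results = []
    for fname in sorted(glob.glob(pattern)):
        with open(fname, "rb") as fp:
            results.append(pickle.load(fp))
    return results

def plot_results(results):
    # Learning curves: one line per run on a single shared figure.
    plt.figure()
    for r in results:
        plt.plot(r["loss"])
    plt.xlabel("Epoch")
    plt.ylabel("MSE")
    plt.savefig("learning_curves.png")

    # Histogram of absolute prediction errors pooled across all runs.
    plt.figure()
    pooled = [e for r in results for e in r["abs_errors"]]
    plt.hist(pooled, bins=50)
    plt.xlabel("Absolute error")
    plt.ylabel("Count")
    plt.savefig("error_histogram.png")
```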
Hints
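A batch file for the 10 runs might use a SLURM job array, sketched below. The partition, time limit, and the hw0.py script name with its --exp flag are placeholders; use the values from the class instructions for your allocation:

```shell
#!/bin/bash
# Illustrative SLURM batch file -- partition, time, and script name
# are placeholders, not the required values for this class.
#SBATCH --partition=normal
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --job-name=hw0
#SBATCH --output=hw0_%a_stdout.txt
#SBATCH --error=hw0_%a_stderr.txt
#SBATCH --array=0-9

# One array task per independent run; SLURM_ARRAY_TASK_ID is 0..9 and
# can seed/label the run so each experiment writes a distinct pickle
# and stdout file.
python hw0.py --exp $SLURM_ARRAY_TASK_ID
```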
Expectations
- Terminal MSE for the individual runs should be very low. If
this is not the case, then go back to your network design.
- It is very hard to train a network that generates the correct
output for every example. Getting a couple of examples wrong in
individual runs is okay.
What to Hand-In
Submit a zip file to the HW0 section of Gradescope (enter via Canvas)
that contains:
- your python code,
- your batch file,
- the learning curve figure,
- the histogram, and
- the stdout files from each of the 10 experiments.
Grading
- 30 pts: low MSE for every run
- 20 pts: few prediction errors greater than 0.4
- 20 pts: well structured code
- 10 pts: appropriate documentation
- 10 pts: figures are well formatted and have appropriate axis
labels
- 10 pts: 10 stdout files
- 10 pts (bonus): all prediction errors less than 0.4
andrewhfagg -- gmail.com
Last modified: Fri Feb 4 00:46:29 2022