CS 5043 HW4: Complex Convolutional Neural Networks
Objectives
- Implement a branching structure for a CNN
- Use ImageDataGenerator to improve learning performance
Assignment Notes
- Deadline: Tuesday, March 29th @11:59pm.
- Hand-in procedure: submit a zip file to the HW4 dropbox on
Gradescope (details below)
- This work is to be done on your own. While general discussion
about Python, Keras and Tensorflow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
Data Set
We are using the same Core50 data set as in
HW 3.
Provided Code
No additional code will be provided.
Prediction Problem
The prediction problem is the same as in HW 3.
Architectures
You will create two convolutional neural networks to distinguish the
mug, scissors,
and glasses: one will be a relatively shallow network and the other
will be a deep network. A couple of possible network architectures
include:
- Multiple, parallel CNN networks that then merge at a
Concatenate layer, followed by multiple Dense layers.
- A sequence of Inception-type modules, followed by multiple Dense
layers.
- Some combination of the two
Additional details:
- The network will have one output layer with three units (one
for each class). The
activation for this layer should be softmax
- Loss: categorical cross-entropy
- Additional metric: categorical accuracy
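As a starting point, an Inception-type module and a small network built from a stack of them can be sketched with the Keras functional API. The filter counts, module depth, activation choices, and function names below are placeholders, not the required architecture:

```python
# Sketch: Inception-type module (parallel 1x1/3x3/5x5 conv branches merged
# by Concatenate), stacked into a small CNN. All sizes are illustrative.
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Concatenate, GlobalMaxPooling2D, Dense)
from tensorflow.keras.models import Model


def inception_module(x, filters):
    """Three parallel convolution branches, merged along the channel axis."""
    b1 = Conv2D(filters, (1, 1), padding='same', activation='elu')(x)
    b3 = Conv2D(filters, (3, 3), padding='same', activation='elu')(x)
    b5 = Conv2D(filters, (5, 5), padding='same', activation='elu')(x)
    return Concatenate()([b1, b3, b5])


def build_inception_cnn(input_shape, n_classes, n_modules=2, filters=16):
    """Stack of Inception-type modules followed by Dense layers."""
    inputs = Input(shape=input_shape)
    x = inputs
    for _ in range(n_modules):
        x = inception_module(x, filters)
        x = MaxPooling2D((2, 2))(x)
    x = GlobalMaxPooling2D()(x)
    x = Dense(64, activation='elu')(x)
    # One output unit per class, softmax activation (per the spec above)
    outputs = Dense(n_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['categorical_accuracy'])
    return model
```

Wrapping the module in a function like this makes it easy to vary depth and width from hyper-parameters when you narrow down your two architectures.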
Experiments
- If you have not done so already, set up the
ImageDataGenerator. I strongly suggest that you configure
command-line parameters for each of the properties you wish to
vary and then create a text file that contains these
parameters.
- Spend a reasonable amount of time informally narrowing down the
details of your two architectures, including the
hyper-parameters (layer sizes, dropout, regularization). Given
the nature of the datasets, I suggest that you focus on
rotations 0 and 1 to begin with.
- Choose your favorite model structure/hyper-parameters for each
type (shallow and deep) based on the validation set
performance.
- Figures 1 and 2: Create a graph-based representation of
each model type using the following (per Jay):
plot_model(model, to_file='%s_model_plot.png'%fbase, show_shapes=True, show_layer_names=True)
- For each type, perform five rotations for each model (so, a total of
10 independent runs).
- Figures 3 and 4: Learning curves (validation accuracy and
loss as a function of epoch) for the shallow and deep models.
Put all ten curves (shallow and deep) on a single plot; it
should be very clear which are the shallow and deep cases
(e.g., consider using a different color or line style).
- Figure 5: Generate a histogram of test set accuracy (a
total of 10 samples). The shallow and deep samples should have
different colors (an alpha of 0.5 will give the histogram
values some transparency).
- For your favorite of shallow and deep models: perform another
five rotations with exactly the same
architecture/hyper-parameters, but with data augmentation turned
off.
- Figures 6 and 7: Learning curves (validation accuracy and
loss as a function of epoch) for the with- and without-data-augmentation
cases.
- Figure 8: Generate a histogram of test set accuracy (a
total of 10 samples). The with and without cases should have
different colors (an alpha of 0.5 will give the histogram
values some transparency).
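One way to wire the augmentation properties to command-line flags, as suggested above, is a small argparse front end that builds the keyword arguments for ImageDataGenerator. The specific flag names and default values here are illustrative; use whichever properties you actually sweep:

```python
# Sketch: command-line flags for the augmentation properties, translated
# into ImageDataGenerator keyword arguments. Flags/defaults are examples.
import argparse


def augmentation_args(argv=None):
    """Parse the augmentation properties we want to vary."""
    parser = argparse.ArgumentParser(description='HW4 augmentation settings')
    parser.add_argument('--rotation_range', type=float, default=15.0)
    parser.add_argument('--width_shift_range', type=float, default=0.1)
    parser.add_argument('--height_shift_range', type=float, default=0.1)
    parser.add_argument('--zoom_range', type=float, default=0.1)
    parser.add_argument('--horizontal_flip', action='store_true')
    parser.add_argument('--no_augmentation', action='store_true',
                        help='Disable augmentation (for Figures 6-8)')
    return parser.parse_args(argv)


def generator_kwargs(args):
    """Translate parsed flags into ImageDataGenerator keyword arguments."""
    if args.no_augmentation:
        # ImageDataGenerator() with no arguments performs no augmentation
        return {}
    return {'rotation_range': args.rotation_range,
            'width_shift_range': args.width_shift_range,
            'height_shift_range': args.height_shift_range,
            'zoom_range': args.zoom_range,
            'horizontal_flip': args.horizontal_flip}


# Usage inside a training script:
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**generator_kwargs(augmentation_args()))
```

Because the flags live in one place, the "augmentation off" runs for Figures 6-8 become a single extra flag in your parameter text file rather than a code change.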
Hints / Notes
- Create functions to build parts of your architecture.
- Get things working on your local machine before exporting to
the supercomputer.
- Remember to check your model summary and the graph-based
representation of your model to make sure that it
matches your expectations.
- Watch your RAM and thread utilization. Log in to the compute
node and use the 'top' command to examine these.
- CPUS_PER_TASK in the batch file and at the command line should
match your thread utilization.
- A batch size of 64 works a lot better than 32.
- Data augmentation probably needs a lower learning rate or
stronger regularization.
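The overlaid test-accuracy histograms for Figures 5 and 8 can be produced with a short matplotlib sketch. The accuracy lists and filename in the usage comment are made up; substitute the results of your ten runs:

```python
# Sketch for Figures 5 and 8: two sets of test-set accuracies overlaid
# as histograms with alpha=0.5 transparency, as the spec requires.
import matplotlib
matplotlib.use('Agg')          # no display needed on the supercomputer
import matplotlib.pyplot as plt


def accuracy_histogram(acc_a, acc_b, label_a, label_b, fname):
    """Overlay two accuracy histograms with different colors and alpha=0.5."""
    fig, ax = plt.subplots()
    bins = [i / 20 for i in range(21)]   # bin width 0.05 on [0, 1]
    ax.hist(acc_a, bins=bins, alpha=0.5, label=label_a)
    ax.hist(acc_b, bins=bins, alpha=0.5, label=label_b)
    ax.set_xlabel('Test set accuracy')
    ax.set_ylabel('Count')
    ax.legend()
    fig.savefig(fname)
    return fig


# Example with made-up accuracies (5 rotations per model type):
# accuracy_histogram([.91, .93, .90, .92, .94], [.95, .96, .94, .97, .95],
#                    'shallow', 'deep', 'hw4_test_hist.png')
```

The same function serves both figures: shallow vs. deep for Figure 5, and with vs. without augmentation for Figure 8.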
What to Hand In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Figures 1-8
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions:
- What is the outline of your network architecture?
- Which model (shallow or deep) turned out to work better?
Did you have to adjust hyper-parameters between the two,
other than network structure?
- What can you conclude from the validation accuracy learning
curves for each of the shallow and deep networks? How
confident are you that you have created models that you
can trust?
- Did your shallow or deep network perform better with
respect to the test set? (no need for a statistical
argument here)
- Did data augmentation improve or inhibit model
performance with respect to the validation and test
sets?
Include this reflection as a separate file or at the end of
your Jupyter notebook.
Grading
- 15 pts: Clean code for model building (including in-code documentation)
- 10 pts: Figures 1 and 2: Network architecture
- 10 pts: Figure 3: Shallow/deep loss learning curves
- 10 pts: Figure 4: Shallow/deep accuracy learning curves
- 10 pts: Figure 5: Test set histograms (shallow/deep)
- 10 pts: Figure 6: With/without data augmentation loss learning curves
- 10 pts: Figure 7: With/without data augmentation accuracy learning curves
- 10 pts: Figure 8: With/without data augmentation test set histograms
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Wed Mar 30 23:56:53 2022