CS 5043 HW4: Complex Convolutional Neural Networks
Objectives
- Implement a branching structure for a CNN
- Use ImageDataGenerator to improve learning performance
Assignment Notes
- Deadline: Tuesday, March 29th @11:59pm.
- Hand-in procedure: submit a zip file to the HW4 dropbox on
Gradescope (details below)
- This work is to be done on your own. While general discussion
about Python, Keras and Tensorflow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
Data Set
We are using the same Core50 data set as in
HW 3.
Provided Code
No additional code will be provided.
Prediction Problem
The prediction problem is the same as in HW 3.
Architectures
You will create two convolutional neural networks to distinguish the
mug, scissors,
and glasses: one will be a relatively shallow network and the other
will be a deep network. A couple of possible network architectures
include:
- Multiple, parallel CNN networks that then merge at a
Concatenate layer, followed by multiple Dense layers.
- A sequence of Inception-type modules, followed by multiple Dense
layers.
- Some combination of the two
Additional details:
- The network will have one output layer with three units (one
for each class). The
activation for this layer should be softmax
- Loss: categorical cross-entropy
- Additional metric: categorical accuracy
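As a starting point, an Inception-type module and a small network built from a stack of them can be sketched with the Keras functional API. The filter counts, module depth, activation choices, and function names below are placeholders, not the required architecture:

```python
# Sketch: Inception-type module (parallel 1x1/3x3/5x5 conv branches merged
# by Concatenate), stacked into a small CNN. All sizes are illustrative.
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Concatenate, GlobalMaxPooling2D, Dense)
from tensorflow.keras.models import Model


def inception_module(x, filters):
    """Three parallel convolution branches, merged along the channel axis."""
    b1 = Conv2D(filters, (1, 1), padding='same', activation='elu')(x)
    b3 = Conv2D(filters, (3, 3), padding='same', activation='elu')(x)
    b5 = Conv2D(filters, (5, 5), padding='same', activation='elu')(x)
    return Concatenate()([b1, b3, b5])


def build_inception_cnn(input_shape, n_classes, n_modules=2, filters=16):
    """Stack of Inception-type modules followed by Dense layers."""
    inputs = Input(shape=input_shape)
    x = inputs
    for _ in range(n_modules):
        x = inception_module(x, filters)
        x = MaxPooling2D((2, 2))(x)
    x = GlobalMaxPooling2D()(x)
    x = Dense(64, activation='elu')(x)
    # One output unit per class, softmax activation (per the spec above)
    outputs = Dense(n_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['categorical_accuracy'])
    return model
```

Wrapping the module in a function like this makes it easy to vary depth and width from hyper-parameters when you narrow down your two architectures.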
Experiments
- If you have not done so already, set up the
ImageDataGenerator. I strongly suggest that you configure
command-line parameters for each of the properties you wish to
vary and then create a text file that contains these
parameters.
- Spend a reasonable amount of time informally narrowing down the
details of your two architectures, including the
hyper-parameters (layer sizes, dropout, regularization). Given
the nature of the datasets, I suggest that you focus on
rotations 0 and 1 to begin with.
- Choose your favorite model structure/hyper-parameters for each
type (shallow and deep) based on the validation set
performance.
- Figures 1 and 2: Create a graph-based representation of
each model type using the following (per Jay):
plot_model(model, to_file='%s_model_plot.png'%fbase, show_shapes=True, show_layer_names=True)
- For each type, perform five rotations for each model (so, a total of
10 independent runs).
- Figures 3 and 4: Learning curves (validation accuracy and
loss as a function of epoch) for the shallow and deep models.
Put all ten curves (shallow and deep) on a single plot; it
should be very clear which are the shallow and deep cases
(e.g., consider using a different color or line style).
- Figure 5: Generate a histogram of test set accuracy (a
total of 10 samples). The shallow and deep samples should have
different colors (an alpha of 0.5 will give the histogram
values some transparency).
- For your favorite of shallow and deep models: perform another
five rotations with exactly the same
architecture/hyper-parameters, but with data augmentation turned
off.
- Figures 6 and 7: Learning curves (validation accuracy and
loss as a function of epoch) for the with- and without-data-augmentation
cases.
- Figure 8: Generate a histogram of test set accuracy (a
total of 10 samples). The with and without cases should have
different colors (an alpha of 0.5 will give the histogram
values some transparency).
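One way to wire the augmentation properties to command-line flags, as suggested above, is a small argparse front end that builds the keyword arguments for ImageDataGenerator. The specific flag names and default values here are illustrative; use whichever properties you actually sweep:

```python
# Sketch: command-line flags for the augmentation properties, translated
# into ImageDataGenerator keyword arguments. Flags/defaults are examples.
import argparse


def augmentation_args(argv=None):
    """Parse the augmentation properties we want to vary."""
    parser = argparse.ArgumentParser(description='HW4 augmentation settings')
    parser.add_argument('--rotation_range', type=float, default=15.0)
    parser.add_argument('--width_shift_range', type=float, default=0.1)
    parser.add_argument('--height_shift_range', type=float, default=0.1)
    parser.add_argument('--zoom_range', type=float, default=0.1)
    parser.add_argument('--horizontal_flip', action='store_true')
    parser.add_argument('--no_augmentation', action='store_true',
                        help='Disable augmentation (for Figures 6-8)')
    return parser.parse_args(argv)


def generator_kwargs(args):
    """Translate parsed flags into ImageDataGenerator keyword arguments."""
    if args.no_augmentation:
        # ImageDataGenerator() with no arguments performs no augmentation
        return {}
    return {'rotation_range': args.rotation_range,
            'width_shift_range': args.width_shift_range,
            'height_shift_range': args.height_shift_range,
            'zoom_range': args.zoom_range,
            'horizontal_flip': args.horizontal_flip}


# Usage inside a training script:
# from tensorflow.keras.preprocessing.image import ImageDataGenerator
# datagen = ImageDataGenerator(**generator_kwargs(augmentation_args()))
```

Because the flags live in one place, the "augmentation off" runs for Figures 6-8 become a single extra flag in your parameter text file rather than a code change.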
Hints / Notes
- Create functions to build parts of your architecture.
- Get things working on your local machine before exporting to
the supercomputer.
- Remember to check your model summary and the graph-based
representation of your model to make sure that it
matches your expectations.
- Watch your RAM and thread utilization. Log in to the compute
node and use the 'top' command to examine these.
- CPUS_PER_TASK in the batch file and at the command line should
match your thread utilization.
- A batch size of 64 works a lot better than 32.
- Data augmentation probably needs a lower learning rate or
stronger regularization.
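The overlaid test-accuracy histograms for Figures 5 and 8 can be produced with a short matplotlib sketch. The accuracy lists and filename in the usage comment are made up; substitute the results of your ten runs:

```python
# Sketch for Figures 5 and 8: two sets of test-set accuracies overlaid
# as histograms with alpha=0.5 transparency, as the spec requires.
import matplotlib
matplotlib.use('Agg')          # no display needed on the supercomputer
import matplotlib.pyplot as plt


def accuracy_histogram(acc_a, acc_b, label_a, label_b, fname):
    """Overlay two accuracy histograms with different colors and alpha=0.5."""
    fig, ax = plt.subplots()
    bins = [i / 20 for i in range(21)]   # bin width 0.05 on [0, 1]
    ax.hist(acc_a, bins=bins, alpha=0.5, label=label_a)
    ax.hist(acc_b, bins=bins, alpha=0.5, label=label_b)
    ax.set_xlabel('Test set accuracy')
    ax.set_ylabel('Count')
    ax.legend()
    fig.savefig(fname)
    return fig


# Example with made-up accuracies (5 rotations per model type):
# accuracy_histogram([.91, .93, .90, .92, .94], [.95, .96, .94, .97, .95],
#                    'shallow', 'deep', 'hw4_test_hist.png')
```

The same function serves both figures: shallow vs. deep for Figure 5, and with vs. without augmentation for Figure 8.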
What to Hand In
A single zip file that contains:
- All of your python code, including your network building code
- If your visualization code is a Jupyter Notebook, then export
a pdf of your notebook and include it
- File/Save and Export Notebook As/PDF
- Figures 1-8
- Your batch file(s)
- One sample stdout file
- A written reflection that answers the following questions:
- What is the outline of your network architecture?
- Which model (shallow or deep) turned out to work better?
Did you have to adjust hyper-parameters between the two,
other than network structure?
- What can you conclude from the validation accuracy learning
curves for each of the shallow and deep networks? How
confident are you that you have created models that you
can trust?
- Did your shallow or deep network perform better with
respect to the test set? (no need for a statistical
argument here)
- Did data augmentation improve or inhibit model
performance with respect to the validation and test
sets?
Include this reflection as a separate file or at the end of
your Jupyter notebook.
Grading
- 15 pts: Clean code for model building (including in-code documentation)
- 10 pts: Figures 1 and 2: Network architecture
- 10 pts: Figure 3: Shallow/deep loss learning curves
- 10 pts: Figure 4: Shallow/deep accuracy learning curves
- 10 pts: Figure 5: Test set histograms (shallow/deep)
- 10 pts: Figure 6: With/without data augmentation loss learning curves
- 10 pts: Figure 7: With/without data augmentation accuracy learning curves
- 10 pts: Figure 8: With/without data augmentation test set histograms
- 15 pts: Reflection
andrewhfagg -- gmail.com
Last modified: Wed Mar 30 23:56:53 2022