CS 5043: HW4: Convolutional Neural Networks

Assignment notes:

Deadline: Friday, March 26th @11:59pm.
Hand-in procedure: submit a PDF to Gradescope
This work is to be done on your own. While general discussion about Python, Keras and Tensorflow is encouraged, sharing solution-specific code is inappropriate. Likewise, downloading solution-specific code is not allowed.
Do not submit MSWord documents.

Data Set

The Core50 data set is a large database of videos of objects as they are being moved/rotated under a variety of different lighting and background conditions. Our general task is to classify the object being shown in a single frame of one of these videos.

Data Organization

A subset of the database is available on OSCER: /home/fagg/datasets/core50
The database is partitioned into different conditions (s1, s2, ...)
Within each condition, you will find scissors (o11 ... o15), mugs (o41 ... o45), and glasses (o26 ... o30), each contained within its own directory
Within each object directory is a sequence of PNG files. The last number of the file name is the image sequence number
Each image is 128 x 128 in size and is color (Red, Green, Blue channels)

Provided Code

We are providing the following code (posted in the git repository):

hw4_base.py: An experiment-execution module. Parameter organization, loading data, executing experiment, saving results
hw4_post_support.py: Generation of visualizations.
hw4_post.ipynb: Notebook that gives examples of using hw4_post_support
metrics.py: This code contains a function to generate a confusion matrix and a function to calculate multiclass AUC (see the experiments section for more info).

Prediction Problem

We will focus on the distinction between mugs, scissors, and glasses. For each of these classes we have only five distinct example objects (though, for each object, we have many different perspectives and conditions). Our goal is to construct a model that will be generally applicable: ideally, it will be able to distinguish between any mug, any pair of scissors, and any glasses. However, given the small number of objects, this is a challenge. For the purposes of this assignment, we will use four objects from each class for training and one distinct object from each class for validation (there won't be an independent test set). For rotation 0:

Training class 1 (scissors): objects o11-o14
Training class 2 (mugs): objects o41-o44
Training class 3 (glasses): objects o26-o29
Validation class 1: object o15
Validation class 2: object o45
Validation class 3: object o30
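
The assignment only spells out rotation 0. The sketch below shows one plausible way to derive the other rotations, under the assumption (not stated above) that rotation r cyclically shifts each class's five objects by r positions. The names CLASS_OBJECTS and split_for_rotation are hypothetical and are not part of the provided code.

```python
# Hypothetical helper: object IDs per class, taken from the lists above.
CLASS_OBJECTS = {
    'scissors': ['o11', 'o12', 'o13', 'o14', 'o15'],
    'mugs':     ['o41', 'o42', 'o43', 'o44', 'o45'],
    'glasses':  ['o26', 'o27', 'o28', 'o29', 'o30'],
}

def split_for_rotation(rotation):
    """Return (training, validation) dicts mapping class name -> object IDs.

    ASSUMPTION: rotation r cyclically shifts each class's object list by r,
    so that rotation 0 reproduces the split listed in the assignment.
    """
    training, validation = {}, {}
    for name, objects in CLASS_OBJECTS.items():
        n = len(objects)
        shifted = [objects[(i + rotation) % n] for i in range(n)]
        training[name] = shifted[:-1]    # first four objects
        validation[name] = shifted[-1:]  # the one remaining object
    return training, validation
```

Under this assumption, every object serves as the validation object exactly once across rotations 0 through 4.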

Conditions for both training and validation:

       condition_list = ['s1', 's2', 's3', 's4', 's5', 's7', 's8', 's9', 's10', 's11']

We suggest using only the images whose sequence number ends in zero (so, every 10th image)
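
A minimal sketch of that filter follows. It assumes that the sequence number is the last run of digits immediately before the .png extension; the exact file naming pattern is an assumption and should be checked against the files on OSCER. The helper name keep_every_tenth is hypothetical.

```python
import re

# ASSUMPTION: the sequence number is the last run of digits before ".png".
SEQ_PATTERN = re.compile(r'(\d+)\.png$')

def keep_every_tenth(filenames):
    """Keep only file names whose image sequence number ends in zero."""
    kept = []
    for name in filenames:
        match = SEQ_PATTERN.search(name)
        if match and match.group(1).endswith('0'):
            kept.append(name)
    return kept
```

The function could be applied to the listing of one object directory before any images are loaded, so that the skipped files are never read from disk.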

Architectures

You will create two convolutional neural networks to distinguish the mug, scissors, and glasses: one will be a shallow network and the other will be a deep network. Each will nominally have the following structure:

One or more convolutional layers, each (possibly) followed by a max pooling layer.
- Use your favorite activation function
- In most cases, each conv/pooling layer will involve some degree of size reduction (striding)
- Convolutional filters should not be larger than 5x5 (as the size of the filter gets larger, the memory requirements explode)
Flatten
One or more dense layers
- Choose your favorite activation function
One output layer with three units (one for each class). The activation for this layer should be softmax
Loss: categorical cross-entropy
Additional metric: categorical accuracy

Since the data set is relatively small, it is important to take steps to address the over-fitting problem. Here are the key tools that you have:

Use as large a training set as possible (gives us variety), but use stochastic mini-batches to reduce the computation required for each training step.
Regularization
Dropout. Only use dropout with Dense layers
Try to keep the number of trainable parameters small
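
To see where trainable parameters come from, the sketch below counts them for a hypothetical layer stack. It assumes square images, 'valid' padding, stride-1 convolutions, and non-overlapping max pooling. The layer sizes in the usage note are illustrative only, not a recommended architecture, and the result should always be checked against the model summary of the real network.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_in, n_out):
    """Weights plus one bias per unit for a dense layer."""
    return n_in * n_out + n_out

def count_parameters(image_size, channels, conv_layers, dense_layers, n_classes=3):
    """conv_layers: list of (filters, kernel, pool); dense_layers: list of unit counts.

    ASSUMPTIONS: square images, 'valid' padding, stride-1 convolutions,
    non-overlapping pooling (pool=1 means no pooling).
    """
    total, size = 0, image_size
    for filters, kernel, pool in conv_layers:
        total += conv_params(kernel, channels, filters)
        size = (size - kernel + 1) // pool    # conv shrinks, then pooling divides
        channels = filters
    n_in = size * size * channels             # Flatten
    for units in dense_layers + [n_classes]:  # hidden layers, then softmax output
        total += dense_params(n_in, units)
        n_in = units
    return total
```

For example, count_parameters(128, 3, [(10, 3, 2), (20, 3, 2)], [50]) gives 902,303 parameters, of which 900,050 belong to the first dense layer after Flatten. Reducing the image size more aggressively before Flatten is therefore usually the most effective way to shrink the model.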

Experiments

You will spend some time informally narrowing down the details of your two architectures, including the hyper-parameters (layer sizes, dropout, regularization)
Once you have made your choice of "best" architecture for each, you will perform five rotations for each model (so, a total of 10 independent runs)
For each, generate two figures:
- Learning curves (validation accuracy as a function of epoch). Put all five curves on a single plot
- Confusion matrix for each of the validation rotation predictions. There should be 5 confusion matrices for each of your models. Sample code for generating confusion matrices is provided.
Compute the mean multiclass AUC across the five rotations. Multiclass AUC is the average of the ROC AUC values computed for each pair of classes. In our case, three pairs are used: scissors vs mugs, scissors vs glasses, and mugs vs glasses. We have provided code to compute multiclass AUC in the metrics.py file.
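
For intuition only, here is a pure-Python sketch of one common pairwise definition of multiclass AUC. It is not the code in metrics.py, and the provided implementation may differ in detail. It assumes integer class labels (0, 1, 2) and one row of softmax probabilities per sample; the function names are hypothetical. For each pair of classes, it averages the two AUCs obtained by ranking the samples of those two classes by each class's own probability.

```python
from itertools import combinations

def binary_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def pairwise_multiclass_auc(labels, probs):
    """Mean over class pairs of the AUC restricted to samples of that pair.

    ASSUMPTION: labels are integer class indices into each row of probs.
    """
    classes = sorted(set(labels))
    pair_aucs = []
    for a, b in combinations(classes, 2):
        # Rank samples from classes a and b by the probability of class a ...
        a_vs_b = binary_auc([p[a] for y, p in zip(labels, probs) if y == a],
                            [p[a] for y, p in zip(labels, probs) if y == b])
        # ... and by the probability of class b, then average the two.
        b_vs_a = binary_auc([p[b] for y, p in zip(labels, probs) if y == b],
                            [p[b] for y, p in zip(labels, probs) if y == a])
        pair_aucs.append(0.5 * (a_vs_b + b_vs_a))
    return sum(pair_aucs) / len(pair_aucs)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of every pair of classes.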

Training with Mini-Batches

Loading 3 object classes x 4 object instances x 10 conditions x 30 images makes for a fairly large training set. As we have discussed, when we have such large training sets, especially when there is a lot of autocorrelation between the examples, we can get away with estimating the gradient using a small subset of the training set. To do this, we will use a Python generator to produce a new mini-batch of training samples for every training step.

There is a variety of ways to implement the generator. Here is one example that chooses a random subset of samples for every mini-batch:

import random

def training_set_generator_images(ins, outs, batch_size=10,
                                  input_name='input',
                                  output_name='output'):
    '''
    Generator for producing random mini-batches of image training samples.

    :param ins: Full set of training set inputs (examples x row x col x chan)
    :param outs: Corresponding set of desired outputs (examples x nclasses)
    :param batch_size: Number of samples for each minibatch
    :param input_name: Name of the model layer that is used for the input of the model
    :param output_name: Name of the model layer that is used for the output of the model
    '''

    while True:
        # Randomly select a set of example indices (sampled with replacement)
        example_indices = random.choices(range(ins.shape[0]), k=batch_size)

        # The generator will produce a pair of return values: one for inputs and one for outputs
        yield({input_name: ins[example_indices,:,:,:]},
              {output_name: outs[example_indices,:]})

Then, model fitting looks like this:

    # Training generator (only used for training data!)
    generator = training_set_generator_images(ins, outs, batch_size=args.batch)
    
    # Learn
    history = model.fit(x = generator,
                        epochs=args.epochs,
                        steps_per_epoch=2,
                        verbose=args.verbose>=2,
                        validation_data=(ins_validation, outs_validation), 
                        callbacks=[early_stopping_cb])

Notes:

steps_per_epoch: number of gradient descent steps to take for each epoch. A new minibatch is drawn from the generator for each step (this is done in parallel in a separate thread).
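
To make the bookkeeping concrete, the stand-alone sketch below (no Keras involved; index_batches and batches_drawn are hypothetical names) mimics, conceptually, how a fit loop consumes a generator: one batch is pulled per gradient step. The default of 3600 examples is the 3 x 4 x 10 x 30 training set size mentioned above.

```python
import random
from itertools import islice

def index_batches(n_examples, batch_size):
    """Endless stream of random index batches (sampling with replacement)."""
    while True:
        yield random.choices(range(n_examples), k=batch_size)

def batches_drawn(epochs, steps_per_epoch, n_examples=3600, batch_size=10):
    """Collect the batches a fit loop would consume: one per step, per epoch."""
    generator = index_batches(n_examples, batch_size)
    history = []
    for _ in range(epochs):
        # Each epoch pulls exactly steps_per_epoch batches from the generator
        history.append(list(islice(generator, steps_per_epoch)))
    return history
```

With batch_size=10 and steps_per_epoch=2, only 20 of the training samples contribute to each epoch, so an "epoch" here is much shorter than a full pass over the training set.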

Hints / Notes

Start small: get the pipeline working first on a small, feasible problem (e.g., distinguish two different objects from a couple of conditions; validate on different images of the same objects/conditions).
For debugging purposes, it can be useful to examine the state of the early processing layers. We provided code for this in the CNN example in class.
We use a general function for creating networks that takes as input a set of parameters that define the configuration of the convolutional layers and dense layers. By changing these parameters, we can even change the number of layers. This makes it much easier to try a variety of things without having to re-implement or copy a lot of code.
We have done some examples with argparse. We suggest that you make use of this facility. We have a setup where you can specify all of the key details of the architecture at the command line.
Remember to check your model summary to make sure that it matches your expectations
If your model only requires 100-200 epochs of training, then you might be over-fitting
For the deeper models, expect to spend many epochs on the flat region of the error surface. We have found that a high patience is necessary
Before executing on the supercomputer, look carefully at your memory usage (our big model requires almost 10GB of memory)
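
As a starting point for the memory check, this sketch estimates the size of the training images alone when held as one dense array. It assumes the 3 x 4 x 10 x 30 image counts from the mini-batch section and float32 storage (4 bytes per value); the storage type is an assumption, and the function name is hypothetical.

```python
def training_set_bytes(n_classes=3, n_objects=4, n_conditions=10,
                       images_per_sequence=30, image_size=128,
                       channels=3, bytes_per_value=4):
    """Bytes needed to hold the training images as one dense array.

    ASSUMPTION: bytes_per_value=4 corresponds to float32 storage;
    use 8 for float64 or 1 for uint8.
    """
    n_images = n_classes * n_objects * n_conditions * images_per_sequence
    values_per_image = image_size * image_size * channels
    return n_images * values_per_image * bytes_per_value
```

With the defaults this is 3600 images and 707,788,800 bytes (about 0.71 GB); storing the same array as float64 doubles that. The validation set, layer activations, and gradients all come on top of this figure.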

What to Hand In

Hand in your notebook containing all of your code + the PDF export of the code. The PDF file must include:

Code for generating and training the network. Some useful unix command line programs:
- enscript: translate code (e.g., py files) into postscript files
- ps2pdf: translate postscript files into pdf files
- pdfunite: merge several pdf files together
Figures described above
A report of the mean multiclass AUC for each of the two architectures
Your batch files used to run the experiments for your shallow and deep networks.

Grades

50 pts: Model generation code. Is it correct? clean? documented?
25 pts: Shallow model figures and performance
25 pts: Deep model figures and performance. Note that for this to count fully you must have a mean multiclass AUC of at least 0.80 for the validation data set for at least one model.
10 pts: You solved the bonus problem of achieving an average of 0.85 on the validation data set with one of your models across the five rotations.

andrewhfagg -- gmail.com

Last modified: Sat Mar 13 02:50:39 2021