CS 5043: HW4: Convolutional Neural Networks
Assignment notes:
- Deadline: Saturday, February 29th @11:59pm.
- Hand-in procedure: submit to the HW4 drop box on Canvas (zip or
tar file with code) and Gradescope (pdf).
- This work is to be done on your own. While general discussion
about Python, Keras and TensorFlow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
- Do not submit MSWord documents.
Data Set
The Core50 data set
is a large database of videos of objects as they are being
moved/rotated under a variety of different lighting and background
conditions. Our general task is to classify the object being shown in a
single frame of one of these videos.
Data Organization
- A subset of the database is available on OSCER:
/home/fagg/ml_datasets/core50
- The database is partitioned into different conditions (s1, s2,
...)
- Within each condition, you will find scissors (o11 ... o15) and mugs
(o41 ... o45), each contained within their own directory
- Within each object directory is a sequence of PNG files. The
last number of the file name is the image sequence number
- Each image is 128 x 128 in size and is color (Red, Green, Blue channels)
- For class, I provided CNN skeleton code
that contains the procedure for loading sets of images and
constructing a data set for training/validation.
Prediction Problem
We will focus on the distinction between mugs and scissors, for which we
only have five distinct example objects (though, for each, we have many
different perspectives and conditions). Our goal is to construct a
model that will be generally applicable: ideally, it will be able to
distinguish any mug from any pair of scissors. However,
given the small number of objects, this is a challenge. For the
purposes of this assignment, we will use four objects of each class
for training and one distinct object of each class for validation
(there won't be an independent test set). Specifically:
- Training positive examples: o12, o13, o14 and o15
- Training negative examples: o42, o43, o44 and o45
- Validation positive examples: o11
- Validation negative examples: o41
- Conditions for both training and validation:
condition_list = ['s1', 's2', 's3', 's4', 's5', 's7', 's8', 's9', 's10', 's11']
- Use images only ending in zero (so, every 10th image)
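The split above can be expressed programmatically as a set of directory paths. A minimal sketch using only the standard library (the helper names are illustrative; the provided skeleton code already contains the actual loading routines):

```python
import os

# Data root on OSCER, as given in the Data Organization section
root = '/home/fagg/ml_datasets/core50'

condition_list = ['s1', 's2', 's3', 's4', 's5', 's7', 's8', 's9', 's10', 's11']
train_objects = ['o12', 'o13', 'o14', 'o15', 'o42', 'o43', 'o44', 'o45']
val_objects = ['o11', 'o41']

def object_dirs(objects, conditions, root=root):
    '''Directory path for every (condition, object) pair.'''
    return [os.path.join(root, c, o) for c in conditions for o in objects]

train_dirs = object_dirs(train_objects, condition_list)   # 8 x 10 = 80 dirs
val_dirs = object_dirs(val_objects, condition_list)       # 2 x 10 = 20 dirs

def every_tenth(fnames):
    '''Keep only images whose sequence number ends in zero.'''
    return [f for f in fnames if f.endswith('0.png')]
```

Filtering on a trailing zero in the sequence number is what reduces each object/condition pair to every 10th frame.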
Architectures
You will create two convolutional neural networks to distinguish the mugs
and scissors: one will be a shallow network and the other will be a deep
network. Each will nominally have the following structure:
- One or more convolutional filters, each (possibly) followed by a
max pooling layer.
- Use your favorite activation function
- In most cases, each conv/pooling layer will involve some
degree of size reduction (striding)
- Convolutional filters should not be larger than 5x5
(as the size of the filter gets larger, the memory
requirements explode)
- Flatten
- One or more dense layers
- Choose your favorite activation function
- One output layer with two units (one for each class). The
activation for this layer should be softmax
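The shallow variant of this structure might look as follows in Keras. This is only a sketch: the layer sizes, strides, and activation choices are illustrative, not required values. Naming the input and output layers 'input' and 'output' matches the dictionary keys used by the mini-batch generator shown later in this document.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

inputs = Input(shape=(128, 128, 3), name='input')
# One conv/pooling stage; 5x5 is the largest allowed filter size
x = Conv2D(10, (5, 5), strides=2, activation='elu')(inputs)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(20, activation='elu')(x)
x = Dropout(0.5)(x)               # dropout only on the dense part
outputs = Dense(2, activation='softmax', name='output')(x)

model = Model(inputs, outputs, name='shallow')
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```

model.summary() will report the trainable parameter count, which is worth checking while you tune the architecture (keeping it small is one of the over-fitting tools listed below).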
Since we are working with very limited data, it is important to take
steps to address the over-fitting problem. Here are the key tools that
you have:
- Use as large a training set as possible (gives us variety),
but use stochastic mini-batches to reduce the computation for a
single training epoch step.
- Regularization
- Dropout. Only use dropout with Dense layers
- Try to keep the number of trainable parameters small
Experiments
- You will spend some time informally narrowing down the details of your two
architectures, including the hyper-parameters (layer sizes,
dropout, regularization)
- Once you have made your choice of "best" architecture for each,
you will perform five runs for each
- For each, generate two figures:
- Learning curves (validation accuracy as a function of
epoch). Put
all five curves on a single plot
- ROC curves (validation true positive rate as a function
of false positive rate). Example code for this was
included in the CNN example. Again, there will be one curve for
each of the five runs
- Compute mean validation accuracy for each of the two models
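The CNN example code already contains ROC plotting code; for illustration only, here is a minimal NumPy version of the underlying computation, sweeping a decision threshold over the predicted scores (the variable names and sample values are hypothetical):

```python
import numpy as np

def roc_points(y_true, y_score):
    '''
    Compute (false positive rate, true positive rate) pairs by sweeping
    a decision threshold over the scores, highest first.
    y_true: 1-D array of 0/1 labels; y_score: predicted P(positive)
    '''
    order = np.argsort(-y_score)          # sort by score, descending
    hits = y_true[order].astype(bool)
    tpr = np.cumsum(hits) / max(hits.sum(), 1)     # true positive rate
    fpr = np.cumsum(~hits) / max((~hits).sum(), 1) # false positive rate
    return fpr, tpr

# Illustrative use with made-up validation scores from one run
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
fpr, tpr = roc_points(y_true, y_score)
```

For the mean validation accuracy, average the validation accuracy over the five runs for each model; note that the Keras history key is typically 'val_accuracy' (or 'val_acc' in older versions).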
Training with Mini-Batches
Loading 8 objects x 10 conditions x 30 images makes for a fairly large
training set. As we have discussed, when we have such large training
sets, especially when there is a lot of autocorrelation between the
examples, we can get away with estimating the gradient using a small
subset of the training set. To do this, we will use a python
generator to produce a mini-batch of training samples for every
training epoch.
There is a variety of ways to implement the generator. Here is one
example that chooses a random subset of samples for every epoch:
import random

def training_set_generator_images(ins, outs, batch_size=10,
                                  input_name='input',
                                  output_name='output'):
    '''
    Generator for producing random mini-batches of image training samples.

    @param ins Full set of training set inputs (examples x row x col x chan)
    @param outs Corresponding set of sample outputs (examples x nclasses)
    @param batch_size Number of samples for each mini-batch
    @param input_name Name of the model layer that is used for the input of the model
    @param output_name Name of the model layer that is used for the output of the model
    '''
    while True:
        # Randomly select a set of example indices (with replacement)
        example_indices = random.choices(range(ins.shape[0]), k=batch_size)

        # The generator will produce a pair of return values:
        # one for inputs and one for outputs
        yield({input_name: ins[example_indices,:,:,:]},
              {output_name: outs[example_indices,:]})
Then, model fitting looks like this:

# Training generator
generator = training_set_generator_images(ins, outs, batch_size=args.batch)

# Learn
history = model.fit_generator(generator,
                              epochs=args.epochs,
                              steps_per_epoch=2,
                              use_multiprocessing=True,
                              verbose=args.verbose>=2,
                              validation_data=(ins_validation, outs_validation),
                              callbacks=[early_stopping_cb])
Notes:
- steps_per_epoch: number of gradient descent steps to
take for each epoch. A new mini-batch is produced for each
step (this is done in parallel in a separate thread).
- use_multiprocessing: if true, the generator is run in separate
processes, so mini-batch production can use as many cores as are
available. Nicely, this scales as you move your jobs between
machines
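Before launching a long training job, it can help to sanity-check the shapes the generator yields. A stand-alone sketch using zero-filled arrays in place of real images (the generator is reproduced here so the snippet runs on its own):

```python
import random
import numpy as np

def training_set_generator_images(ins, outs, batch_size=10,
                                  input_name='input', output_name='output'):
    '''Same generator as above, reproduced for a stand-alone check.'''
    while True:
        example_indices = random.choices(range(ins.shape[0]), k=batch_size)
        yield ({input_name: ins[example_indices, :, :, :]},
               {output_name: outs[example_indices, :]})

# Stand-in data: 40 fake "images" and one-hot labels for 2 classes
ins = np.zeros((40, 128, 128, 3))
outs = np.zeros((40, 2))

batch_ins, batch_outs = next(training_set_generator_images(ins, outs,
                                                           batch_size=10))
# batch_ins['input'].shape  -> (10, 128, 128, 3)
# batch_outs['output'].shape -> (10, 2)
```

If these shapes do not match the model's input and output layers, fit_generator will fail only after the data have been loaded, so checking here saves a wasted batch job.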
Hints / Notes
What to Hand In
Hand in your notebook containing all of your code + the PDF export of
the code. The PDF file must include:
- Code for generating and training the network. Some useful unix
command line programs:
- enscript: translate code (e.g., py files) into postscript files
- ps2pdf: translate postscript files into pdf files
- pdfunite: merge several pdf files together
- Four figures described above
- A report of the mean accuracy for each of the two architectures
Grades
- 50 pts: Model generation code. Is it correct? clean? documented?
- 25 pts: Shallow model figures and performance
- 25 pts: Deep model figures and performance. Note that for this
to count fully you must have an accuracy of at least 0.7 for
the validation data set
- 10 pts: You solved the bonus problem of achieving 0.8 on the
validation data set with one of your models
andrewhfagg -- gmail.com
Last modified: Mon Mar 9 01:43:25 2020