CS 5043: HW4: Convolutional Neural Networks
Assignment notes:
- Deadline: Saturday, February 29th @11:59pm.
- Hand-in procedure: submit to the HW4 drop box on Canvas (zip or
tar file with code) and Gradescope (pdf).
- This work is to be done on your own. While general discussion
about Python, Keras and TensorFlow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
- Do not submit MSWord documents.
Data Set
The Core50 data set
is a large database of videos of objects as they are being
moved/rotated under a variety of different lighting and background
conditions. Our general task is to classify the object being shown in a
single frame of one of these videos.
Data Organization
- A subset of the database is available on OSCER:
/home/fagg/ml_datasets/core50
- The database is partitioned into different conditions (s1, s2,
...)
- Within each condition, you will find scissors (o11 ... o15) and mugs
(o41 ... o45), each contained within their own directory
- Within each object directory is a sequence of PNG files. The
last number of the file name is the image sequence number
- Each image is 128 x 128 in size and is color (Red, Green, Blue channels)
- For class, I provided CNN skeleton code
that contains the procedure for loading sets of images and
constructing a data set for training/validation.
Prediction Problem
We will focus on the distinction between mugs and scissors, for which we
only have five distinct example objects (though, for each, we have many
different perspectives and conditions). Our goal is to construct a
model that will be generally applicable: ideally, it will be able to
distinguish any mug from any pair of scissors. However,
given the small number of objects, this is a challenge. For the
purposes of this assignment, we will use four objects of each class
for training and one distinct object of each class for validation
(there won't be an independent test set). Specifically:
- Training positive examples: o12, o13, o14 and o15
- Training negative examples: o42, o43, o44 and o45
- Validation positive examples: o11
- Validation negative examples: o41
- Conditions for both training and validation:
condition_list = ['s1', 's2', 's3', 's4', 's5', 's7', 's8', 's9', 's10', 's11']
- Use images only ending in zero (so, every 10th image)
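The split above can be expressed programmatically as a set of directory paths. A minimal sketch using only the standard library (the helper names are illustrative; the provided skeleton code already contains the actual loading routines):

```python
import os

# Data root on OSCER, as given in the Data Organization section
root = '/home/fagg/ml_datasets/core50'

condition_list = ['s1', 's2', 's3', 's4', 's5', 's7', 's8', 's9', 's10', 's11']
train_objects = ['o12', 'o13', 'o14', 'o15', 'o42', 'o43', 'o44', 'o45']
val_objects = ['o11', 'o41']

def object_dirs(objects, conditions, root=root):
    '''Directory path for every (condition, object) pair.'''
    return [os.path.join(root, c, o) for c in conditions for o in objects]

train_dirs = object_dirs(train_objects, condition_list)   # 8 x 10 = 80 dirs
val_dirs = object_dirs(val_objects, condition_list)       # 2 x 10 = 20 dirs

def every_tenth(fnames):
    '''Keep only images whose sequence number ends in zero.'''
    return [f for f in fnames if f.endswith('0.png')]
```

Filtering on a trailing zero in the sequence number is what reduces each object/condition pair to every 10th frame.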
Architectures
You will create two convolutional neural networks to distinguish the mugs
and scissors: one will be a shallow network and the other will be a deep
network. Each will nominally have the following structure:
- One or more convolutional filters, each (possibly) followed by a
max pooling layer.
- Use your favorite activation function
- In most cases, each conv/pooling layer will involve some
degree of size reduction (striding)
- Convolutional filters should not be larger than 5x5
(as the size of the filter gets larger, the memory
requirements explode)
- Flatten
- One or more dense layers
- Choose your favorite activation function
- One output layer with two units (one for each class). The
activation for this layer should be softmax
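The shallow variant of this structure might look as follows in Keras. This is only a sketch: the layer sizes, strides, and activation choices are illustrative, not required values. Naming the input and output layers 'input' and 'output' matches the dictionary keys used by the mini-batch generator shown later in this document.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

inputs = Input(shape=(128, 128, 3), name='input')
# One conv/pooling stage; 5x5 is the largest allowed filter size
x = Conv2D(10, (5, 5), strides=2, activation='elu')(inputs)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Flatten()(x)
x = Dense(20, activation='elu')(x)
x = Dropout(0.5)(x)               # dropout only on the dense part
outputs = Dense(2, activation='softmax', name='output')(x)

model = Model(inputs, outputs, name='shallow')
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
```

model.summary() will report the trainable parameter count, which is worth checking while you tune the architecture (keeping it small is one of the over-fitting tools listed below).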
Since we are working with very limited data, it is important to take
steps to address the over-fitting problem. Here are the key tools that
you have:
- Use as large a training set as possible (gives us variety),
but use stochastic mini-batches to reduce the computation for a
single training epoch step.
- Regularization
- Dropout. Only use dropout with Dense layers
- Try to keep the number of trainable parameters small
Experiments
- You will spend some time informally narrowing down the details of your two
architectures, including the hyper-parameters (layer sizes,
dropout, regularization)
- Once you have made your choice of "best" architecture for each,
you will perform five runs for each
- For each, generate two figures:
- Learning curves (validation accuracy as a function of
epoch). Put
all five curves on a single plot
- ROC curves (validation true positive rate as a function
of false positive rate). Example code for this was
included in the CNN example. Again, there will be one curve for
each of the five runs
- Compute mean validation accuracy for each of the two models
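The CNN example code already contains ROC plotting code; for illustration only, here is a minimal NumPy version of the underlying computation, sweeping a decision threshold over the predicted scores (the variable names and sample values are hypothetical):

```python
import numpy as np

def roc_points(y_true, y_score):
    '''
    Compute (false positive rate, true positive rate) pairs by sweeping
    a decision threshold over the scores, highest first.
    y_true: 1-D array of 0/1 labels; y_score: predicted P(positive)
    '''
    order = np.argsort(-y_score)          # sort by score, descending
    hits = y_true[order].astype(bool)
    tpr = np.cumsum(hits) / max(hits.sum(), 1)     # true positive rate
    fpr = np.cumsum(~hits) / max((~hits).sum(), 1) # false positive rate
    return fpr, tpr

# Illustrative use with made-up validation scores from one run
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
fpr, tpr = roc_points(y_true, y_score)
```

For the mean validation accuracy, average the validation accuracy over the five runs for each model; note that the Keras history key is typically 'val_accuracy' (or 'val_acc' in older versions).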
Training with Mini-Batches
Loading 8 objects x 10 conditions x 30 images makes for a fairly large
training set. As we have discussed, when we have such large training
sets, especially when there is a lot of autocorrelation between the
examples, we can get away with estimating the gradient using a small
subset of the training set. To do this, we will use a python
generator to produce a mini-batch of training samples for every
training epoch.
There is a variety of ways to implement the generator. Here is one
example that chooses a random subset of samples for every epoch:
import random

def training_set_generator_images(ins, outs, batch_size=10,
                                  input_name='input',
                                  output_name='output'):
    '''
    Generator for producing random mini-batches of image training samples.

    @param ins Full set of training set inputs (examples x row x col x chan)
    @param outs Corresponding set of sample outputs (examples x nclasses)
    @param batch_size Number of samples for each mini-batch
    @param input_name Name of the model layer that is used for the input of the model
    @param output_name Name of the model layer that is used for the output of the model
    '''
    while True:
        # Randomly select a set of example indices (with replacement)
        example_indices = random.choices(range(ins.shape[0]), k=batch_size)

        # The generator will produce a pair of return values:
        # one for inputs and one for outputs
        yield({input_name: ins[example_indices,:,:,:]},
              {output_name: outs[example_indices,:]})
Then, model fitting looks like this:

# Training generator
generator = training_set_generator_images(ins, outs, batch_size=args.batch)

# Learn
history = model.fit_generator(generator,
                              epochs=args.epochs,
                              steps_per_epoch=2,
                              use_multiprocessing=True,
                              verbose=args.verbose>=2,
                              validation_data=(ins_validation, outs_validation),
                              callbacks=[early_stopping_cb])
Notes:
- steps_per_epoch: number of gradient descent steps to
take for each epoch. A new mini-batch is produced for each
step (this is done in parallel in a separate thread).
- use_multiprocessing: if true, the generator is run in separate
processes, so mini-batch production can use as many cores as are
available. Nicely, this scales as you move your jobs between
machines
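Before launching a long training job, it can help to sanity-check the shapes the generator yields. A stand-alone sketch using zero-filled arrays in place of real images (the generator is reproduced here so the snippet runs on its own):

```python
import random
import numpy as np

def training_set_generator_images(ins, outs, batch_size=10,
                                  input_name='input', output_name='output'):
    '''Same generator as above, reproduced for a stand-alone check.'''
    while True:
        example_indices = random.choices(range(ins.shape[0]), k=batch_size)
        yield ({input_name: ins[example_indices, :, :, :]},
               {output_name: outs[example_indices, :]})

# Stand-in data: 40 fake "images" and one-hot labels for 2 classes
ins = np.zeros((40, 128, 128, 3))
outs = np.zeros((40, 2))

batch_ins, batch_outs = next(training_set_generator_images(ins, outs,
                                                           batch_size=10))
# batch_ins['input'].shape  -> (10, 128, 128, 3)
# batch_outs['output'].shape -> (10, 2)
```

If these shapes do not match the model's input and output layers, fit_generator will fail only after the data have been loaded, so checking here saves a wasted batch job.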
Hints / Notes
What to Hand In
Hand in your notebook containing all of your code + the PDF export of
the code. The PDF file must include:
- Code for generating and training the network. Some useful unix
command line programs:
- enscript: translate code (e.g., py files) into postscript files
- ps2pdf: translate postscript files into pdf files
- pdfunite: merge several pdf files together
- Four figures described above
- A report of the mean accuracy for each of the two architectures
Grades
- 50 pts: Model generation code. Is it correct? clean? documented?
- 25 pts: Shallow model figures and performance
- 25 pts: Deep model figures and performance. Note that for this
to count fully you must have an accuracy of at least 0.7 for
the validation data set
- 10 pts: You solved the bonus problem of achieving 0.8 on the
validation data set with one of your models
andrewhfagg -- gmail.com
Last modified: Mon Mar 9 01:43:25 2020