CS 5043: HW6: Convolutional Neural Networks

Assignment notes:

Data Set

The Core50 data set is a large database of videos of objects as they are being moved/rotated under a variety of different lighting and background conditions. Our task is to classify the object being shown in a single frame of one of these videos.

Data Organization

Data Handling

The following functions will read in a set of PNG files:

import os
import re
import numpy as np
import png     # pypng package

def readPngFile(filename):
    '''
    Read a single PNG file
    
    filename = fully qualified file name
    
    Return: 3D numpy array (rows x cols x chans)
    
    Note: all pixel values are floats in the range 0.0 .. 1.0
    
    This implementation relies on the pypng package
    '''
    #print("reading:", filename)
    # Load in the image meta-data
    r = png.Reader(filename)
    it = r.read()
    
    # Load in the image itself and convert to a 2D array (rows x (cols*chans))
    image_2d = np.vstack(list(map(np.uint8, it[2])))
    
    # Reshape into rows x cols x chans (pypng's read() returns width = it[0], height = it[1])
    image_3d = np.reshape(image_2d,
                          (it[1], it[0], it[3]['planes'])) / 255.0
    return image_3d

def read_images_from_directory(directory, file_regexp):
    '''
    Read a set of images from a directory.  All of the images must be the same shape.
    
    directory = Directory to search
    
    file_regexp = a regular expression to match the file names against
    
    Return: 4D numpy array (images x rows x cols x chans)
    '''
    
    # Get all of the file names
    files = sorted(os.listdir(directory))
    
    # Construct a list of images from the files that match the regular expression
    list_of_images = [readPngFile(directory + "/" + f) for f in files if re.search(file_regexp, f) ]
    
    # Stack the list into a 4D numpy array (images x rows x cols x chans)
    return np.array(list_of_images, dtype=np.float32)

def read_image_set_from_directories(directory, spec):
    '''
    Read a set of images from a set of directories
    
    directory  = base directory to read from
    
    spec = n x 2 array of subdirs and file regexps
    
    Return: 4D numpy array (images x rows x cols x chans)
    
    '''
    out = read_images_from_directory(directory + "/" + spec[0][0], spec[0][1])
    for sp in spec[1:]:
        out = np.append(out, read_images_from_directory(directory + "/" + sp[0], sp[1]), axis=0)
    return out

Here is an example of using the top-level function to create a data set in which the cans are positive examples of the class and the mugs are negative examples:

directory2 = '/home2/fagg/datasets/core50/core50_128x128/s1'
ins_pos = read_image_set_from_directories(directory2, [['o21', '.*00.png'], ['o22', '.*00.png']])
ins_neg = read_image_set_from_directories(directory2, [['o41', '.*00.png'], ['o42', '.*00.png']])
outs_pos = np.ones(ins_pos.shape[0])
outs_neg = np.zeros(ins_neg.shape[0])

ins = np.append(ins_pos, ins_neg, axis=0)
outs = np.append(outs_pos, outs_neg, axis=0)
Note that this is a tiny example: we are only loading in one out of every 100 images. In practice, you should be able to load and process all of these files.
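
Because the positive and negative examples are stacked in two contiguous blocks, you will generally want to shuffle them (and hold some examples out for validation) before training. Here is a minimal sketch with numpy; the variable names follow the example above, and the 80/20 split is an illustrative choice, not part of the assignment:

import numpy as np

# Shuffle the examples so that positives and negatives are interleaved
perm = np.random.permutation(ins.shape[0])
ins = ins[perm]
outs = outs[perm]

# Hold out the last 20% of the shuffled examples for validation
n_train = int(0.8 * ins.shape[0])
ins_train, outs_train = ins[:n_train], outs[:n_train]
ins_val, outs_val = ins[n_train:], outs[n_train:]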


Part 1: Data / TensorFlow Exploration

  1. Create a simple TensorFlow pipeline that uses average pooling to reduce the size of an input image by a factor of 2 and by a factor of 4. Show the original image and the two reduced images (a sketch covering both parts of this exploration appears after this list).

  2. Create a pipeline that first converts the original image to grayscale (tf.image.rgb_to_grayscale will do this for you) and then convolves two different 3x3 filters with the grayscale image. The two filters are one-pixel-wide "bar" detectors that are most sensitive to edges at 45 and -45 degrees (the book gives you examples of bar detectors at 0 and 90 degrees). Show the original image, the grayscale image, and the two filtered images (also covered in the sketch below).

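Here is a minimal sketch of both pipelines, assuming TensorFlow 2.x with eager execution and the 4D array ins built above; the variable names and the specific -1/+2 kernel values are illustrative choices, not requirements of the assignment:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# One frame from the data set: (rows, cols, 3), pixel values in 0.0 .. 1.0
image = ins[0]

# Pooling and convolution ops expect a 4D tensor: (batch, rows, cols, chans)
image_4d = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)

# Part 1.1: average pooling
# 2x2 window, stride 2 -> half-size image; 4x4 window, stride 4 -> quarter-size image
pooled_2 = tf.nn.avg_pool2d(image_4d, ksize=2, strides=2, padding='SAME')
pooled_4 = tf.nn.avg_pool2d(image_4d, ksize=4, strides=4, padding='SAME')

fig, axes = plt.subplots(1, 3)
axes[0].imshow(image)
axes[1].imshow(pooled_2[0].numpy())
axes[2].imshow(pooled_4[0].numpy())

# Part 1.2: grayscale conversion followed by 3x3 bar detectors
gray = tf.image.rgb_to_grayscale(image_4d)          # (1, rows, cols, 1)

# One-pixel-wide line detectors for the +45 and -45 degree diagonals
bar_p45 = np.array([[-1, -1,  2],
                    [-1,  2, -1],
                    [ 2, -1, -1]], dtype=np.float32)
bar_m45 = np.array([[ 2, -1, -1],
                    [-1,  2, -1],
                    [-1, -1,  2]], dtype=np.float32)

# Stack into a (3, 3, 1, 2) filter bank: 1 input channel, 2 output channels
filters = np.stack([bar_p45, bar_m45], axis=-1)[:, :, np.newaxis, :]

# Convolve both filters over the grayscale image in a single conv2d call
filtered = tf.nn.conv2d(gray, filters, strides=1, padding='SAME')

fig, axes = plt.subplots(1, 4)
axes[0].imshow(image)
axes[1].imshow(gray[0, :, :, 0].numpy(), cmap='gray')
axes[2].imshow(filtered[0, :, :, 0].numpy(), cmap='gray')   # +45 degree response
axes[3].imshow(filtered[0, :, :, 1].numpy(), cmap='gray')   # -45 degree response
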

Part 2: Classification

Create:

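As one possible starting point, here is a minimal sketch of a small tf.keras convolutional network for the binary can-vs-mug labels built above, trained on the shuffled train/validation split from the earlier sketch; the architecture, optimizer, and epoch count are illustrative assumptions, not the assignment specification:

import tensorflow as tf

# Small illustrative CNN for 128x128x3 inputs and a single binary output
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train on the shuffled training split; monitor the held-out validation split
history = model.fit(ins_train, outs_train,
                    epochs=10,
                    validation_data=(ins_val, outs_val))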

Hints / Notes


andrewhfagg -- gmail.com

Last modified: Thu Apr 5 21:37:18 2018