CS 5043: HW7: Semantic Labeling
Assignment notes:
- Expected completion: Friday, April 30th @11:59pm.
- Deadline: Monday, May 3rd @11:59pm.
- Hand-in procedure: submit a pdf to Gradescope
- This work is to be done on your own. While general discussion
about Python, Keras and Tensorflow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
- Do not submit MSWord documents.
Data Set
The Chesapeake Watershed data set is derived from satellite imagery
over all of the US states that are part of the Chesapeake Bay
watershed system. We are using the patches part of the data
set. Each patch is a 256 x 256 image with 29 channels, in which each
pixel corresponds to a 1m x 1m area of space. Some of these
channels are visible light channels (RGB), while others encode surface
reflectivity at different frequencies. In addition, each pixel is
labeled as being one of:
- 1 = water
- 2 = tree canopy / forest
- 3 = low vegetation / field
- 4 = barren land
- 5 = impervious (other)
- 6 = impervious (road)
- 15 = no data [NOTE: this is remapped by our loader to class zero]
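As noted, the loader remaps the "no data" label (15) to class zero, leaving seven class IDs 0 ... 6. In numpy, that remapping is a one-line in-place assignment; the array values below are made up for illustration:

```python
import numpy as np

# Hypothetical label map; real labels are shape (rows, cols) per patch
labels = np.array([[15, 1, 2],
                   [6, 15, 3]])

# Remap "no data" (15) to class zero, as the provided loader does
labels[labels == 15] = 0

print(labels)  # all 15s are now 0; other labels unchanged
```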
Here is an example of the RGB image of one patch and the corresponding pixel labels:
[Figure: RGB rendering of one patch alongside its pixel-label map]
Data Organization
All of the data are located on the supercomputer in:
/scratch/fagg/chesapeake. Within this directory, there are both
train and valid directories. Each of these contains
directories F0 ... F9 (folds 0 to 9). Each training fold is composed
of 5000 patches (you will need at least 40GB of RAM to load one fold;
I don't recommend doing this on your home machine).
Local testing: the file /scratch/fagg/chesapeake_F0.tar.gz
contains the data for training fold 0 (it is 3GB compressed).
We will use the valid data set as a proper test set. You
should sample from the train directory for a proper validation
data set.
Within each fold directory, the files are named:
SOME_NON_UNIQUE_HEADER-YYY.npz, where YYY ranges over 0 ... 499 (all
possible values of YYY occur in each fold directory). There are
multiple files with each YYY number in each directory (100 in the
training fold directories, to be precise). Because there are 5000
patches in each training fold directory, we will generally load only a
subset of one fold of these data at any one time.
Data Access
In the git repository, we provide a loader for files in one directory:
ins, mask, outs, weights = load_files_from_dir(file_base, filt)
where:
- file_base is the directory
- filt is a regular expression filter that specifies which
numbers to include.
- '-[1234]?' will load all 2-digit numbers starting with
1, 2, 3 or 4. For a training fold, this corresponds to
200 examples (enough to get started with).
- '-1[1234]?' will load all 3-digit numbers starting with
1 and having 1, 2, 3 or 4 as the second digit (also 200
examples).
- '-*' will load all 5000 examples.
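The example filters above behave like shell-style glob patterns, where '?' matches exactly one character and '[1234]' matches one of those digits. A small sketch using Python's fnmatch module, with made-up file names (the real header portion differs), shows which names each filter selects:

```python
import fnmatch

# Hypothetical file names following the SOME_NON_UNIQUE_HEADER-YYY.npz pattern
names = ['patch-7.npz', 'patch-12.npz', 'patch-47.npz',
         'patch-134.npz', 'patch-152.npz', 'patch-499.npz']

def select(filt, names):
    # Match the filter against the "-YYY.npz" tail of each file name
    return [n for n in names if fnmatch.fnmatch(n, '*' + filt + '.npz')]

print(select('-[1234]?', names))   # 2-digit numbers starting with 1-4
print(select('-1[1234]?', names))  # 3-digit numbers: 1, then 1-4, then any digit
print(select('-*', names))         # everything
```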
Of the return values, ins and outs are
properly-formatted tensors for training / evaluation. ins has shape
(examples, rows, cols, chans) and outs has shape (examples, rows,
cols). Note that outs is an integer tensor containing the class ID
of each pixel (values 0 ... 6); it is not one-hot encoded, for
efficiency reasons.
The Problem
Create an image-to-image translator that does semantic labeling of the
images.
Details:
- Your network output should be shape (examples, rows, cols,
class), where the sum of all class outputs for a single pixel
is 1 (i.e., we are using a softmax here).
- Use tf.keras.losses.SparseCategoricalCrossentropy as
  your loss function. This loss properly translates between your
  network's one output per class per pixel and the outs, which
  carry just one integer class label for each pixel.
- Use tf.keras.metrics.SparseCategoricalAccuracy as an
  evaluation metric. Because of the class imbalance, a model that
  always predicts the majority class will have an accuracy of ~0.72.
- Try using a sequential-style model, as well as a U-net model.
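To make the output-shape and loss requirements concrete, here is a minimal sketch of a U-net-style model. The layer sizes, activations, and single down/up level are placeholders, not recommended hyperparameters; only the 29 input channels, the 7 output classes, and the per-pixel softmax come from the assignment:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet(rows=256, cols=256, chans=29, n_classes=7):
    inp = layers.Input(shape=(rows, cols, chans))

    # Down path: convolve, keep the tensor for the skip connection, pool
    c1 = layers.Conv2D(16, 3, padding='same', activation='elu')(inp)
    p1 = layers.MaxPooling2D(2)(c1)

    # Bottleneck
    c2 = layers.Conv2D(32, 3, padding='same', activation='elu')(p1)

    # Up path: upsample and concatenate with the skip connection
    u1 = layers.UpSampling2D(2)(c2)
    m1 = layers.Concatenate()([u1, c1])
    c3 = layers.Conv2D(16, 3, padding='same', activation='elu')(m1)

    # Per-pixel softmax: output shape (examples, rows, cols, n_classes)
    out = layers.Conv2D(n_classes, 1, activation='softmax')(c3)

    model = tf.keras.Model(inputs=inp, outputs=out)
    # Sparse loss/metric: targets are integer class maps, not one-hot
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model
```

A sequential-style variant would simply stack Conv2D layers without the pooling/upsampling and skip connection; the softmax output layer and the compile arguments stay the same.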
Deep Learning Experiment
For what you think is your best performing model type (and
hyperparameters), perform 5 different experiments:
- Use '-*[01234]' for training (train files)
- Use '-*[89]' for validation (train files)
- Use '-*' for testing (valid files)
The five experiments will use folds F0 ... F4 (so there is no
overlap between the training data sets; likewise for the validation
and testing data sets).
(details are subject to change)
Performance Reporting
- Figure: validation accuracy as a function of training epoch.
Show 5 curves.
- 5 figures: for each final model, evaluate using the test data set
and generate a confusion matrix for each of the models.
- Metric: average test accuracy across the models
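Because outs holds integer class IDs, the test-set confusion matrix can be accumulated directly from the label maps. A sketch in plain numpy (the array values are made up; in practice y_pred would come from np.argmax(model.predict(ins), axis=-1)):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=7):
    """Count (true, predicted) class pairs over all pixels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    return cm

# Tiny hypothetical label maps with 3 classes for illustration
y_true = np.array([[0, 1], [2, 2]])
y_pred = np.array([[0, 1], [2, 1]])
print(confusion_matrix(y_true, y_pred, n_classes=3))
```

Rows index the true class and columns the predicted class, so the diagonal sum divided by the total pixel count gives the test accuracy.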
What to Hand In
Hand in a PDF file that contains:
- Code for generating and training the network. Some useful UNIX
command line programs:
- enscript: translate code (e.g., py files) into postscript files
- ps2pdf: translate postscript files into pdf files
- pdfunite: merge several pdf files together
- The above figures
- The final metrics (note that this can be with respect to the
best performing epoch, as identified by EarlyStopping)
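One way to have the reported metrics reflect the best-performing epoch is Keras' EarlyStopping callback with restore_best_weights enabled; the monitored metric name and patience below are placeholders:

```python
import tensorflow as tf

# Stop when validation accuracy plateaus and keep the best epoch's weights
early = tf.keras.callbacks.EarlyStopping(
    monitor='val_sparse_categorical_accuracy',
    mode='max',                 # higher accuracy is better
    patience=20,                # placeholder value
    restore_best_weights=True)  # evaluate with the best epoch's weights

# ... then pass callbacks=[early] to model.fit()
```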
Grades
- 50 pts: Model generation code. Is it correct? Clean? Documented?
- 50 pts: Model figures and performance. We expect at least
something interesting in the confusion matrices and an average
test accuracy of 0.85.
- 10 pts: An average test accuracy of 0.9
Hints
- Start small. Get the architecture working before throwing lots
of data at it.
- Write generic code.
- Start early. Expect the learning process for these models to
exceed anything else that we have done in the class.
andrewhfagg -- gmail.com
Last modified: Thu Apr 22 10:51:24 2021