CS 5043: HW7: Semantic Labeling
Assignment notes:
- Expected completion: Friday, April 30th @11:59pm.
- Deadline: Monday, May 3rd @11:59pm.
- Hand-in procedure: submit a pdf to Gradescope
- This work is to be done on your own. While general discussion
about Python, Keras and Tensorflow is encouraged, sharing
solution-specific code is inappropriate. Likewise, downloading
solution-specific code is not allowed.
- Do not submit MSWord documents.
Data Set
The Chesapeake Watershed data set is derived from satellite imagery
over all of the US states that are part of the Chesapeake Bay
watershed system. We are using the patches part of the data
set. Each patch is a 256 x 256 image with 29 channels, in which each
pixel corresponds to a 1m x 1m area of space. Some of these
channels are visible light channels (RGB), while others encode surface
reflectivity at different frequencies. In addition, each pixel is
labeled as being one of:
- 1 = water
- 2 = tree canopy / forest
- 3 = low vegetation / field
- 4 = barren land
- 5 = impervious (other)
- 6 = impervious (road)
- 15 = no data [NOTE: this is remapped by our loader to class zero]
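As noted, the loader remaps the "no data" label (15) to class zero, leaving seven class IDs 0 ... 6. In numpy, that remapping is a one-line in-place assignment; the array values below are made up for illustration:

```python
import numpy as np

# Hypothetical label map; real labels are shape (rows, cols) per patch
labels = np.array([[15, 1, 2],
                   [6, 15, 3]])

# Remap "no data" (15) to class zero, as the provided loader does
labels[labels == 15] = 0

print(labels)  # all 15s are now 0; other labels unchanged
```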
Here is an example of the RGB image of one patch and the corresponding pixel labels:
[Figure: RGB rendering of one patch alongside its pixel-label map]
Data Organization
All of the data are located on the supercomputer in:
/scratch/fagg/chesapeake. Within this directory, there are both
train and valid directories. Each of these contains
directories F0 ... F9 (folds 0 to 9). Each training fold is composed
of 5000 patches (you will need at least 40GB of RAM to load one fold;
I don't recommend doing this on your home machine).
Local testing: the file /scratch/fagg/chesapeake_F0.tar.gz
contains the data for training fold 0 (it is 3GB compressed).
We will use the valid data set as a proper test set. You
should sample from the train directory for a proper validation
data set.
Within each fold directory, the files are named:
SOME_NON_UNIQUE_HEADER-YYY.npz, where YYY ranges over 0 ... 499 (all
possible values of YYY occur in each fold directory). There are
multiple files with each YYY number in each directory (100 in the
training fold directories, to be precise). Because there are 5000
patches in each training fold directory, we will generally load only a
subset of one fold of these data at any one time.
Data Access
In the git repository, we provide a loader for files in one directory:
ins, mask, outs, weights = load_files_from_dir(file_base, filt)
where:
- file_base is the directory
- filt is a regular expression filter that specifies which
numbers to include.
- '-[1234]?' will load all 2-digit numbers starting with
1, 2, 3 or 4. For a training fold, this corresponds to
200 examples (enough to get started with).
- '-1[1234]?' will load all 3-digit numbers starting with
1 and having 1, 2, 3 or 4 as the second digit (also 200
examples).
- '-*' will load all 5000 examples.
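The example filters above behave like shell-style glob patterns, where '?' matches exactly one character and '[1234]' matches one of those digits. A small sketch using Python's fnmatch module, with made-up file names (the real header portion differs), shows which names each filter selects:

```python
import fnmatch

# Hypothetical file names following the SOME_NON_UNIQUE_HEADER-YYY.npz pattern
names = ['patch-7.npz', 'patch-12.npz', 'patch-47.npz',
         'patch-134.npz', 'patch-152.npz', 'patch-499.npz']

def select(filt, names):
    # Match the filter against the "-YYY.npz" tail of each file name
    return [n for n in names if fnmatch.fnmatch(n, '*' + filt + '.npz')]

print(select('-[1234]?', names))   # 2-digit numbers starting with 1-4
print(select('-1[1234]?', names))  # 3-digit numbers: 1, then 1-4, then any digit
print(select('-*', names))         # everything
```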
Of the return values, ins and outs are
properly-formatted tensors for training / evaluation. ins has shape
(examples, rows, cols, chans) and outs has shape (examples, rows,
cols). Note that outs is an integer tensor containing the class ID
of each pixel (values 0 ... 6); it is not one-hot encoded, for
efficiency reasons.
The Problem
Create an image-to-image translator that does semantic labeling of the
images.
Details:
- Your network output should be shape (examples, rows, cols,
class), where the sum of all class outputs for a single pixel
is 1 (i.e., we are using a softmax here).
- Use tf.keras.losses.SparseCategoricalCrossentropy as
  your loss function. This loss properly translates between your
  network's one output per class per pixel and the outs, which
  carry just one integer class label for each pixel.
- Use tf.keras.metrics.SparseCategoricalAccuracy as an
  evaluation metric. Because of the class imbalance, a model that
  always predicts the majority class will have an accuracy of ~0.72.
- Try using a sequential-style model, as well as a U-net model.
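To make the output-shape and loss requirements concrete, here is a minimal sketch of a U-net-style model. The layer sizes, activations, and single down/up level are placeholders, not recommended hyperparameters; only the 29 input channels, the 7 output classes, and the per-pixel softmax come from the assignment:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet(rows=256, cols=256, chans=29, n_classes=7):
    inp = layers.Input(shape=(rows, cols, chans))

    # Down path: convolve, keep the tensor for the skip connection, pool
    c1 = layers.Conv2D(16, 3, padding='same', activation='elu')(inp)
    p1 = layers.MaxPooling2D(2)(c1)

    # Bottleneck
    c2 = layers.Conv2D(32, 3, padding='same', activation='elu')(p1)

    # Up path: upsample and concatenate with the skip connection
    u1 = layers.UpSampling2D(2)(c2)
    m1 = layers.Concatenate()([u1, c1])
    c3 = layers.Conv2D(16, 3, padding='same', activation='elu')(m1)

    # Per-pixel softmax: output shape (examples, rows, cols, n_classes)
    out = layers.Conv2D(n_classes, 1, activation='softmax')(c3)

    model = tf.keras.Model(inputs=inp, outputs=out)
    # Sparse loss/metric: targets are integer class maps, not one-hot
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model
```

A sequential-style variant would simply stack Conv2D layers without the pooling/upsampling and skip connection; the softmax output layer and the compile arguments stay the same.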
Deep Learning Experiment
For what you think is your best performing model type (and
hyperparameters), perform 5 different experiments:
- Use '-*[01234]' for training (train files)
- Use '-*[89]' for validation (train files)
- Use '-*' for testing (valid files)
The five experiments will use folds F0 ... F4 (so there is no
overlap between the training data sets; likewise for the validation
and testing data sets).
(details are subject to change)
Performance Reporting
- Figure: validation accuracy as a function of training epoch.
Show 5 curves.
- 5 figures: for each final model, evaluate using the test data set
and generate a confusion matrix for each of the models.
- Metric: average test accuracy across the models
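Because outs holds integer class IDs, the test-set confusion matrix can be accumulated directly from the label maps. A sketch in plain numpy (the array values are made up; in practice y_pred would come from np.argmax(model.predict(ins), axis=-1)):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=7):
    """Count (true, predicted) class pairs over all pixels."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1
    return cm

# Tiny hypothetical label maps with 3 classes for illustration
y_true = np.array([[0, 1], [2, 2]])
y_pred = np.array([[0, 1], [2, 1]])
print(confusion_matrix(y_true, y_pred, n_classes=3))
```

Rows index the true class and columns the predicted class, so the diagonal sum divided by the total pixel count gives the test accuracy.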
What to Hand In
Hand in a PDF file that contains:
- Code for generating and training the network. Some useful UNIX
command line programs:
- enscript: translate code (e.g., py files) into postscript files
- ps2pdf: translate postscript files into pdf files
- pdfunite: merge several pdf files together
- The above figures
- The final metrics (note that this can be with respect to the
best performing epoch, as identified by EarlyStopping)
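One way to have the reported metrics reflect the best-performing epoch is Keras' EarlyStopping callback with restore_best_weights enabled; the monitored metric name and patience below are placeholders:

```python
import tensorflow as tf

# Stop when validation accuracy plateaus and keep the best epoch's weights
early = tf.keras.callbacks.EarlyStopping(
    monitor='val_sparse_categorical_accuracy',
    mode='max',                 # higher accuracy is better
    patience=20,                # placeholder value
    restore_best_weights=True)  # evaluate with the best epoch's weights

# ... then pass callbacks=[early] to model.fit()
```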
Grades
- 50 pts: Model generation code. Is it correct? Clean? Documented?
- 50 pts: Model figures and performance. We expect at least
something interesting in the confusion matrices and an average
test accuracy of 0.85.
- 10 pts: An average test accuracy of 0.9
Hints
- Start small. Get the architecture working before throwing lots
of data at it.
- Write generic code.
- Start early. Expect the learning process for these models to
exceed anything else that we have done in the class.
andrewhfagg -- gmail.com
Last modified: Thu Apr 22 10:51:24 2021