CS 5043: HW6: RNNs + CNNs

Assignment notes:

The Problem

Proteins are chains of amino acids that perform many different biological functions, depending on the specific sequence of amino acids. Families of amino acid chains exhibit similarities in their structure and function. Given a new chain, we would like to predict the family to which it most likely belongs. In this assignment, we will classify amino acid chains as belonging to one of eighteen families: PF01810, PF01925, PF02659, PF03824, PF16955, PF04955, PF11139, PF13386, PF13795, PF01169, PF01914, PF02673, PF02674, PF02683, PF03239, PF03596, PF03741 or PF19510.
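The classifier's output layer needs one unit per family, so each family ID must be mapped to an integer class index. pfam_loader.py may already do this. The sketch below only illustrates the idea. The ordering of `FAMILIES` and the helper names `encode_labels` and `one_hot` are hypothetical and do not come from the provided code.

```python
import numpy as np

# The eighteen Pfam family IDs from the assignment text. This ORDER is a
# hypothetical choice for illustration; pfam_loader.py may use its own.
FAMILIES = [
    "PF01810", "PF01925", "PF02659", "PF03824", "PF16955", "PF04955",
    "PF11139", "PF13386", "PF13795", "PF01169", "PF01914", "PF02673",
    "PF02674", "PF02683", "PF03239", "PF03596", "PF03741", "PF19510",
]
FAMILY_TO_INDEX = {fam: i for i, fam in enumerate(FAMILIES)}

def encode_labels(family_ids):
    """Map a sequence of family ID strings to integer class indices."""
    return np.array([FAMILY_TO_INDEX[f] for f in family_ids], dtype=np.int64)

def one_hot(indices, n_classes=len(FAMILIES)):
    """Convert integer class indices to a (n_examples, n_classes) one-hot matrix."""
    out = np.zeros((len(indices), n_classes), dtype=np.float32)
    out[np.arange(len(indices)), indices] = 1.0
    return out

y = encode_labels(["PF01810", "PF19510", "PF02659"])
print(y.tolist())        # [0, 17, 2]
print(one_hot(y).shape)  # (3, 18)
```

In Keras, integer labels pair with the `sparse_categorical_crossentropy` loss, and one-hot labels pair with `categorical_crossentropy`. Either works for an 18-way softmax output.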

Data Set

The data set is available on SCHOONER. The data are already partitioned into five independent folds, with the classes stratified across the folds (the samples for class k are distributed equally across the five folds). However, the classes have different numbers of examples, with as much as a 1:10 ratio between the minority and majority classes.
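The paragraph above implies two practical steps. First, the five folds are rotated so that each fold serves once as the test set. Second, the class imbalance may need compensating. The sketch below shows one possible rotation scheme. It assumes test fold = rotation and validation fold = rotation + 1 (mod 5), which is a hypothetical convention that pfam_loader.py does not necessarily follow. The sketch also shows a "balanced" class-weight heuristic, n_samples / (n_classes * count). The function names are illustrative only.

```python
import numpy as np

N_FOLDS = 5

def rotation_folds(rotation, n_folds=N_FOLDS):
    """Return (train_folds, valid_fold, test_fold) for one rotation.

    Hypothetical scheme: test fold = rotation, validation fold =
    rotation + 1 (mod n_folds), and all remaining folds are for training.
    """
    test_fold = rotation % n_folds
    valid_fold = (rotation + 1) % n_folds
    train_folds = [f for f in range(n_folds) if f not in (test_fold, valid_fold)]
    return train_folds, valid_fold, test_fold

def balanced_class_weights(labels, n_classes):
    """Weight each class by n_samples / (n_classes * count); rarer classes get larger weights."""
    counts = np.bincount(labels, minlength=n_classes)
    # np.maximum guards against division by zero for a class absent from this split.
    weights = len(labels) / (n_classes * np.maximum(counts, 1))
    return {c: float(w) for c, w in enumerate(weights)}

for r in range(N_FOLDS):
    print(r, rotation_folds(r))

labels = np.array([0] * 10 + [1] * 1)   # toy 10:1 imbalance, not the real data
print(balanced_class_weights(labels, 2))  # class 1 gets ~10x the weight of class 0
```

Keras `Model.fit()` accepts a `class_weight` dictionary of this form (class index mapped to weight). Compute the weights from the training folds only, so that the validation and test folds do not influence training.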

Each example consists of:

There are two ways to load the data (provided in pfam_loader.py):

Both loaders return the data set in the same format (documented in pfam_loader.py).

Deep Learning Experiment

Objective: Create a neural network model that can predict the family of a given amino acid chain. We will compare a "simple" architecture with a "complex" architecture. The precise definition of each is up to you. For each architecture, tune the hyper-parameters so that it performs as well as possible on the validation set, without changing the architecture itself.
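For intuition about what a "simple" recurrent classifier computes, the numpy sketch below embeds each amino-acid token and runs an Elman recurrence, h_t = tanh(x_t W_x + h_{t-1} W_h + b). It then applies a softmax over the 18 families from the final hidden state. This corresponds roughly to a Keras `Embedding` -> `SimpleRNN` -> `Dense(18, activation="softmax")` stack. A "complex" model might, for example, add `Conv1D` layers before the recurrent layer or stack several recurrent layers. All sizes below (vocabulary of 25 tokens, embedding and hidden widths) are hypothetical. The weights are random rather than trained. Padding is not masked, for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
N_TOKENS = 25      # assumed amino-acid vocabulary size (including a padding token 0)
EMBED_DIM = 8
HIDDEN = 16
N_CLASSES = 18     # one output per Pfam family

# Random parameters; a trained network would learn these.
E = rng.normal(scale=0.1, size=(N_TOKENS, EMBED_DIM))    # embedding table
W_x = rng.normal(scale=0.1, size=(EMBED_DIM, HIDDEN))     # input-to-hidden
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))        # hidden-to-hidden
b_h = np.zeros(HIDDEN)
W_out = rng.normal(scale=0.1, size=(HIDDEN, N_CLASSES))   # hidden-to-logits
b_out = np.zeros(N_CLASSES)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simple_rnn_classifier(tokens):
    """tokens: int array (batch, seq_len) -> class probabilities (batch, N_CLASSES)."""
    x = E[tokens]                               # (batch, seq_len, EMBED_DIM)
    h = np.zeros((tokens.shape[0], HIDDEN))     # initial hidden state
    for t in range(tokens.shape[1]):
        # Elman recurrence: h_t = tanh(x_t W_x + h_{t-1} W_h + b_h)
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b_h)
    return softmax(h @ W_out + b_out)           # classify from the final hidden state

batch = rng.integers(1, N_TOKENS, size=(4, 30))   # 4 random token sequences of length 30
probs = simple_rnn_classifier(batch)
print(probs.shape)  # (4, 18)
```

The loop makes the sequential dependence explicit. This is also why long sequences are slow for RNNs. Shortening the sequence first, for example with strided convolutions, is one common motivation for combining CNNs with RNNs.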

Notes:

Performance Reporting

Once you have selected a reasonable architecture and set of hyper-parameters, produce the following figures:
  1. Figures 0a and 0b: Network architectures generated by plot_model(), one per model

  2. Figure 1: Training set accuracy as a function of epoch for each rotation. Include both models.

  3. Figure 2: Validation set accuracy as a function of epoch for each rotation. Include both models.

  4. Figure 3: Histogram of test fold accuracy, with a vertical line marking the average accuracy for each model type (also report each average as text).
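The figures above can be produced from the per-rotation training histories and test accuracies. In the sketch below, `histories` maps each model name to a list of Keras `History.history` dictionaries, one per rotation (`model.fit()` returns a `History` object whose `.history` attribute holds per-epoch metric lists). `results` maps each model name to a list of test-fold accuracies. These data layouts and the function names are hypothetical. The metric keys (`"accuracy"`, `"val_accuracy"`) are assumptions that depend on the metrics passed to `model.compile()`.

```python
import numpy as np

def summarize_test_accuracy(results):
    """Mean test accuracy per model.

    results: dict mapping model name -> list of test-fold accuracies (one per rotation).
    """
    return {name: float(np.mean(accs)) for name, accs in results.items()}

def plot_learning_curves(histories, key, path, title):
    """Figures 1 and 2 sketch: one curve per rotation, one color per model.

    histories: dict mapping model name -> list of Keras History.history dicts.
    key: recorded metric name, e.g. "accuracy" or "val_accuracy" (assumed names).
    """
    import matplotlib
    matplotlib.use("Agg")                      # write to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for i, (name, runs) in enumerate(histories.items()):
        for r, hist in enumerate(runs):
            # Label only the first rotation so each model appears once in the legend.
            ax.plot(hist[key], color=f"C{i}", alpha=0.7,
                    label=name if r == 0 else "_nolegend_")
    ax.set_xlabel("Epoch")
    ax.set_ylabel(key)
    ax.set_title(title)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)

def plot_test_histogram(results, path, bins=10):
    """Figure 3 sketch: test-accuracy histograms with a dashed line at each model's mean."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    means = summarize_test_accuracy(results)
    all_accs = np.concatenate([np.asarray(a, dtype=float) for a in results.values()])
    # Use a shared bin range so the two models' histograms are directly comparable.
    value_range = (all_accs.min() - 0.01, all_accs.max() + 0.01)

    fig, ax = plt.subplots()
    for i, (name, accs) in enumerate(results.items()):
        ax.hist(accs, bins=bins, range=value_range, alpha=0.5,
                color=f"C{i}", label=f"{name} (test folds)")
        # The mean also appears as text in this line's legend entry.
        ax.axvline(means[name], color=f"C{i}", linestyle="--",
                   label=f"{name} mean = {means[name]:.3f}")
    ax.set_xlabel("Test accuracy")
    ax.set_ylabel("Number of rotations")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

For example, `plot_learning_curves(histories, "accuracy", "figure1.png", "Training accuracy")` would give Figure 1. The same call with `"val_accuracy"` would give Figure 2, and `plot_test_histogram(results, "figure3.png")` would give Figure 3. matplotlib is imported only inside the plotting functions, so the summary function works without it.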


Provided Code

In code for class:


What to Hand In

Turn in a single zip file that contains:

Grading

References


andrewhfagg -- gmail.com

Last modified: Mon Apr 11 16:35:16 2022