CS 5043: HW3: Comparing Regression Algorithms

Assignment notes:

Data Set and Goals

A brain-machine interface data set has been placed in: ~fagg/datasets/bmi/DAT6_08. This data set is from a single day's session with a monkey. While sitting in a chair, the monkey places her arm in an exoskeleton that restricts arm movements to the X-Y plane. For this data set, the exoskeleton is passive and is only used to record the monkey's arm movements. A series of virtual targets is then provided to the monkey. As the monkey's hand moves through the current target, a new one appears (so, we are observing a sequence of reaches). During this session, information is recorded every 50 ms. The following information is collected, with each type stored in its own set of files:

In the brain-machine interface context, after a data set such as this is collected, a model can be constructed that predicts from the neural data one or more properties that describe the state of the arm. Under some conditions, these models can be used to translate in real time the neural activity into control signals for the exoskeleton (hence, the monkey's arm will be moved for her).

Our goals for this homework are to assess:

Notes:

The following will load all folds for one file type and return a list of Numpy arrays:

import fnmatch
import os

import pandas as pd

# File loading
def read_bmi_file_set(directory, filebase):
    '''Read a set of whitespace-delimited files, one per fold
    :param directory: The directory in which to scan for the files
    :param filebase: A file specification that potentially includes wildcards
    :returns: A list of Numpy arrays (one for each fold)
    '''

    # The set of files in the directory that match filebase, in sorted (lexicographic) order
    files = fnmatch.filter(os.listdir(directory), filebase)
    files.sort()

    # Load each matching file with Pandas and convert it to a Numpy array.
    # The folds are kept as separate list elements (they are not concatenated)
    lst = [pd.read_csv(os.path.join(directory, file), sep=r'\s+').values for file in files]

    return lst


# Load the time stamps
time = read_bmi_file_set('/home2/fagg/datasets/bmi/DAT6_08', 'time_fold*')
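Because the loader returns one array per fold, the data for a chosen subset of folds can be assembled by stacking those arrays row-wise. The following is a minimal sketch, assuming every fold has the same number of columns. The small arrays are synthetic stand-ins rather than the BMI data, and stack_folds is a hypothetical helper name:

```python
import numpy as np

def stack_folds(fold_list, fold_indices):
    """Concatenate the selected folds row-wise into one array (hypothetical helper)."""
    return np.concatenate([fold_list[i] for i in fold_indices], axis=0)

# Synthetic stand-ins for per-fold arrays: 4 folds with differing row counts, 1 column each.
# Each fold is filled with its own index so the origin of every row is easy to see.
folds = [np.full((n_rows, 1), fill_value=i, dtype=float)
         for i, n_rows in enumerate([3, 5, 2, 4])]

# Stack folds 0, 2 and 3 into a single array
train = stack_folds(folds, [0, 2, 3])
print(train.shape)
```

Keeping the folds separate until this point makes it straightforward to build a different combination of folds for each rotation.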


Part 1: Data Exploration


Part 2: Cross-Validation

Because Scikit-Learn leaves something to be desired in its implementation of cross-validation, we will implement our own. The data set contains N=20 folds. For a given rotation, N-2 folds are available for training, one is used for validation, and one is held out for testing.
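One way to choose the folds for a rotation is to shift the fold indices cyclically. The sketch below shows one possible convention, which is not necessarily the one intended for this assignment: for a given rotation, the first n_train shifted folds are used for training, the next-to-last shifted fold for validation, and the last for testing. The function name rotation_folds and its parameters are hypothetical.

```python
def rotation_folds(n_folds, rotation, n_train):
    """Return (train, validation, test) fold indices for one rotation.

    Assumed convention (not taken from the assignment): fold indices are
    shifted cyclically by `rotation`; the first `n_train` shifted folds are
    used for training, the next-to-last for validation, the last for testing.
    """
    if not 1 <= n_train <= n_folds - 2:
        raise ValueError("n_train must be between 1 and n_folds - 2")

    # Cyclically shifted list of all fold indices
    order = [(i + rotation) % n_folds for i in range(n_folds)]

    train = order[:n_train]   # training folds
    validation = order[-2]    # single validation fold
    test = order[-1]          # single held-out test fold
    return train, validation, test

# Example: N=20 folds, rotation 3, using all N-2 available training folds
train, validation, test = rotation_folds(n_folds=20, rotation=3, n_train=18)
print(train, validation, test)
```

With this convention, stepping the rotation from 0 to N-1 uses each fold exactly once as the test fold and exactly once as the validation fold.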

We will be implementing three nested loops (I implemented these across two functions):

In addition, write a function that:

Notes:


Part 3: Analysis


Hints


andrewhfagg -- gmail.com

Last modified: Wed Feb 21 17:34:10 2018