CS 5043: HW4: Re-Encoding Feature Spaces

Assignment notes:


Part 1: BMI Revisit

In HW3, you observed that LinearRegression performed very poorly on small data sets. This was because we were trying to estimate 961 parameters from only ~2200 samples, a situation that is ripe for overfitting. You experimented with using regularization to address this issue. In this homework, we will instead project our data set into fewer dimensions before performing the regression.
  1. Using the first four MI folds, construct a figure with the PCA class that shows the fraction of variance explained as a function of the number of principal components.

  2. Implement a new regression class that extends LinearRegression and adds PCA projection ahead of the linear regression.

  3. Use the cross-validation mechanism that you implemented in HW 3 to show validation set performance as a function of data set size (focus on the smaller data sets). The parameter that you should vary is n_components. Use your knowledge gained from the first step to select a reasonable set of possibilities here (plus the extremes). Briefly discuss the results.
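A minimal sketch of step 1, assuming scikit-learn's PCA. Synthetic data stands in for the stacked MI folds (load and concatenate the first four folds into X in your own code):

```python
import matplotlib
matplotlib.use('Agg')          # render off-screen
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))   # stand-in for the stacked MI fold features

pca = PCA()                      # keep all components
pca.fit(X)

# Cumulative fraction of variance explained by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)

plt.plot(np.arange(1, len(cum_var) + 1), cum_var)
plt.xlabel('Number of principal components')
plt.ylabel('Fraction of variance explained')
plt.savefig('pca_variance.png')
```

The "knee" of this curve is a good starting point for the n_components values explored in step 3.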
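One possible shape for the step-2 class (the name PCARegression and its constructor parameter are my own; the assignment only requires extending LinearRegression and projecting with PCA before the fit):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

class PCARegression(LinearRegression):
    """LinearRegression that first projects inputs onto n_components PCA axes."""

    def __init__(self, n_components=10):
        super().__init__()
        self.n_components = n_components

    def fit(self, X, y):
        # Learn the projection from the training data, then regress in PCA space
        self.pca_ = PCA(n_components=self.n_components)
        Xp = self.pca_.fit_transform(X)
        return super().fit(Xp, y)

    def predict(self, X):
        # Apply the learned projection before the inherited predict
        return super().predict(self.pca_.transform(X))

# Quick check on synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
model = PCARegression(n_components=5).fit(X, y)
pred = model.predict(X[:5])
```

Because it keeps the fit/predict interface, this class should drop directly into the HW3 cross-validation mechanism with n_components as the swept parameter.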


Part 2: Squirrelly Data Set

A new data set has been added (hw4.csv). This data set has 1000 samples and four columns (index, x0, x1, x2). Your goal is to predict x2 given x0 and x1 using the LinearRegression model.
  1. Compute a LinearRegression model that directly predicts x2 from x0/x1. Plot x2 and the prediction of x2 as a function of index. Discuss the quality of this prediction.

  2. Perform a non-linear expansion of the x0/x1 features using a PolynomialFeatures object. Use a LinearRegression model to predict x2 given the representation of this expanded space. Plot your results for at least two different polynomial degrees. Discuss the quality of these predictions.

  3. Show a scatter plot of the x0/x1 space. What do you notice about this data set?

  4. Use an Isomap object to transform your input data from x0/x1 space to an N-dimensional space (where N is chosen appropriately).

  5. Fit a LinearRegression model to this new data representation. Show how it performs and discuss the result.

  6. Perform a non-linear expansion of this new representation using a PolynomialFeatures object. Use a LinearRegression model to predict x2 given the representation of this expanded space. Plot your results for at least two different polynomial degrees. Discuss the quality of these predictions.
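A sketch of steps 1-2, assuming scikit-learn's PolynomialFeatures and LinearRegression. A synthetic nonlinear target stands in for the hw4.csv columns so the sketch runs on its own; in your solution, load x0/x1/x2 from the file instead:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x01 = rng.uniform(-1, 1, size=(1000, 2))   # stand-in for the x0/x1 columns
x2 = x01[:, 0] ** 2 + x01[:, 1] ** 2       # stand-in nonlinear target

# Step 1: direct linear fit
lin = LinearRegression().fit(x01, x2)
print('linear R^2:', lin.score(x01, x2))

# Step 2: polynomial expansion, then a linear fit in the expanded space
for degree in (2, 5):
    Xp = PolynomialFeatures(degree=degree).fit_transform(x01)
    model = LinearRegression().fit(Xp, x2)
    print(f'degree {degree} R^2:', model.score(Xp, x2))
```

For the index-vs-prediction plots the assignment asks for, plot x2 and model.predict(Xp) against the index column on the same axes.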
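A sketch of steps 4-6, assuming scikit-learn's Isomap and that the step-3 scatter plot suggests the points lie along a one-dimensional manifold, so N=1 is one reasonable choice (the spiral stand-in data, the target, and the n_neighbors value are all my own assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.manifold import Isomap
from sklearn.preprocessing import PolynomialFeatures

# Stand-in data: points along a spiral, with the target varying along the curve
t = np.linspace(0, 3 * np.pi, 500)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])   # stand-in for x0/x1
x2 = np.sin(t)                                        # stand-in for x2

# Step 4: unroll the manifold into an N=1 dimensional coordinate
iso = Isomap(n_components=1, n_neighbors=10)
Z = iso.fit_transform(X)

# Step 5: linear fit in the Isomap coordinate
lin = LinearRegression().fit(Z, x2)
print('Isomap + linear R^2:', lin.score(Z, x2))

# Step 6: polynomial expansion of the Isomap coordinate
for degree in (3, 7):
    Zp = PolynomialFeatures(degree=degree).fit_transform(Z)
    model = LinearRegression().fit(Zp, x2)
    print(f'degree {degree} R^2:', model.score(Zp, x2))
```

Since the polynomial feature set contains the linear one, the step-6 training fit can only match or improve on step 5; the interesting comparison is how much.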


andrewhfagg -- gmail.com

Last modified: Mon Feb 26 23:09:15 2018