CS 5043: HW5: TensorFlow
Assignment notes:
- Deadline: Tuesday, March 27th @11:59pm.
- Hand-in procedure: submit to the HW5 drop box on Canvas.
- This work is to be done on your own. While general discussion
about Python and TensorFlow is encouraged, sharing
solution-specific code is inappropriate.
- Do not submit zip or MSWord documents.
Background
The benefit of having one or more hidden layers in a neural network
is that one can express very complicated continuous functions, whereas
networks with no hidden layers are limited in the types of functions
that they can represent. However, one challenge in adding hidden
layers is that, early in the training process, it can be difficult to
overcome the vanishing gradient problem.
One way to address this challenge, in part, is to break the serial
nature of the layers. In other words, we can allow a single layer to
receive input from more than one prior layer. For example, in a
network containing one hidden layer, the hidden layer receives its
input from the input layer, while the output layer receives inputs
from both the hidden and input layers. If the output layer is a
linear transformation, then the effective implementation is a linear
function of the inputs, with a bit of non-linearity that can be
recruited as necessary.
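Concretely, if x is the input, g() is the hidden-layer non-linearity,
and the output layer is linear, this skip architecture computes
(symbols here are illustrative):

    \hat{y} = W_{ox} x + W_{oh} \, g(W_h x + b_h) + b_o

The first and last terms form an ordinary linear model of x; the
middle term contributes non-linearity only where training finds it
useful.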
Part 1a: Multi-Input Layers
The book presents an approach to constructing multi-layer networks in
"raw" TensorFlow (p. 266-). Expand the neuron_layer()
implementation to take as input two different input layers (or, if
you are ambitious, an arbitrary number of input layers); a sketch
follows the list below:
- If an input layer is specified as None, then it should be
ignored.
- Create separate variables for each input to represent the
weights.
- Handling of biases and the non-linearity should remain the same.
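Here is a minimal sketch of one way to structure this, assuming the
book's neuron_layer() signature and weight initialization (the exact
names and initialization are illustrative, not required):

import numpy as np
import tensorflow as tf

def neuron_layer(X1, X2, n_neurons, name, activation=None):
    # Variant of the book's neuron_layer(): takes two input
    # tensors; either may be None, in which case it is ignored.
    with tf.name_scope(name):
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = b
        for i, X in enumerate([X1, X2]):
            if X is None:
                continue
            n_inputs = int(X.get_shape()[1])
            stddev = 2.0 / np.sqrt(n_inputs)
            init = tf.truncated_normal((n_inputs, n_neurons),
                                       stddev=stddev)
            # Separate weight variable for each input
            W = tf.Variable(init, name="kernel%d" % i)
            Z = Z + tf.matmul(X, W)
        if activation is not None:
            Z = activation(Z)
        return Z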
Create a network building function that uses the new neuron_layer()
to construct either a traditional 1-hidden layer network, or a
modified one in which the output layer receives two inputs (a sketch
appears after this list).
- You can implement these different architectures as two
different functions or as one.
- Include in the graph an operation that computes FVAF for
measuring performance.
- The inputs and expected outputs should be implemented as
Placeholders so any data set can be presented to your
network and evaluated (this goes for both the training and
validation data sets).
- Use tf.train.MomentumOptimizer to perform your optimization.
- The output layer should contain no non-linearity.
However, hidden layers should include some form of
non-linearity.
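One possible shape for this builder, under the same assumptions as
the sketch above (build_network() and its parameters are
illustrative):

def build_network(n_inputs, n_hidden, n_outputs, skip=False,
                  learning_rate=0.01, momentum=0.9):
    # skip=False: traditional serial network
    # skip=True: the output layer also receives the input layer
    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    y = tf.placeholder(tf.float32, shape=(None, n_outputs), name="y")

    hidden = neuron_layer(X, None, n_hidden, "hidden",
                          activation=tf.nn.elu)
    # Output layer: linear (no non-linearity)
    pred = neuron_layer(hidden, X if skip else None, n_outputs,
                        "output")

    mse = tf.reduce_mean(tf.square(pred - y), name="mse")
    # FVAF = 1 - MSE / variance (see Hints below)
    _, var_y = tf.nn.moments(y, axes=[0])
    fvaf = 1.0 - mse / tf.reduce_mean(var_y)

    train_op = tf.train.MomentumOptimizer(learning_rate,
                                          momentum).minimize(mse)
    return X, y, pred, mse, fvaf, train_op

Because X and y are placeholders, the same fvaf op can be evaluated
against either the training set or the validation set simply by
changing the feed_dict.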
Experiments
- Use a single training set and validation set (the latter should
be just one fold). [Normally, we would do full N-fold
cross-validation here]
- There are three different experimental variables (at least)
that one could use to explore these different architecture
forms: number of training folds, number of hidden units, and the
type of architecture (serial or overlapping). Choose two of the
three to vary. For each combination of parameters, show
validation FVAF as a function of training epochs (one curve for
each).
- Focus on predicting shoulder torque as a function of spike
count history.
Hints
- FVAF (fraction of variance accounted for) = 1 - MSE / variance
- tf.nn.moments() will return the mean and variance of a random
variable
Part 1b: Multiple Hidden Layers
Perform an analysis similar to the one above, using 2 hidden
layers; one possible wiring is sketched below.
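A sketch of one such wiring, reusing the hypothetical neuron_layer()
from Part 1a (exactly which skip connections to include is a design
choice, not prescribed here):

hidden1 = neuron_layer(X, None, n_hidden, "hidden1",
                       activation=tf.nn.elu)
hidden2 = neuron_layer(hidden1, X, n_hidden, "hidden2",
                       activation=tf.nn.elu)
pred = neuron_layer(hidden2, X, n_outputs, "output")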
Part 2: Regularization
With the introduction of weight regularization in Ridge
Regression, we gained a lot in performance by adding a
regularization term to the cost function. Modify your network
implementation from above to include a weight regularization
term.
- I found it easiest for neuron_layer() to return both the output
tensor and a tensor that represents the mean squared weight
across any weight matrices that are included in the layer.
- In constructing the full network, the mean squared weights are
then combined with MSE to construct the full cost function
(don't forget the regularization parameter!); a sketch follows
this list.
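A minimal sketch of this refactoring, extending the hypothetical
neuron_layer() from Part 1a (names and the value of the
regularization parameter are illustrative):

def neuron_layer(X1, X2, n_neurons, name, activation=None):
    # As in Part 1a, but also returns the mean squared weight
    # across the layer's weight matrices (biases are not
    # regularized). Assumes at least one input is not None.
    with tf.name_scope(name):
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = b
        sq_sum = tf.constant(0.0)
        n_weights = 0
        for i, X in enumerate([X1, X2]):
            if X is None:
                continue
            n_inputs = int(X.get_shape()[1])
            init = tf.truncated_normal((n_inputs, n_neurons),
                                       stddev=2.0 / np.sqrt(n_inputs))
            W = tf.Variable(init, name="kernel%d" % i)
            Z = Z + tf.matmul(X, W)
            sq_sum = sq_sum + tf.reduce_sum(tf.square(W))
            n_weights += n_inputs * n_neurons
        mean_sq = sq_sum / n_weights
        if activation is not None:
            Z = activation(Z)
        return Z, mean_sq

# In the full network (X, y, etc. as in the Part 1a sketch):
lam = 0.001   # regularization parameter (illustrative value)
hidden, msw_h = neuron_layer(X, None, n_hidden, "hidden", tf.nn.elu)
pred, msw_o = neuron_layer(hidden, X, n_outputs, "output")
mse = tf.reduce_mean(tf.square(pred - y))
cost = mse + lam * (msw_h + msw_o)
train_op = tf.train.MomentumOptimizer(0.01, 0.9).minimize(cost)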
Experiments
- Focus on a small training data set (1-2 folds) and the
one-hidden-layer-with-dual-inputs configuration.
- Show validation FVAF as a function of training epoch for
several different choices of the regularization parameter.
Hints
- This is a useful bit of code to have around:
# Reset everything
sess.close()
tf.reset_default_graph()
Executing it will close out the current session and reset the
TensorFlow graph to be empty. After this, don't forget to rebuild
the graph and to start and initialize a new session (see the
sketch after this list).
- Go slow and confirm that small pieces are working before you
move on to big pieces (the TensorFlow error messages are hard
to interpret).
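A sketch of the full restart sequence, reusing the hypothetical
build_network() from the Part 1a sketch:

# Reset everything
sess.close()
tf.reset_default_graph()

# Rebuild the graph and start a fresh, initialized session
X, y, pred, mse, fvaf, train_op = build_network(n_inputs, n_hidden, 1)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)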
andrewhfagg -- gmail.com
Last modified: Mon Mar 12 22:24:26 2018