CS 5043: HW5: TensorFlow
Assignment notes:
- Deadline: Tuesday, March 27th @11:59pm.
- Hand-in procedure: submit to the HW5 drop box on Canvas.
- This work is to be done on your own. While general discussion
about Python and TensorFlow is encouraged, sharing
solution-specific code is inappropriate.
- Do not submit zip or MSWord documents.
Background
The benefit of having one or more hidden layers in a neural network
is that one can express very complicated continuous functions, whereas
networks with no hidden layers are limited in the types of functions
that they can represent. However, one challenge in adding hidden
layers is that, early in the training process, it can be difficult to
overcome the vanishing gradient problem.
One way to address this challenge, in part, is to break the serial
nature of the layers. In other words, we can allow a single layer to
receive input from more than one prior layer. For example, in a
network containing one hidden layer, the hidden layer receives its
input from the input layer, while the output layer receives inputs
from both the hidden and input layers. If the output layer is a
linear transformation, then the effective implementation is a linear
function of the inputs, with a bit of non-linearity that can be
recruited as necessary.
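Concretely, if x is the input, g() is the hidden-layer non-linearity,
and the output layer is linear, this skip architecture computes
(symbols here are illustrative):

    \hat{y} = W_{ox} x + W_{oh} \, g(W_h x + b_h) + b_o

The first and last terms form an ordinary linear model of x; the
middle term contributes non-linearity only where training finds it
useful.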
Part 1a: Multi-Input Layers
The book presents an approach to constructing multi-layer networks in
"raw" TensorFlow (p. 266-). Expand the neuron_layer()
implementation to take as input two different input layers (or, if
you are ambitious, an arbitrary number of input layers); a sketch
follows the list below:
- If an input layer is specified as None, then it should be
ignored.
- Create separate variables for each input to represent the
weights.
- Handling of biases and the non-linearity should remain the same.
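Here is a minimal sketch of one way to structure this, assuming the
book's neuron_layer() signature and weight initialization (the exact
names and initialization are illustrative, not required):

import numpy as np
import tensorflow as tf

def neuron_layer(X1, X2, n_neurons, name, activation=None):
    # Variant of the book's neuron_layer(): takes two input
    # tensors; either may be None, in which case it is ignored.
    with tf.name_scope(name):
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = b
        for i, X in enumerate([X1, X2]):
            if X is None:
                continue
            n_inputs = int(X.get_shape()[1])
            stddev = 2.0 / np.sqrt(n_inputs)
            init = tf.truncated_normal((n_inputs, n_neurons),
                                       stddev=stddev)
            # Separate weight variable for each input
            W = tf.Variable(init, name="kernel%d" % i)
            Z = Z + tf.matmul(X, W)
        if activation is not None:
            Z = activation(Z)
        return Z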
Create a network building function that uses the new neuron_layer()
to construct either a traditional 1-hidden layer network, or a
modified one in which the output layer receives two inputs (a sketch
appears after this list).
- You can implement these different architectures as two
different functions or as one.
- Include in the graph an operation that computes FVAF for
measuring performance.
- The inputs and expected outputs should be implemented as
Placeholders so any data set can be presented to your
network and evaluated (this goes for both the training and
validation data sets).
- Use tf.train.MomentumOptimizer to perform your optimization.
- The output layer should contain no non-linearity.
However, hidden layers should include some form of
non-linearity.
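One possible shape for this builder, under the same assumptions as
the sketch above (build_network() and its parameters are
illustrative):

def build_network(n_inputs, n_hidden, n_outputs, skip=False,
                  learning_rate=0.01, momentum=0.9):
    # skip=False: traditional serial network
    # skip=True: the output layer also receives the input layer
    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    y = tf.placeholder(tf.float32, shape=(None, n_outputs), name="y")

    hidden = neuron_layer(X, None, n_hidden, "hidden",
                          activation=tf.nn.elu)
    # Output layer: linear (no non-linearity)
    pred = neuron_layer(hidden, X if skip else None, n_outputs,
                        "output")

    mse = tf.reduce_mean(tf.square(pred - y), name="mse")
    # FVAF = 1 - MSE / variance (see Hints below)
    _, var_y = tf.nn.moments(y, axes=[0])
    fvaf = 1.0 - mse / tf.reduce_mean(var_y)

    train_op = tf.train.MomentumOptimizer(learning_rate,
                                          momentum).minimize(mse)
    return X, y, pred, mse, fvaf, train_op

Because X and y are placeholders, the same fvaf op can be evaluated
against either the training set or the validation set simply by
changing the feed_dict.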
Experiments
- Use a single training set and validation set (the latter should
be just one fold). [Normally, we would do full N-fold
cross-validation here]
- There are three different experimental variables (at least)
that one could use to explore these different architecture
forms: number of training folds, number of hidden units, and the
type of architecture (serial or overlapping). Choose two of the
three to vary. For each combination of parameters, show
validation FVAF as a function of training epochs (one curve for
each).
- Focus on predicting shoulder torque as a function of spike
count history.
Hints
- FVAF (fraction of variance accounted for) = 1 - MSE / variance
- tf.nn.moments() will return the mean and variance of a random
variable
Part 1b: Multiple Hidden Layers
Perform an analysis similar to the one above, using 2 hidden
layers; one possible wiring is sketched below.
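A sketch of one such wiring, reusing the hypothetical neuron_layer()
from Part 1a (exactly which skip connections to include is a design
choice, not prescribed here):

hidden1 = neuron_layer(X, None, n_hidden, "hidden1",
                       activation=tf.nn.elu)
hidden2 = neuron_layer(hidden1, X, n_hidden, "hidden2",
                       activation=tf.nn.elu)
pred = neuron_layer(hidden2, X, n_outputs, "output")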
Part 2: Regularization
With the introduction of weight regularization in Ridge
Regression, we gained a lot in performance by adding a
regularization term to the cost function. Modify your network
implementation from above to include a weight regularization
term.
- I found it easiest for neuron_layer() to return both the output
tensor and a tensor that represents the mean squared weight
across any weight matrices that are included in the layer.
- In constructing the full network, the mean squared weights are
then combined with MSE to construct the full cost function
(don't forget the regularization parameter!); a sketch follows
this list.
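A minimal sketch of this refactoring, extending the hypothetical
neuron_layer() from Part 1a (names and the value of the
regularization parameter are illustrative):

def neuron_layer(X1, X2, n_neurons, name, activation=None):
    # As in Part 1a, but also returns the mean squared weight
    # across the layer's weight matrices (biases are not
    # regularized). Assumes at least one input is not None.
    with tf.name_scope(name):
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = b
        sq_sum = tf.constant(0.0)
        n_weights = 0
        for i, X in enumerate([X1, X2]):
            if X is None:
                continue
            n_inputs = int(X.get_shape()[1])
            init = tf.truncated_normal((n_inputs, n_neurons),
                                       stddev=2.0 / np.sqrt(n_inputs))
            W = tf.Variable(init, name="kernel%d" % i)
            Z = Z + tf.matmul(X, W)
            sq_sum = sq_sum + tf.reduce_sum(tf.square(W))
            n_weights += n_inputs * n_neurons
        mean_sq = sq_sum / n_weights
        if activation is not None:
            Z = activation(Z)
        return Z, mean_sq

# In the full network (X, y, etc. as in the Part 1a sketch):
lam = 0.001   # regularization parameter (illustrative value)
hidden, msw_h = neuron_layer(X, None, n_hidden, "hidden", tf.nn.elu)
pred, msw_o = neuron_layer(hidden, X, n_outputs, "output")
mse = tf.reduce_mean(tf.square(pred - y))
cost = mse + lam * (msw_h + msw_o)
train_op = tf.train.MomentumOptimizer(0.01, 0.9).minimize(cost)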
Experiments
- Focus on a small training data set (1-2 folds) and the
one-hidden-layer-with-dual-inputs configuration.
- Show validation FVAF as a function of training epoch for
several different choices of the regularization parameter.
Hints
- This is a useful bit of code to have around:
# Reset everything
sess.close()
tf.reset_default_graph()
Executing it will close out the current session and reset the
TensorFlow graph to be empty. After this, don't forget to rebuild
the graph and to start and initialize a new session (see the
sketch after this list).
- Go slow and confirm that small pieces are working before you
move on to big pieces (the TensorFlow error messages are hard
to interpret).
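A sketch of the full restart sequence, reusing the hypothetical
build_network() from the Part 1a sketch:

# Reset everything
sess.close()
tf.reset_default_graph()

# Rebuild the graph and start a fresh, initialized session
X, y, pred, mse, fvaf, train_op = build_network(n_inputs, n_hidden, 1)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)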
andrewhfagg -- gmail.com
Last modified: Mon Mar 12 22:24:26 2018