CS 5043: HW6: Deep Policy Gradient

Assignment notes:

Problem

We are going to set up an RL agent to learn to solve the MsPacman-v0 problem. In this problem, a single observation is an image of the play field (210 x 160 x 3). Actions are one of: NoOp, Up, Right, Left, Down, Up-Right, Up-Left, Down-Right and Down-Left (a total of 9). Positive rewards are given every time Ms Pacman consumes one of the small pellets in the environment, one of the larger power-up pellets, or a ghost (only when powered-up). If Ms Pacman is caught by a ghost while she is not powered-up, she loses a life. Once three lives are lost, the game is terminated. The goal is to accumulate as much reward as possible before the three lives are expended.
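
To make the action space concrete, here is a minimal sketch of how the nine actions could be indexed and how a policy's output probabilities could be turned into a single action. The index order is an assumption taken from the order of the list above; the action meanings reported by the environment itself are authoritative. The names ACTION_NAMES and sample_action are hypothetical and are not part of the provided code.

```python
import random

# Assumed index order: copied from the order in which the actions are listed above
ACTION_NAMES = ["NoOp", "Up", "Right", "Left", "Down",
                "Up-Right", "Up-Left", "Down-Right", "Down-Left"]

def sample_action(probs, rng=random):
    """Draw one action index according to a probability vector over the 9 actions."""
    if len(probs) != len(ACTION_NAMES):
        raise ValueError("expected one probability per action")
    indices = list(range(len(ACTION_NAMES)))
    return rng.choices(indices, weights=probs, k=1)[0]

# Example: a uniform policy is equally likely to choose any of the 9 actions
uniform = [1.0 / len(ACTION_NAMES)] * len(ACTION_NAMES)
action = sample_action(uniform, random.Random(0))
print(action, ACTION_NAMES[action])
```

Sampling from the probabilities (rather than always taking the most likely action) is what lets a policy-gradient agent keep exploring.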

Implementation

The code that I provide is a full implementation of Policy Gradient with images as inputs (there is also an image implementation of Q-Learning available). When you create your agent and initialize your model, you can specify a range of parameters, including:

Here is a minimalist network/agent configuration:

# One Conv layer with max pooling (and striding)
conv_layers=[{'filters': 10, 'kernel_size': (5,5), 'pool_size': (5,5), 'strides': (2,2)}
            ]
# One dense layer
dense_layers=[{'units': 40}
             ]

# Configure the agent
sh = env.observation_space.shape

agent = myImagePolicyGradientAgent(sh, env.action_space.n, 
                    epsilon=0.1, lrate=.0005, maxlen=2000, gamma=0.4)
agent.build_model(conv_layers=conv_layers, dense_layers=dense_layers,
                    lambda_l2=.0001)
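
To give a feel for the gamma parameter above, here is a minimal sketch of how a discount factor typically turns a sequence of rewards into the discounted returns used by a policy-gradient update. This is the standard textbook computation, not necessarily what the provided agent does internally; the function name discounted_returns and the reward values are made up for illustration.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so that each return reuses the one that follows it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Made-up reward sequence: a pellet now, a larger reward three steps later
rewards = [10.0, 0.0, 0.0, 50.0]
print(discounted_returns(rewards, gamma=0.4))
```

With a small gamma such as 0.4, a reward three steps in the future contributes only 0.4^3 = 0.064 of its value to the current step, so the agent is effectively short-sighted; values closer to 1 credit actions for rewards that arrive much later.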

You can also specify the same architecture (and other experiment parameters) at the command line:
python hw6_basis.py -vv -env MsPacman-v0 -pg -ntrials 100 -results_path results_hw6 \
-conv_size 5 -conv_nfilters 10 -pool 5 -pool_stride 2 -hidden 40 -lrate .0005 \
-replay 1000 -maxlen 2000 -steps 500 -action_repeat 4 -rotation 0 -epsilon .1  -l2 .0001

(Note that the '\' characters only continue the command onto the next line; they are not part of the command itself.)
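
As a rough illustration of how the architecture flags above appear to line up with the conv_layers and dense_layers dictionaries, here is a small argparse sketch. It is not the parser in hw6_basis.py: the flag names are copied from the command above, but the types, the defaults, the single-layer assumption, and the helper names build_parser and layers_from_args are all assumptions. The flag-to-key mapping is inferred from the matching values in the two configurations.

```python
import argparse

def build_parser():
    # Hypothetical parser: flag names come from the command above,
    # types and defaults are assumptions
    parser = argparse.ArgumentParser(description="Sketch of the architecture flags")
    parser.add_argument("-conv_size", type=int, default=5)
    parser.add_argument("-conv_nfilters", type=int, default=10)
    parser.add_argument("-pool", type=int, default=5)
    parser.add_argument("-pool_stride", type=int, default=2)
    parser.add_argument("-hidden", type=int, default=40)
    return parser

def layers_from_args(args):
    """Build single-layer conv/dense specifications from the parsed flags."""
    conv_layers = [{'filters': args.conv_nfilters,
                    'kernel_size': (args.conv_size, args.conv_size),
                    'pool_size': (args.pool, args.pool),
                    'strides': (args.pool_stride, args.pool_stride)}]
    dense_layers = [{'units': args.hidden}]
    return conv_layers, dense_layers

argv = "-conv_size 5 -conv_nfilters 10 -pool 5 -pool_stride 2 -hidden 40".split()
args = build_parser().parse_args(argv)
conv_layers, dense_layers = layers_from_args(args)
print(conv_layers)
print(dense_layers)
```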


Hints / Notes


What to Hand In

Post important observations, results, and code changes to the Canvas HW6 discussion board.

Grades

This homework is optional. If you make 3 interesting contributions on the discussion board, you will receive full credit.


andrewhfagg -- gmail.com

Last modified: Thu Apr 9 14:25:34 2020