Student: Andrew H. Fagg
Motor Skill Learning in Robots and Primates
In a laboratory situation, a primate learns to perform the task designated by the experimenter through a reward/penalty or reinforcement-based paradigm. This reinforcement information, however, is extremely sparse relative to all of the things the monkey must do in order to obtain a reward. Even with the simplest tasks (e.g. reaching to grasp a handle), a monkey has many different motor acts that are available, from which he must select some sequence. When a reinforcement signal is provided, he must somehow infer the critical elements of his actions that caused him to receive the reward, so that the actions may be repeated the next time that the same situation arises. Despite this very limited amount of information, the monkey is often able to learn the desired task.
Within the robotics domain, we find a somewhat similar problem. Although it is possible to program the robot at a low level (controlling exactly what the robot does), it is typically very difficult to know exactly what to tell the robot in order to accomplish a task. This is especially the case when the teacher's sensing and actuation capabilities differ significantly from those of the robot. We would therefore prefer to specify programs at a significantly higher level: one at which it is more natural for a programmer or a teacher to communicate. One possible approach to this problem draws inspiration from learning in monkeys, using reinforcement information to specify what the robot should do in a particular situation.
Two key problems that must be solved in order for reinforcement information to be useful in constructing motor programs are the structural and temporal credit assignment problems. In other words, given that the teacher provides some instantaneous reinforcement signal, the learning system must identify a) which computational elements (in our case, neurons) were responsible for generating the actions that ultimately led to the reinforcement, and b) at what time these elements made the critical decisions. One way to approach the structural credit assignment problem is through a distributed trial-and-error mechanism, as explored in our work on modeling how primates learn to associate a visual stimulus with one of several motor responses (Fagg and Arbib, 1992). The temporal credit assignment problem is currently being investigated in the context of teaching an arm/hand system to reach for a target (Fagg, 1993), as well as in a mobile robot system that is to learn how to move about safely in its environment. One key aspect of these approaches is the use of a prediction of future reinforcement to produce an internal reinforcement signal as individual actions are being generated.
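As a minimal sketch of how a prediction of future reinforcement can yield an internal reinforcement signal, the Python example below uses a temporal-difference style update on a simple chain of states. The temporal-difference rule, the chain task, and the names and parameters (td_internal_reinforcement, alpha, gamma) are assumptions chosen for illustration; they are not necessarily the mechanism used in the cited work.

```python
def td_internal_reinforcement(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Learn predictions of future reinforcement on a hypothetical chain task.

    The system passes through states 0..n_states-1 in order. The external
    reinforcement (1.0) arrives only after the final state, so it is sparse
    and delayed relative to the earlier steps.
    """
    # values[s] is the predicted future reinforcement from state s.
    # The extra last entry is the terminal state, whose prediction stays 0.
    values = [0.0] * (n_states + 1)
    first_errors = last_errors = None
    for episode in range(episodes):
        errors = []
        for s in range(n_states):
            external = 1.0 if s == n_states - 1 else 0.0
            # Internal reinforcement: external signal plus the change in
            # predicted future reinforcement caused by this step.
            delta = external + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta
            errors.append(delta)
        if episode == 0:
            first_errors = errors
        last_errors = errors
    return values[:-1], first_errors, last_errors


if __name__ == "__main__":
    values, first, last = td_internal_reinforcement()
    print("learned predictions:", [round(v, 3) for v in values])
    print("internal signal, first episode:", first)
    print("internal signal, last episode:", [round(d, 3) for d in last])
```

In the first episode only the final step receives a nonzero signal. Over repeated episodes the predictions propagate backward along the chain, so earlier steps receive an internal signal even though the external reinforcement still arrives only at the end.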
The manner in which motor programs are coded in a neural system can greatly affect the complexity and variety of the motor programs that can be stored, the efficiency with which new programs can be learned, and the ease with which one motor program can be generalized to handle a slightly different situation. One aspect of this work examines how different neural regions are involved in a particular computation and how their respective functions might work together to perform a task. Of special interest is how learning may occur at different levels within a control hierarchy. In other words, when a neural system is learning a new task, it must decide not only what must be learned, but also at what level the new information needs to be encoded. In some cases, the low-level components of the controller for the new task are already in place, and it is only necessary for the higher level to make adjustments that bind them together in a new way. The work described in (Fagg, 1993) lays important groundwork for approaching these issues.
For any reasonably complex task, reinforcement information alone may not be enough to train either a monkey or a robot to become proficient at performing the task. Consider the case where the system is presented with a box and several small objects. If a naive system is expected to open up the box and place the green object in the box before it receives a reward (and otherwise gets no feedback), then it may require a very long period of performing random actions before the system discovers what it is that it must do.
One way that a teacher may approach this search space problem is through the use of staged learning. Rather than expecting the learning system to perform the task all at once, the teacher first sets a very simple goal to be learned, such as opening up the box. Once the system learns to perform a subtask adequately, the teacher presents a more difficult problem. In this way, the learning system is significantly restricted in its search for the correct motor program. This technique has already been explored to some degree in the development of gait generators for a walking robot (Lewis, Fagg, and Solidum, 1992), and will be a key component of the continuing work to model how monkeys learn in the laboratory and how robots could learn to perform useful tasks within a reasonable amount of time.
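The effect of staging on the size of the search can be illustrated with a small Python sketch. It counts the attempts a learner needs to find a three-step motor program when reward arrives only for the complete sequence, versus when the teacher rewards successively longer prefixes. The action names, the target sequence, and the use of systematic enumeration in place of random exploration are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical motor acts and target program for the box example.
ACTIONS = ["reach", "open_box", "grasp_green", "place_in_box"]
TARGET = ("open_box", "grasp_green", "place_in_box")


def trials_unstaged(target, actions):
    """Count attempts when reward arrives only for the complete sequence."""
    candidates = product(actions, repeat=len(target))
    for count, candidate in enumerate(candidates, start=1):
        if candidate == target:
            return count
    return None  # target not expressible with the given actions


def trials_staged(target, actions):
    """Count attempts when the teacher rewards successively longer prefixes."""
    learned = ()
    count = 0
    for stage in range(1, len(target) + 1):
        for action in actions:
            count += 1
            # The teacher's goal at this stage is the first `stage` steps.
            if learned + (action,) == target[:stage]:
                learned += (action,)
                break
    return count


if __name__ == "__main__":
    print("unstaged attempts:", trials_unstaged(TARGET, ACTIONS))
    print("staged attempts:", trials_staged(TARGET, ACTIONS))
```

With k available actions and a program of n steps, the unstaged search may need up to k^n attempts, while the staged search needs at most n * k, because each stage only has to discover one additional step.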
Robots have an advantage over monkeys in that it is possible for them to make use of other types of feedback information in learning how to perform a motor skill. Although other forms of feedback may provide very different types of information, the techniques for dealing with the sparseness, the inaccuracies, and the temporal delay of the feedback signal are just as applicable as those for dealing with reinforcement information. One such possibility is the use of training by example (explored to some degree in Fagg, 1991). Consider an arm/hand robotic system that is posed with the problem of learning how to pick up a variety of objects. It is possible for a teacher to give examples of picking up objects by teleoperating the arm/hand system through the movements required to approach the object and grasp it. If the robot simply learns to mimic the teacher's actions, the results may be clumsy and may not generalize well to picking up other objects. However, the teacher may continue to work with the robot, providing reinforcement-type information. As a result, the robot can tune the original motor program so that it makes better use of its own sensing and actuation capabilities. In addition, when presented with a novel object, the robot can make use of what it already knows about reaching and grasping as a starting point in the search for a motor program that can pick up this new object.
The specific goals for this thesis work are:
Robotics provides an environment in which models from the primate side may be tested, analyzed, and improved in agents that must also behave in a real environment. Ultimately, predictions that arise from these models may be brought back to the primate domain in the form of new experiments to be tried or a better understanding of how the primate system functions.