A Control Projection Approach
to Learning From Teleoperator Demonstration

Andrew H. Fagg
Michael Rosenstein
Robert Platt, Jr.
Roderic A. Grupen

Abstract

The remote teleoperation of robots is one of the dominant modes of robot control in applications involving hazardous environments, including space. Here, a user is equipped with an interface that conveys the sensory information collected by the robot and allows the user to command the robot's actions. The difficulty with this form of interface (even with a high-fidelity, telepresence-style interface) is the degree of fatigue experienced by the user, often within a short period of time. To alleviate this problem, we would like our robot control system to anticipate the actions of the user, aid in the partial performance of the task, or even learn how to perform entire tasks autonomously. To accomplish this, it is critical that our control systems begin to develop deep representations of teleoperator action. We propose the use of a set of controllers as a central mechanism in the recognition process. Each controller is parameterized with a hypothesized objective (e.g., a goal) derived from a representation of the objects that surround the robot and of how the robot might interact with those objects. The movement commanded by the teleoperator is compared against those that would have resulted from the execution of the hypothesized controllers. The controller that best matches the observed movement is taken as the explanation for the intended movement. This technique allows us to extract a high-level representation of the sequence of actions taken by the user, and even to anticipate the target of an ongoing reaching action.


The Short Story

A user demonstrates a sequence of pick-and-place operations through a teleoperation interface:

Movie: sequence_learn_v2_demo (.mov, .mp4, .avi, _small.avi)

Prior to the user demonstration, the control system enumerates the different grasping actions that can be used for each object in the workspace. Each action is expressed as the parameterization of a controller instance (in particular, the goal of the reaching movement). The movements produced by the user are then compared against the hypothetical actions of each of the controllers. Through this control projection technique, the controllers become action-oriented filters of the movement trajectory, allowing the robot's movements to be segmented into discrete subgoals. In this example, the extracted sequence is: pick up the blue ball; place it on the pink target; pick up the yellow ball; place it on the orange target.
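
The matching step at the heart of this technique can be sketched in a few lines of Python. In the sketch below, each hypothesized controller is reduced to a simple attractor that pulls the hand toward a goal position, and the observed command is scored against each hypothesis by cosine similarity; the Controller class, the project function, and the goal coordinates are illustrative assumptions, not the system's actual implementation.

import numpy as np

class Controller:
    """A hypothesized reach controller, parameterized by a goal position."""
    def __init__(self, name, goal):
        self.name = name
        self.goal = np.asarray(goal, dtype=float)

    def desired_velocity(self, hand_pos):
        # A pure attractor: move straight toward the goal.
        error = self.goal - hand_pos
        norm = np.linalg.norm(error)
        return error / norm if norm > 1e-9 else np.zeros_like(error)

def project(hand_pos, observed_vel, controllers):
    """Return the controller whose action best explains the observed command."""
    v = observed_vel / (np.linalg.norm(observed_vel) + 1e-9)
    # Score each hypothesis by its agreement with the observed motion.
    scores = {c.name: float(v @ c.desired_velocity(hand_pos))
              for c in controllers}
    return max(scores, key=scores.get), scores

# Hypothetical goals for two of the grasp controllers:
controllers = [Controller("grasp-blue-ball", [0.4, 0.1, 0.2]),
               Controller("grasp-yellow-ball", [0.2, -0.3, 0.2])]
best, scores = project(np.array([0.0, 0.0, 0.5]),
                       np.array([0.35, 0.08, -0.28]), controllers)
print(best)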

This same plan can now be executed automatically for a novel situation, as shown below:

Movie: sequence_learn_v2_D (.mov, .mp4, .avi, _small.avi)


The Longer Story

Prior to demonstration of the pick-and-place sequence, the control system extracts a coarse representation of each object (position, shape, size, and color) from the following stereo image pair. In this case, there are four objects of interest in the workspace (two balls and two flat target regions).

For each object, the control system hypothesizes a set of appropriate grasping actions in the form of parameterized controller instances. For this example, one controller, involving a top-down approach to the object, is hypothesized for each object. The error metric for each controller (in this case) constrains three DOFs of position and two DOFs of orientation, leaving one orientation DOF, rotation about the Z axis of the global coordinate frame, unconstrained.
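
The sketch below gives one concrete (though assumed) form of such an error metric: a three-DOF position error plus a two-DOF orientation error that penalizes only the direction of the gripper's approach axis, so that spin about the global Z axis remains free. The weighting and axis conventions are illustrative, not those of the actual system.

import numpy as np

def grasp_error(hand_pos, approach_axis, goal_pos, w_ori=0.1):
    """Position error plus misalignment of the approach axis with -Z."""
    # Three position DOFs:
    pos_err = np.linalg.norm(np.asarray(goal_pos) - np.asarray(hand_pos))
    # Two orientation DOFs: fixing the direction of one axis constrains
    # two rotational DOFs; spin about that axis remains unconstrained.
    down = np.array([0.0, 0.0, -1.0])        # top-down approach direction
    axis = np.asarray(approach_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    ori_err = np.arccos(np.clip(axis @ down, -1.0, 1.0))   # radians
    return pos_err + w_ori * ori_err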

The user-driven demonstration is as follows:

Movie: sequence_learn_v2_demo (.mov, .mp4, .avi, _small.avi)

The magnitudes of the finger force vector (blue) and the joint velocity vector (black) are shown below. Note that the user's movements are far from smooth, but that the pick-up and drop actions are salient in the finger force data stream.
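
Because the pick-up and drop events are so salient in the force signal, a simple threshold detector with hysteresis is enough to illustrate how they can be extracted. The thresholds below are invented for illustration and are not the values used by the actual system.

def detect_grasp_events(force_mag, on_thresh=2.0, off_thresh=0.5):
    """Return (pick_up_times, drop_times) by thresholding with hysteresis."""
    pick_ups, drops = [], []
    holding = False
    for t, f in enumerate(force_mag):
        if not holding and f > on_thresh:
            holding = True
            pick_ups.append(t)      # force rises: object grasped
        elif holding and f < off_thresh:
            holding = False
            drops.append(t)         # force falls: object released
    return pick_ups, drops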

The errors for each controller as a function of time are shown below. Note that "good" is down. In this example, the sequence of four subgoals is visible as a temporary drop in the error of one controller at a time.
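
One simple way to turn such error traces into a segmentation is sketched below, under the assumption that the winning controller at each time step is simply the one with the lowest error (the actual system may use a richer matching criterion). Contiguous runs of the same winner become the extracted subgoals.

import numpy as np

def segment(errors, names):
    """errors: (T, K) array of per-controller errors -> [(name, t_start, t_end)]."""
    best = np.argmin(errors, axis=1)      # lowest error ("good" is down) wins
    segments, start = [], 0
    for t in range(1, len(best) + 1):
        # Close a segment when the winner changes (or the trace ends).
        if t == len(best) or best[t] != best[start]:
            segments.append((names[best[start]], start, t - 1))
            start = t
    return segments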

The recognized plan is expressed using the colored bars; stars indicate pick-up events; circles indicate drop events. Note that this same sequence of controllers can now be used to perform the same actions automatically.


Novel Object Configuration

Now we present the robot with a novel configuration of objects. Note the distractor object in the upper-right of both images.

A new plan is generated by first assigning a role to each of the new objects within the already-acquired plan. Role assignment is accomplished by comparing object properties (color and size) across the two images. The assignment for this case is shown in the following figure (new configuration on the left; original configuration on the right).
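
The sketch below illustrates one plausible form of this role-assignment step: each object in the original plan is greedily matched to the new object whose color and size are most similar. The feature representation, distance measure, and object names are assumptions made for illustration.

import numpy as np

def assign_roles(plan_objects, scene_objects):
    """Greedily map each plan role to its best-matching scene object."""
    assignment = {}
    available = dict(scene_objects)           # name -> ((r, g, b), size)
    for role, (color, size) in plan_objects.items():
        def cost(item):
            c, s = item[1]
            return np.linalg.norm(np.subtract(c, color)) + abs(s - size)
        name = min(available.items(), key=cost)[0]
        assignment[role] = name
        del available[name]                   # each object fills one role
    return assignment

plan  = {"yellow-ball": ((255, 255, 0), 4.0),
         "orange-target": ((255, 150, 0), 8.0)}
scene = {"obj-A": ((250, 250, 10), 4.2),
         "obj-B": ((255, 140, 10), 7.5),
         "obj-C": ((120, 120, 120), 5.0)}     # obj-C plays the distractor
print(assign_roles(plan, scene))              # obj-C is left unassigned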

Execution of the modified plan is shown in the following movie:

Movie: sequence_learn_v2_D (.mov, .mp4, .avi, _small.avi)


Novel Object Configuration II

The stereo pair is as follows:

Note the two yellow balls in this case. During the role-assignment process, only the better-matching of the two is selected:

Execution is shown in the following movie.
Movie: sequence_learn_v2_C (.mov, .mp4, .avi, _small.avi)


A More Complex Sequence

In this demonstration, we have four pick-and-place operations.
Movie: sequence_learn_v3_demo (.mov, .mp4, .avi, _small.avi)

The observed finger force magnitude (blue) and joint velocity magnitude (black) are as follows:

The controller errors are:

The controller errors and resulting plan are below. Note that several extraneous actions have been filtered out of the plan. These correspond to cases where the teleoperator approached other targets along the path to the subgoal target.
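
One way such extraneous matches can be filtered is sketched below, under the assumption that passing approaches are brief and do not coincide with a pick-up or drop event (the actual filtering criterion may differ):

def filter_plan(segments, event_times, min_len=15):
    """Keep segments that are long enough or that contain a grasp/drop event."""
    kept = []
    for name, t0, t1 in segments:
        has_event = any(t0 <= t <= t1 for t in event_times)
        if (t1 - t0) >= min_len or has_event:
            kept.append((name, t0, t1))
    return kept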

This plan reads: pick up the yellow ball (cyan bar); place it on the orange target (red bar); pick up the blue ball (green bar); place it at the original location of the yellow ball (cyan bar); pick up the yellow ball from the orange target (red bar); place it on the pink target (dark blue bar); then pick up the blue ball from the original location of the yellow ball (cyan bar) and place it on the orange target (red bar).


Novel Scenarios

Execution of the corresponding plan for two novel scenarios is shown below:

Execution of Extracted Plan

Movie: sequence_learn_v3_A (.mov, .mp4, .avi, _small.avi)

Execution of Extracted Plan II

Movie: sequence_learn_v3_D (.mov, .mp4, .avi, _small.avi)


fagg at cs.umass.edu

Last modified: Tue Mar 9 22:52:28 2004