Solution to

Homework 1 - Introductory Topics

For this homework, consider the task of having a robot work with library staff to reshelve books in a library.

  1. Agent Environments

    1. Given the environment dimensions specified in your textbook, explain which properties are true of this task environment.

      • This environment is partially observable, rather than fully observable. A library is a large place, so a robot will only be able to sense part of that place at any given time. Moreover, some of what the robot cannot sense may be relevant to the robot's decision-making processes. Consider the following example: Books have specific locations on specific shelves to which they need to be returned. A robot that knew where each book went could plan an efficient route to return all the books in a given load, provided it knew which paths were clear. However, because some paths might be temporarily blocked (e.g., a library patron might have left a chair in the middle of an aisle) and the robot cannot observe that until it gets partway through its route, we must consider this environment to be partially observable.

        (To get full credit on this environmental dimension, you must discuss not only the fact that some of the environment cannot be sensed at any given time but also the fact that the missing information may be relevant.)

      • This environment is stochastic, rather than deterministic. There is a large stochastic element in this environment due to other agents. Both library patrons and staff may modify the environment by moving themselves and other objects in the environment, including (but not limited to) chairs, tables, doors, and books.

        However, this environment should not be considered strategic. Even excluding other agents, the next state is not "completely determined by the current state and the action executed by the agent" (p. 41) in question (the robot). One reason for this is that robots cannot be assured of doing what they are trying to do, particularly when they are acting in environments not specifically tailored to them. A robot trying to pick up a book may drop it, for instance. As those who have taken my Intelligent Robotics course know, even having a robot drive straight is not a simple task. A second reason is that additional outside forces may modify the environment. For example, illumination may change greatly as the sun rises and sets outside the library.

        (To get full credit on this environmental dimension, you must discuss not only the stochastic vs. deterministic dichotomy, but also the strategic middle ground.)

      • This environment is sequential, rather than episodic. There are definitely some episodic elements to this environment. For example, the set of arm motions the robot makes to place one book back on its shelf will be largely independent of the sequence of arm motions it will use to place another book on another shelf in another aisle, given the fact that the arm needs to return to the stack (or rack) of books that the robot is bringing with it before it can pick up the next book. We could even think of reshelving one set of books and returning to the book drop for more as being an episode, given the common definition of that term. However, these types of episodes stretch the meaning of episodic given by the text, in which the agent is supposed to perform only "a single action" (p. 41) during an episode. Further, these episodes are not really independent of one another. The robot's observations during one reshelving "run" could (and probably should) influence its planned path for its next run, for one example; and the robot's power use to reshelve one book affects its power remaining to reshelve the next, for another example.

        (To get full credit on this environmental dimension, you need only find the sequential elements to this environment, not its episodic elements.)

      • This environment is dynamic, rather than static. As mentioned above, library patrons and staff may modify the environment by moving themselves and other objects in the environment, including (but not limited to) chairs, tables, doors, and books. Moreover, they may do this while the robot is deciding on its next action. There is no sense of "taking turns" - these other agents will act when they are ready to act, not when the robot is ready for them to do so. The other changes to the environment, such as changing levels of illumination, also do not wait for the robot.

        Whether the robot's performance score changes with the passage of time depends on how we evaluate its performance (see Question 2). Nonetheless, because the environment may change without the robot acting, we do not call it semidynamic.

        (To get full credit on this environmental dimension, you need to rule out both static and semidynamic, as well as confirming that it is dynamic.)

      • This environment is continuous, rather than discrete. Moreover, this environment is continuous in every aspect considered by the book.

        Both the state and time components of the environment are continuous in much the same way as in the taxi domain considered by the textbook: The speed and location of the robot and of the other agents in the library sweep through a range of continuous values and do so smoothly over time, to paraphrase the text (p. 42). We could make similar claims for other components of the environment as well. Each book, for instance, has aspects such as its location, orientation, dimensions, colors, etc., that could best be described as having continuous values. (We might argue that some of these really have discrete values but, if so, they have too many possible discrete values to consider them individually. We really only care whether there is a finite number of distinct states, which is not the case in this environment.)

        Both the percepts and the actions of the agent should be considered continuous as well. While we have not specified the sensory systems of this robot, we know they must be sophisticated enough to allow for manipulation of the books, to allow the robot to avoid collisions with both inanimate and moving obstacles, etc. Such sensors are most likely to have continuous values. To pick up a book, for example, is likely to require more knowledge of its location and orientation than simply "in front and upright" (or other such selections from finite lists of discrete values). Similarly, while at some level actions may be considered discrete (e.g., "pick up book" might be a discrete action), in order to actually carry out these actions, the robot will need to provide continuous values to its actuators (e.g., apply torque T to joint 1, where T is a continuous value).

        (To get full credit on this environmental dimension, you need to discuss which aspects of the environment you consider continuous and which discrete.)

      • This environment is multi-agent, rather than single agent. While we might consider the library patrons to be non-agents from the standpoint of reshelving the books (the chance that a particular patron wants one of the few books the robot is currently reshelving is quite small), we definitely need to consider the library staff as agents. The problem description clearly states that the robot will "work with library staff to reshelve books" - that is, they are both trying to maximize some performance measures relating to this task.

        (To get full credit on this environmental dimension, you need to discuss why any particular entity should be considered an agent by the robot.)
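
        To tie the classification above together, here is a minimal summary sketch in Python. The class and field names are placeholders invented for this summary, not anything defined in the textbook or the assignment.

          from dataclasses import dataclass

          @dataclass(frozen=True)
          class TaskEnvironment:
              """One task environment classified along the textbook's six dimensions."""
              observability: str   # "fully observable" or "partially observable"
              determinism: str     # "deterministic", "strategic", or "stochastic"
              episodicity: str     # "episodic" or "sequential"
              dynamism: str        # "static", "semidynamic", or "dynamic"
              granularity: str     # "discrete" or "continuous"
              agents: str          # "single-agent" or "multi-agent"

          # The reshelving robot's environment, as argued above.
          reshelving_robot = TaskEnvironment(
              observability="partially observable",
              determinism="stochastic",
              episodicity="sequential",
              dynamism="dynamic",
              granularity="continuous",
              agents="multi-agent",
          )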




    2. Given your answers to the previous part, explain why making robots function outside highly constrained environments (such as factories) is difficult in general.

    3. As this example shows, along every one of these dimensions, using a robot outside a highly constrained environment is likely to give you the hardest of all possible agent environments.

      A partially observable environment is harder to deal with than a fully observable environment, since the agent generally needs memory (internal state) to perform well; a small sketch of this point appears after this answer. Robots are likely to need such memory since they are unlikely to be able to observe all of their environment at once.

      A stochastic environment is harder to deal with than a deterministic one, since we cannot fully predict future states. Robots are unlikely to be in deterministic environments because their own actuators cannot be guaranteed to have particular effects in the world - unless that world is highly constrained.

      A sequential environment is harder to deal with than an episodic one, since earlier decisions cannot be ignored when making the current one, and the current decision will in turn affect later ones. Robots outside highly constrained environments are likely to find themselves in sequential environments, since there is nothing to "reset" the environment for them, in order to start over again.

      A dynamic environment is harder to deal with than a static one, since late answers may be wrong answers. Robots are likely to find themselves in dynamic environments, unless they are kept to factories or other highly constrained environments, because the world is dynamic in general.

      A continuous environment is harder to deal with than a discrete environment, since there are not simple lists of possibilities for each aspect of the environment. Rather, we must either find functions that relate real values to one another or ways to partition the environment in such a way as to treat sets of values the same. Robots themselves tend to require continuous input and output, except in very limited circumstances.

      Finally, a multi-agent environment is harder to deal with than a single-agent environment, since you need to be able to predict the behavior of the other agent(s) in order to do well. Even if a robot is trying to complete its overall task without help or interference from other agents, outside of highly constrained environments there are likely to be subtasks that are cooperative or competitive. For example, if the robot and a person both want to get through a doorway, it may help to know that the person may back up and get out of the way if he thinks that will get him through the doorway more quickly than if he waits for the robot to back up.

      (To get full credit on this question, you need to discuss why one option is more difficult than the other for each dimension and why a robot that functions outside a highly constrained environment is likely to involve the more difficult option.)
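
      As a concrete, purely illustrative sketch of the partial-observability point above: an agent with even a small amount of memory can remember which aisles it has found blocked and route around them, which a memoryless reflex agent cannot do. All names below are made up for this illustration.

        class ReshelvingMemory:
            """Toy internal state: remember aisles observed to be blocked."""

            def __init__(self):
                self.blocked_aisles = set()

            def observe(self, aisle, is_blocked):
                # Update memory from the current (local, partial) percept.
                if is_blocked:
                    self.blocked_aisles.add(aisle)
                else:
                    self.blocked_aisles.discard(aisle)

            def plan_route(self, candidate_routes):
                # Prefer routes that avoid aisles remembered as blocked.
                def blocked_count(route):
                    return sum(1 for aisle in route if aisle in self.blocked_aisles)
                return min(candidate_routes, key=blocked_count)

        memory = ReshelvingMemory()
        memory.observe("aisle-3", is_blocked=True)
        print(memory.plan_route([["aisle-3", "aisle-4"], ["aisle-5", "aisle-4"]]))
        # -> ['aisle-5', 'aisle-4']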


  2. Agent Performance

    1. One possible performance measure for this task is the number of books reshelved per day, where higher is better. Explain whether you believe this is an appropriate performance measure for this task.

    2. This probably is an appropriate performance measure, although it is ambiguous, so how you interpret it (or have the robot interpret it) is important. For example, if you only consider the number of times the robot puts a book onto a shelf, without considering the number of times the robot takes a book off a shelf, then the robot could maximize its performance using this measure by repeatedly taking a book off a shelf and immediately putting it back, as fast as it can all day. This is not really helpful behavior, however. Similarly, if you don't include in this measure the need to put the books on the right shelves in the right places, the robot could maximize its performance using this measure by putting the books on the first open shelf it can find. Again, however, this is not particularly helpful.

      Even ignoring such foolish interpretations of this performance measure, this performance measure may lead to behavior that we would not consider particularly intelligent. Suppose, for example, that the robot has a dozen books to return to the shelves, one of which belongs at the far end of the library and the rest on a shelf close by. It would probably make most sense to first reshelve the eleven books that go nearby, then return the one book that goes at the far end of the library. This makes more sense from the viewpoint of library patrons because it means that more books will be put back on the shelves sooner, thereby increasing the probability that a book someone is seeking is on its shelf waiting to be found, rather than being carted around by the robot. It also makes more sense from the perspective of power consumption, since it wastes power to haul books for longer distances than necessary. Despite these facts, however, the given performance measure would not give preference to this reshelving order over one in which the robot reshelved the distant book first, hauling all eleven others along for this long ride.
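
      To make the anomaly above concrete, here is a toy numeric sketch with made-up shelving times (eleven nearby books and one distant book). The "books reshelved" count cannot tell the two orders apart, while a waiting-time measure clearly prefers reshelving the nearby books first.

        def books_reshelved(finish_times):
            return len(finish_times)            # the "books per day" count

        def total_wait(finish_times):
            return sum(finish_times)            # total minutes books spend off the shelf

        # Order A: nearby books first (minutes 1..11), distant book last (minute 31).
        nearby_first = list(range(1, 12)) + [31]

        # Order B: distant book first (minute 20), then the nearby books (minutes 21..31).
        distant_first = [20] + list(range(21, 32))

        print(books_reshelved(nearby_first), books_reshelved(distant_first))  # 12 12
        print(total_wait(nearby_first), total_wait(distant_first))            # 97 306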

    3. In a particular library, the east elevator is generally used by library patrons and the west elevator is generally used by library staff. Since the robot is helping the library staff, we might want it to generally use the west elevator as well. Therefore, another possible performance measure for this task is the number of times per day that the robot takes the west elevator, where higher is better. Explain whether you believe this is an appropriate performance measure for this task.

    4. This probably is not an appropriate performance measure. While the intent may be good - to generally keep the east elevator available for use by library patrons - asking the robot to maximize the use of the west elevator is likely to result in behavior we would not consider intelligent or useful. For example, if this were the only performance measure used, the robot could maximize its performance by simply riding the elevator all day. Even in combination with other performance measures, it is likely to cause the robot to make wasteful trips on the elevator, riding it when it has nothing better to do.

      This is an example of the textbook's general rule that "it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave" (p. 35). If what we want is to keep the east elevator available for patrons, we should penalize the robot for using it, particularly when patron demand for it is likely to be high, rather than rewarding the robot for using the west elevator. We'll have to be careful, of course, that our penalty for using the east elevator is not so large as to keep the robot from ever using it, as there may be cases when it should (e.g., when the west elevator is broken).
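
      As a small illustrative sketch of that general rule (the scoring functions and numbers below are made up, not part of the assignment), compare rewarding west-elevator rides with penalizing east-elevator rides: the former is maximized by pointless riding, while the latter creates no incentive to ride at all and only charges the robot when it displaces patrons.

        def reward_west(west_rides, east_rides):
            return west_rides                   # the measure criticized above

        def penalize_east(west_rides, east_rides, penalty=1):
            return -penalty * east_rides        # the alternative suggested above

        # Day A: 2 west rides during normal reshelving.  Day B: 40 pointless west rides
        # and no reshelving.  Day C: 2 east rides while the west elevator is broken.
        print(reward_west(2, 0), reward_west(40, 0), reward_west(0, 2))        # 2 40 0
        print(penalize_east(2, 0), penalize_east(40, 0), penalize_east(0, 2))  # 0 0 -2

      Under the first measure the pointless Day B looks best; under the second, elevator use is settled by the rest of the performance measure, with only a finite penalty for Day C, so the robot can still use the east elevator when it genuinely must.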

    5. Specify and explain an appropriate performance measure for this task, different from the two given above.

    6. Based on the discussion under the first performance measure, an appropriate performance measure would be to minimize the time that books are waiting to be reshelved. This is appropriate because it is directly related to what we actually want in the environment: To have the books back on the shelves quickly. It should probably not be our only performance measure, however, since we also care about things like patrons and staff being safe (not being run over by a speeding robot).

    7. Explain which performance measure (yours or one of the ones given above) you believe is better.

    8. This new performance measure is better than either of the two given in the assignment. It doesn't suffer from the long-wait anomaly of the "books reshelved per day" measure (as per the discussion above), and it is appropriate, unlike the "west elevator rides per day" measure.

    (For all the answers in this part, the explanation is the most important part.)


  3. Agent Programs

    1. Given what you have said about the reshelving robot agent's environment and performance measures above, explain which of the four basic agent program types you find most appropriate for this agent.

    2. The utility-based agent type is most appropriate for this agent. We are trying to maximize some combination of performance measures, not simply reach goals that are all considered equally good, so a goal-based agent is not as appropriate. Further, because the environment is only partially observable, a simple reflex agent would probably not do particularly well. Finally, while we might be able to create a model-based reflex agent for this task, it would be extremely difficult, because its condition-action rules would need to be extremely complex to handle all the possible situations in which the robot could find itself.
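
      As a rough sketch of what the structure of such an agent program might look like (all of the names below - world_model, update_state, predict, utility - are placeholders invented for this sketch, not functions from the textbook or from any library):

        class UtilityBasedReshelver:
            def __init__(self, world_model, utility):
                self.state = world_model.initial_state()  # internal state, needed because the environment is only partially observable
                self.model = world_model                   # how the world evolves and what the robot's actions do
                self.utility = utility                     # scores predicted states against the performance measures

            def program(self, percept):
                # 1. Fold the new percept into the internal state estimate.
                self.state = self.model.update_state(self.state, percept)
                # 2. Predict the outcome of each candidate action and score it;
                # 3. choose the action whose predicted outcome has the highest utility.
                return max(
                    self.model.actions(self.state),
                    key=lambda a: self.utility(self.model.predict(self.state, a)),
                )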

    3. Explain whether you would add learning to this agent.

    4. I would add learning to this agent. Not only could this help us with the science of AI (testing out theories of learning on an embodied agent), but it is likely to improve the agent's performance in the long term. While much information about the environment can be given to the robot initially (e.g., the layout of the stacks in the library), there is much useful information that would be unlikely to be available initially (e.g., the likelihood that patrons will get out of the robot's way). As the book points out, learning can allow "an agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow" (p. 51).
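
      As one small, purely illustrative example of something the learning element could estimate (a running frequency with a weak prior; the names and numbers below are made up for this sketch):

        class PatronYieldModel:
            """Estimate the probability that a patron steps aside for the robot."""

            def __init__(self, prior_yields=1, prior_encounters=2):
                # Start from a weak 50/50 prior so early estimates are not extreme.
                self.yields = prior_yields
                self.encounters = prior_encounters

            def record(self, patron_stepped_aside):
                self.encounters += 1
                if patron_stepped_aside:
                    self.yields += 1

            def p_yield(self):
                return self.yields / self.encounters

        model = PatronYieldModel()
        for outcome in [True, True, False, True]:
            model.record(outcome)
        print(round(model.p_yield(), 2))  # 0.67 (4 of 6 counted encounters, prior included)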

    (For all the answers in this part, the explanation is the most important part.)