Teaching Intelligent Vehicles to Drive in Traffic:
Tactical-level Scenarios

Abstract

Scenario generation is typically used to create realistic traffic for human subjects in a driving simulator. However, scenarios can also play an important role as environments for teaching intelligent vehicles how to navigate in traffic.

SAPIENT is a distributed reasoning system which makes tactical-level driving decisions in a simulated highway environment (SHIVA). It consists of a number of autonomous, local experts, known as reasoning objects, which monitor traffic entities in the environment: lanes, vehicles, and exits. Each reasoning object observes the traffic scene through one or more perception modules (corresponding to actual systems designed for the Carnegie Mellon Navlab robot vehicles), such as lane trackers and obstacle detection modules.

While different reasoning objects may use different internal representations and techniques for making decisions, ultimately all reasoning objects express their preferences as a set of votes and vetoes, distributed over some tactical-level action space. These votes are combined by a knowledge-free arbiter, and the selected action is then executed by operational-level systems on the intelligent vehicle (an interface which is consistent with the Navlab vehicles).
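The vote-and-veto scheme above can be illustrated with a minimal sketch. Everything named here (the action labels, the `Ballot` type, the `arbitrate` function, and the rule of summing votes and picking the highest-scoring non-vetoed action) is an assumption made for illustration; the abstract does not specify how SAPIENT's arbiter combines votes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical ballot format: (votes per action, set of vetoed actions).
Ballot = Tuple[Dict[str, float], Set[str]]


def arbitrate(actions: List[str], ballots: List[Ballot]) -> Optional[str]:
    """Sum votes per action, drop any vetoed action, return the top scorer.

    The arbiter is "knowledge-free" in the sense that it never inspects
    why a reasoning object voted the way it did.
    """
    totals = {action: 0.0 for action in actions}
    vetoed: Set[str] = set()
    for votes, vetoes in ballots:
        vetoed |= vetoes
        for action, weight in votes.items():
            if action in totals:
                totals[action] += weight
    allowed = [action for action in actions if action not in vetoed]
    if not allowed:
        return None  # every action was vetoed; the caller needs a fallback
    return max(allowed, key=lambda action: totals[action])


# Illustrative action space and ballots (all values are made up).
ACTIONS = ["keep_lane", "change_left", "change_right"]
lane_ballot: Ballot = ({"keep_lane": 1.0}, set())
vehicle_ballot: Ballot = ({"change_left": 2.0, "change_right": 0.5}, {"change_left"})
exit_ballot: Ballot = ({"change_right": 1.0}, set())

choice = arbitrate(ACTIONS, [lane_ballot, vehicle_ballot, exit_ballot])
```

In this toy case "change_left" has the most votes but is vetoed, so the arbiter falls back to the best remaining action. A veto therefore acts as a hard constraint, while votes express graded preference.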

This distributed architecture has several advantages over traditional, monolithic systems. First, the loosely-coupled framework encourages an incremental development of tactical-level reasoning systems. Second, it does not suffer from the explosion of states and interactions observed for single-layer FSMs. Lastly, it enables hybrid approaches to the tactical driving problem, with different techniques being applied to different sub-tasks. For example, an obstacle avoidance reasoning object can use Generalized Potential Fields, while an exit finder may employ decision trees.

However, the lack of a unified representation leads to complications: SAPIENT's performance relies on the interactions between several reasoning objects, each of which contains a number of adjustable parameters. Tuning these parameters by hand is tedious, particularly for people who are unfamiliar with the internals of the various reasoning objects. Our solution is to automatically adjust these parameters using an evolutionary algorithm known as PBIL (Population Based Incremental Learning).

The learning technique may be summarized as follows. PBIL generates a bit-string which is interpreted as a set of parameters for SAPIENT reasoning objects. A vehicle with these parameters is instantiated in the simulated environment, and tested on a selection of tactical-level scenarios. An evaluation function assesses the performance of each intelligent vehicle by recording a weighted sum of observables such as the number of collisions, deviations from desired speed, and frequency of missed exits. The initial parameters generated by PBIL perform poorly, but over time, PBIL begins to generate high-scoring bit-strings with high probability. See the histogram of the evaluation function over successive generations.
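The loop described above can be sketched in a few lines. This is a simplified illustration, not the configuration used with SAPIENT: the bit widths, population size, learning rate, and the `decode`, `pbil`, and `toy_score` names are all assumptions, the sketch omits the mutation of the probability vector that PBIL normally includes, and `toy_score` merely stands in for running a vehicle in the simulator.

```python
import random
from typing import Callable, List, Optional, Tuple


def decode(bits: List[int], bits_per_param: int) -> List[float]:
    """Interpret consecutive groups of bits as parameters scaled to [0, 1]."""
    params = []
    for start in range(0, len(bits), bits_per_param):
        chunk = bits[start:start + bits_per_param]
        value = int("".join(map(str, chunk)), 2)
        params.append(value / (2 ** len(chunk) - 1))
    return params


def pbil(score: Callable[[List[int]], float], n_bits: int,
         generations: int = 60, pop_size: int = 30,
         learning_rate: float = 0.1,
         rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Simplified PBIL: shift a probability vector toward each generation's best."""
    rng = rng or random.Random(0)
    prob = [0.5] * n_bits  # initially every bit is equally likely to be 0 or 1
    best_bits: List[int] = []
    best_score = float("-inf")
    for _ in range(generations):
        # Sample a population of bit-strings from the probability vector.
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        # Score each individual once and keep the generation's best.
        scored = [(score(bits), bits) for bits in population]
        gen_score, gen_best = max(scored, key=lambda pair: pair[0])
        if gen_score > best_score:
            best_score, best_bits = gen_score, gen_best
        # Move each probability a small step toward the best bit-string.
        prob = [(1 - learning_rate) * p + learning_rate * b
                for p, b in zip(prob, gen_best)]
    return best_bits, prob


def toy_score(bits: List[int]) -> float:
    """Stand-in for the simulator: reward parameters close to a made-up target."""
    target = [1.0, 0.0]
    params = decode(bits, bits_per_param=4)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best_bits, prob = pbil(toy_score, n_bits=8)
```

Early generations sample nearly uniform bit-strings, which corresponds to the poor initial parameters mentioned above. As the probability vector drifts toward high-scoring strings, good candidates are sampled with increasing probability.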

This is a particularly difficult learning problem because the simulated environment is stochastic and because a good set of parameters does not guarantee success: a good car can still be drawn into a collision by bad drivers in the environment. Several techniques are used to bootstrap this learning. First, only a small number of untrained vehicles is present in a given scenario (so the other vehicles behave reasonably). Second, a series of scenarios of varying difficulty is used, so the learning vehicles are exposed to a variety of situations. Third, each parameter string is evaluated multiple times on each scenario, and the results are averaged (to compensate for the stochastic environment).
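The third technique, combined with the weighted-sum evaluation, might look like the following sketch. The weights, the `Observables` fields, the number of repeats, and the `toy_scenario` stand-in for a simulator run are all hypothetical; the abstract does not give the actual values or the averaging scheme across scenarios.

```python
import random
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass
class Observables:
    collisions: int
    speed_deviation: float
    missed_exits: int


# Hypothetical weights; negative because each observable is a penalty.
WEIGHTS = {"collisions": -100.0, "speed_deviation": -1.0, "missed_exits": -20.0}

# A scenario is modeled as a function from (parameters, rng) to observables.
Scenario = Callable[[Sequence[float], random.Random], Observables]


def score_run(obs: Observables) -> float:
    """Weighted sum of the observables recorded during one run."""
    return (WEIGHTS["collisions"] * obs.collisions
            + WEIGHTS["speed_deviation"] * obs.speed_deviation
            + WEIGHTS["missed_exits"] * obs.missed_exits)


def evaluate(params: Sequence[float], scenarios: List[Scenario],
             repeats: int, rng: random.Random) -> float:
    """Average repeated runs within each scenario, then across scenarios."""
    per_scenario = []
    for run_scenario in scenarios:
        runs = [score_run(run_scenario(params, rng)) for _ in range(repeats)]
        per_scenario.append(mean(runs))
    return mean(per_scenario)


def toy_scenario(difficulty: float) -> Scenario:
    """Made-up stochastic scenario: cautious parameters collide less often."""
    def run(params: Sequence[float], rng: random.Random) -> Observables:
        collided = rng.random() < difficulty * (1.0 - params[0])
        return Observables(collisions=int(collided),
                           speed_deviation=abs(params[1] - 0.5) * 10,
                           missed_exits=0)
    return run


scenarios = [toy_scenario(0.2), toy_scenario(0.8)]
fitness = evaluate([0.7, 0.4], scenarios, repeats=5, rng=random.Random(1))
```

Averaging within a scenario reduces the chance that one unlucky collision condemns an otherwise good parameter string, and averaging across scenarios keeps a single easy or hard scenario from dominating the score.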

SAPIENT's design flexibly combines learning with domain-specific knowledge. The intelligent vehicles are not forced to learn how to drive from scratch, but they are able to modify their behavior to achieve higher-level goals. The choice of training scenarios and evaluation functions allows researchers to breed intelligent vehicles that are successful at different aspects of the tactical driving task. In the future, we hope to test SAPIENT vehicles on the Navlab robots in (controlled) real-life situations.

Some views of a tactical-level scenario

Acknowledgements

Much of this work was done in collaboration with John Hancock and Shumeet Baluja. I am also grateful for the valuable insights provided by my advisors, Chuck Thorpe & Dean Pomerleau. This work was partially supported by the Automated Highway System project, under agreement DTFH61-94-X-00001.

Rahul Sukthankar (rahuls@ri.cmu.edu)
Last Updated: Nov 21, 1996 by rahuls@postbox.ius.cs.cmu.edu