Project Objectives
Develop complete, effective and scalable software for autonomous robot teams. Demonstrate robot teams with integrated perception, reasoning, learning, communication and cooperative strategies that solve complex multi-agent tasks.
Project Website
Approach
Under the MARS program, we continue to focus on the key challenges for building individually skilled autonomous robots and successful multi-robot teams to operate in uncertain, adversarial domains. These challenges range from single robot issues of localization, real-time visual sensing, robust tracking, high-speed navigation control, and automatic segmentation and recognition of environmental regimes through to multi-robot team issues of adaptive team strategy, robust cooperation through communication in the face of uncertainty, and team learning. All of these issues are important to the successful operation of multi-robot teams.
Recent Results
During the last quarter, we focused on consolidating and extending our research results. In addition to documenting our work in research publications and presentations, including the most recent funding meeting, we made a number of advances. In particular, we have made significant progress in:
Multi-Robot Learning
In our previous work under the MARS program, we developed a
theoretically and empirically convergent multi-agent learning
technique called WoLF. WoLF, short for Win or Learn Fast, is a
multi-agent learning technique that can be generally applied to single
agent policy improvement algorithms. Its impact resides in the fact
that it enables an otherwise non-convergent, single agent learning
technique to be convergent and rational in a self-play multi-agent
scenario.
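The idea behind WoLF can be illustrated with a minimal sketch. The update rules are not spelled out in this report, so the code below assumes the commonly described policy hill-climbing form: the learner compares the expected value of its current policy with that of its long-run average policy, takes a small step when it is "winning" and a larger step when it is "losing". The function names and the step sizes delta_win and delta_lose are illustrative assumptions, not values from our implementation.

```python
def wolf_step_size(policy, avg_policy, q_values, delta_win=0.01, delta_lose=0.04):
    """Choose the policy step size: cautious when winning, fast when losing.

    'Winning' is taken to mean that the current policy has a higher expected
    value (under the current action-value estimates) than the average policy.
    """
    current_value = sum(p * q for p, q in zip(policy, q_values))
    average_value = sum(p * q for p, q in zip(avg_policy, q_values))
    return delta_win if current_value > average_value else delta_lose


def hill_climb(policy, q_values, delta):
    """Shift probability mass toward the greedy action by at most delta."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    new_policy = list(policy)
    per_action = delta / (len(policy) - 1)
    for a in range(len(policy)):
        if a != best:
            # Never take more mass than the action currently holds.
            step = min(new_policy[a], per_action)
            new_policy[a] -= step
            new_policy[best] += step
    return new_policy


# Example with two actions (all values are made up for illustration).
policy = [0.5, 0.5]
avg_policy = [0.8, 0.2]
q_values = [1.0, 0.0]
delta = wolf_step_size(policy, avg_policy, q_values)
policy = hill_climb(policy, q_values, delta)
```

In this example the current policy is worth less than the average policy, so the learner is treated as losing and uses the larger step.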
We have now extended our algorithm, and incorporated many recent advances within the machine learning community, to arrive at a new algorithm we call Gra-WoLF. Gra-WoLF, short for Gradient-WoLF, combines linear function approximation and parameterized stochastic policy representations with policy gradient hill climbing (hence Gradient-WoLF) and the WoLF algorithm. Thus, through function approximation and a parameterized policy representation, it can be applied to large and/or continuous state spaces. With the WoLF algorithm, it can be used in a multi-agent learning scenario with high confidence of convergent results.
We have applied the Gra-WoLF algorithm to an adversarial multi-robot task using our small-size robots (CMDragons) built under the MARS program. We call this task KeepOut. In KeepOut, one robot defends a 30 cm diameter circle, while the other starts from 1 m away and attempts to enter the circle. If the attacking robot gets within the circle in under 10 seconds, it is the winner. If it does not get into the circle, then the defender wins.
KeepOut involves many of the aspects of multi-robot systems that make learning a challenging task. Due to non-negligible latencies, it is partially observable. That is, neither opponent knows what the other has done during the latency period, which in our system is 100 ms. It is stochastic because of variations in robot parameters and differences between the robots' actual and commanded actions. As with other multi-agent problems, it is non-stationary because the opponent is learning at the same time.
Through a combination of simulation trials for early learning (up to 2000 runs), followed by robot learning trials (again up to 2000 runs), our test scenarios demonstrate that the robots are able to learn good policies. We have now documented these results in a recent IJCAI publication (see the publications list below).
Generalized vision-based obstacle avoidance
We have extended and analyzed our 'visual sonar' vision processing
algorithm for general purpose obstacle avoidance for an indoor
robot. The vision algorithm runs on our Sony AIBOs and uses our color
vision library CMVision (http://www.cs.cmu.edu/~jbruce/cmvision)
developed earlier within the MARS program. The algorithm operates on
our Sony AIBOs at the full frame-rate of 25Hz on a 200MHz MIPS
processor. The algorithm works by scanning through image space along
search lines for non-ground colored pixels. The scan lines are
constructed using projective camera geometry, a ground plane
constraint, and knowledge of the robot's head angle relative to the
ground (using joint sensors). Each scan line is a radial projection
along the ground plane from the robot center at a constant angular
division (typically 5 degrees). Only the scan lines which are visible
to the robot's camera are searched.
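The scan-line construction described above can be sketched as follows. This is a simplified illustration rather than our implementation: it assumes an ideal pinhole camera with no pan or roll, pitched down by a known tilt angle, and every numeric value (camera height, focal length in pixels, image size, range, sample count) is a made-up placeholder. Radial lines on the ground plane are sampled every 5 degrees, each sample is projected into the image, and only lines with at least one sample inside the image are kept.

```python
import math


def project_ground_point(x, y, cam_height, tilt, focal_px, width, height):
    """Project the ground point (x, y, 0) to a pixel (col, row), or None.

    Robot frame: x forward, y left, z up. The camera sits at
    (0, 0, cam_height) and is pitched down by `tilt` radians.
    """
    depth = x * math.cos(tilt) + cam_height * math.sin(tilt)
    if depth <= 0:
        return None  # point is behind the image plane
    vertical = x * math.sin(tilt) - cam_height * math.cos(tilt)
    col = width / 2 - focal_px * y / depth
    row = height / 2 - focal_px * vertical / depth
    if 0 <= col < width and 0 <= row < height:
        return col, row
    return None


def visible_scan_lines(cam_height, tilt, focal_px, width, height,
                       step_deg=5, max_range=2.0, samples=40):
    """Map each visible scan-line angle (degrees) to its image pixels."""
    lines = {}
    for angle_deg in range(-180, 180, step_deg):
        angle = math.radians(angle_deg)
        pixels = []
        for i in range(1, samples + 1):
            r = max_range * i / samples  # distance along the ground plane
            pix = project_ground_point(r * math.cos(angle), r * math.sin(angle),
                                       cam_height, tilt, focal_px, width, height)
            if pix is not None:
                pixels.append(pix)
        if pixels:
            lines[angle_deg] = pixels
    return lines


# Hypothetical camera: 0.2 m high, tilted 45 degrees down, 176x144 image.
lines = visible_scan_lines(0.2, math.radians(45), 200.0, 176, 144)
```

Each returned pixel list would then be searched in order of increasing distance for the first non-ground colored pixel.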
The robot projects each non-ground object found in the image into world space and builds a local 2-D map of its surroundings. As the robot moves, it updates the points stored in this map using the same motion model as that used for localization. In our current implementations, the robot forgets data points after a set time period. Some memory is required due to the robot's limited field of view. One of the key features of our algorithm is its speed. Due to searches being restricted to scan lines, the processing time scales with the square root of the image size, thereby making it useful for even higher resolution images. We have now published our results in an IROS paper (see publication list below).
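A minimal sketch of such a local map is given below. It assumes a robot-centered frame (x forward, y left), a deterministic odometry update in place of the full localization motion model, and a hypothetical forgetting time of 2 seconds; the class and method names are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class MapPoint:
    x: float      # metres, robot frame, forward
    y: float      # metres, robot frame, left
    stamp: float  # time the point was observed, seconds


class LocalMap:
    def __init__(self, max_age=2.0):
        self.max_age = max_age  # assumed forgetting period, seconds
        self.points = []

    def add(self, x, y, now):
        """Store an obstacle point observed at time `now`."""
        self.points.append(MapPoint(x, y, now))

    def apply_motion(self, dx, dy, dtheta):
        """Re-express stored points after the robot moves.

        (dx, dy) is the robot's translation in its old frame and dtheta its
        counter-clockwise rotation, both taken from odometry.
        """
        c, s = math.cos(dtheta), math.sin(dtheta)
        for p in self.points:
            tx, ty = p.x - dx, p.y - dy
            p.x = c * tx + s * ty
            p.y = -s * tx + c * ty

    def forget(self, now):
        """Drop points older than max_age."""
        self.points = [p for p in self.points if now - p.stamp <= self.max_age]


# Example: an obstacle 1 m ahead, then the robot drives 0.5 m forward.
local_map = LocalMap()
local_map.add(1.0, 0.0, now=0.0)
local_map.apply_motion(0.5, 0.0, 0.0)  # obstacle is now 0.5 m ahead
```

Keeping points in the robot's own frame means obstacles outside the current field of view remain available to the avoidance behavior until they time out.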
Automatic segmentation and recognition of environmental regimes
Robot environments are typically quasi-stationary
systems. Being able to identify and recognize the different
environmental regimes or contexts offers the ability to extend the use
and dynamic range of many current algorithms significantly. We have
developed an algorithm, which we continue to refine and improve, to
automatically segment and recognize different sensory inputs
on-line. Given sensor noise and variability, this is a challenging problem.
Our approach relies on the assumption that different environment states are characterized by having different underlying probability distributions. Thus, by comparing sample distributions, we can segment environment states and recognize previously experienced environment states.
Our current approach uses a sliding window with a non-parametric statistical test (a modification of a Kolmogorov-Smirnov distance operator) to determine whether the recent sensory data is drawn from the same distribution as a previously segmented state, or from a new, previously unseen state. From a control perspective, once the environment state is recognized, we can then use the control parameters that are appropriate to that state. This has the effect of broadening the effective dynamic range of our control algorithms.
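The recognition step can be sketched as follows. Note the assumptions: the code uses the standard two-sample Kolmogorov-Smirnov distance (the largest gap between two empirical distribution functions) rather than our modified operator, treats the sensor readings as scalars, and relies on a hypothetical decision threshold of 0.3. The names ks_distance and recognize are illustrative.

```python
from bisect import bisect_right
from collections import deque


def ks_distance(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for v in sa + sb:
        fa = bisect_right(sa, v) / len(sa)  # fraction of a that is <= v
        fb = bisect_right(sb, v) / len(sb)  # fraction of b that is <= v
        gap = max(gap, abs(fa - fb))
    return gap


def recognize(recent, known_states, threshold=0.3):
    """Return the index of the closest stored state, or None if all differ.

    `known_states` is a list of sample sets, one per previously segmented
    environment state.
    """
    best_index, best_gap = None, threshold
    for index, samples in enumerate(known_states):
        gap = ks_distance(recent, samples)
        if gap < best_gap:
            best_index, best_gap = index, gap
    return best_index


# Example with made-up readings: two stored states and a sliding window.
dim = [i % 10 for i in range(50)]
bright = [100 + i % 10 for i in range(50)]
window = deque(maxlen=50)
for reading in bright:
    window.append(reading)
state = recognize(list(window), [dim, bright])  # matches the second state
```

A result of None indicates a previously unseen state, at which point the current window could be stored as a new entry. The returned index could also key a table of per-state control parameters.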
We have applied our technique to a recognition task. Using a Sony AIBO robot looking at a static scene, we experimented with a range of lighting conditions. For each lighting condition we generated a set of color thresholds for segmenting the image, and stored a set of samples for identifying that environment state. We then modified the lighting conditions on the fly. Using the recognition system, the robot selected the thresholds to segment the image based on what it recognized as the environment state. In contrast to a non-adapting robot, the robot running our algorithm robustly recognized the objects in the image from their segmented color despite the lighting changes. We have now published this work as an IROS publication (see publication list).
Plan
Technology Transition
Our work is available at http://www.cs.cmu.edu/~coral/download/. Our publications are available at http://www.cs.cmu.edu/~coral/publications/. Our software includes:
Recent Relevant Publications