Reliability of Mobile Robot Teams

Overview

We are developing analytical tools that will enable mission designers to choose the lowest-cost configuration of robot components and robot teams that provides a required probability of mission success.

They will also allow for multirobot mission planning which takes reliability into consideration when assigning tasks to robots, in order to minimize the negative consequences of robot failures.

We are specifically addressing multirobot missions for planetary exploration, but the methods we are developing should have relevance for many robotic applications.

Many of the most promising applications for mobile robots are those which reduce or eliminate the need for humans to perform tasks in dangerous environments. Examples include space exploration, mining, and toxic waste cleanup. For mobile robots to succeed in keeping humans from these dangers, these robots must be highly reliable so that people do not have to enter the dangerous area to repair or replace failed robots.

Unfortunately, most current mobile robots have poor reliability, requiring frequent maintenance and repair. Historical failure data for small field robots reveals that they are either broken or under repair approximately half of the time [1].

Notable exceptions to this observation are the planetary rovers built and operated for NASA by the Jet Propulsion Laboratory (JPL). The current Mars Exploration Rovers (MER), for instance, have now been in operation on Mars for more than four years. There are few, if any, other mobile robots that have operated for as long as a year without repair.

The MER rovers are not, however, an unqualified success from the standpoint of reliability, since their intended mission was for 90 days. If the MER rovers had been designed to a level of reliability more closely suited to their intended mission, substantial savings in design and operational costs would have resulted.

Both of these situations—the unreliability of mobile robots in general and the high cost of over-reliable NASA rovers—illustrate that mobile robots are not being designed to an appropriate level of reliability. In fact, most of the time mobile robots are not designed with respect to reliability at all.

Our goal in this research is to provide analytical methods for considering reliability in the design of robots, robot teams, and robot missions.

Related Work

While there is extensive prior work describing reliability prediction for hardware systems, and some work on reliability prediction for software systems, there is little evidence in the robotics literature to indicate that these methods are known or used by the robotics community. When reliability is mentioned in the robotics literature, it is usually in passing and in qualitative terms, along the lines of “This experiment was not completed because the robot kept breaking down.”

Prior to our work there were a handful of papers addressing the reliability of mobile robots. Carlson, Murphy, et al. have published several papers [1-5] analyzing the reliability of robots in field and laboratory situations. While they make use of reliability engineering for analysis and classification of failures, these papers do not provide a means for predicting failures of robots or robot missions.

In the multirobot literature, there is considerable work which examines how to diagnose and/or recover from robot failures, but only one prior paper [6] which provides methods to predict probabilities of failure before it occurs rather than to respond to failure after it occurs. While this work is similar to ours in applying methods from the reliability engineering literature to predict mobile robot failures, it is limited in scope, being concerned with the reliability of teams of robots which have cannibalistic repair capabilities. Our work instead aims to provide a broad framework for analysis of a wide variety of robotic missions.

Research Goals

To provide methods for quantitatively predicting the reliability of robots and robot teams;
To provide mission-analysis tools for examining the tradeoffs between reliability and other design factors (such as cost, team size, and mission duration); and
To test the hypothesis that it is better to consider reliability a priori in multirobot task allocation, rather than dealing with failure only after it occurs.

Robot Reliability

More recent work [8] extends single-robot reliability prediction into the multirobot domain. As is often the case, what is a fairly simple analysis in the single-robot domain becomes much more complicated when applied to robot teams.

The basic methods used in reliability engineering for combining subsystem reliabilities to determine system reliabilities assume independence of the subsystems with respect to reliability, but in a robot team the reliabilities of the team members are highly interdependent, with the failure of one robot affecting the tasking of other robots, and thus their reliabilities, in a way that is more complicated than a simple redundancy-of-parts calculation.
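For reference, the independence-based combination rules mentioned above can be sketched as follows. This is a minimal illustration of the textbook series and parallel formulas, not the method of [8]; the function names and the example subsystem reliabilities are assumptions made for illustration only.

```python
import math

def series_reliability(reliabilities):
    """All subsystems must work: R = product of R_i (assumes independence)."""
    return math.prod(reliabilities)

def parallel_reliability(reliabilities):
    """At least one subsystem must work: R = 1 - product of (1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

if __name__ == "__main__":
    # Hypothetical robot made of three subsystems in series (values are made up).
    robot = series_reliability([0.95, 0.90, 0.99])
    # Treating two such robots as simple redundant parts. This is exactly the
    # simplification that breaks down when robot failures change team tasking.
    team = parallel_reliability([robot, robot])
    print(f"robot: {robot:.3f}, redundant pair: {team:.3f}")
```

The parallel formula treats the second robot as a spare part. It does not capture retasking of the surviving robot, which is why team reliability needs a richer model.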

For simple missions these interdependencies can be enumerated, but for nontrivial missions the problem suffers from combinatorial explosion, so we make use of Monte Carlo simulation to estimate the probabilities of completing mission tasks.
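A Monte Carlo estimate of this kind might look like the sketch below. The mission model is a deliberately simplified assumption and is not the simulation used in our papers: tasks are interchangeable, every surviving robot attempts one task per round, a robot fails permanently with a fixed probability `p_fail` per attempt, and a failed robot's task returns to the pool. All names and parameter values are hypothetical.

```python
import random

def simulate_mission(num_robots, num_tasks, p_fail, rng):
    """Simulate one mission; return True if every task gets completed.

    Hypothetical model: in each round every surviving robot attempts one
    remaining task and fails permanently with probability p_fail. A failed
    robot's task goes back to the pool for the survivors.
    """
    alive, remaining = num_robots, num_tasks
    while remaining > 0 and alive > 0:
        for _ in range(min(alive, remaining)):
            if rng.random() < p_fail:
                alive -= 1        # robot lost; its task is still pending
            else:
                remaining -= 1    # task completed
    return remaining == 0

def estimate_mission_reliability(num_robots, num_tasks, p_fail,
                                 trials=20000, seed=0):
    """Monte Carlo estimate of the probability of completing all tasks."""
    rng = random.Random(seed)
    successes = sum(simulate_mission(num_robots, num_tasks, p_fail, rng)
                    for _ in range(trials))
    return successes / trials

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, estimate_mission_reliability(n, num_tasks=4, p_fail=0.2))
```

Because each trial replays the retasking that follows a failure, the interdependence between robots is captured by simulation and does not have to be enumerated by hand.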

Design Tradeoffs

Reliability engineering is about designing devices with an appropriate level of reliability for the mission, and about trading off reliability against other performance metrics.

One of the reasons that reliability is particularly interesting in the multirobot domain is that claims have been made in the literature about the superiority of large teams of robots with respect to reliability, e.g.:

“Multiple redundant robots provide more reliable solutions to real-world tasks than a single agent because the overall system is less sensitive to failure.” [9]

On the surface of it, this seems like a reasonable claim. If two robots are sent to complete a task instead of one, then there is a greater chance of completing the task. When considered in more depth, however, this claim is more complicated than it initially seems: By sending two robots instead of one, the cost of completing the task has roughly doubled. What if those additional funds were instead spent on improving the single robot? Which is more likely to complete the task—two lower-reliability robots, or one higher-reliability robot? The answer is no longer obvious.
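The comparison can be made concrete with a small calculation. The sketch below assumes that each robot independently completes the task with probability r and that any single robot can complete it alone. Both assumptions are simplifications, and the function name and sample values are hypothetical.

```python
def team_success_probability(r, n):
    """Probability that at least one of n robots completes the task.

    Assumes each robot succeeds independently with probability r and that
    any single robot can complete the task alone.
    """
    return 1.0 - (1.0 - r) ** n

if __name__ == "__main__":
    for r in (0.5, 0.7, 0.9):
        break_even = team_success_probability(r, 2)
        print(f"two robots at r={r:.1f} match one robot at R={break_even:.2f}")
```

Under these assumptions the single robot wins only if the extra money raises its reliability above 1 - (1 - r)^2. Whether that is achievable for the price of a second robot depends on how cost scales with reliability.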

In [10] we examine this design problem of trading off robot team size vs. mission reliability. Our results show that, at least for the example mission analyzed, larger teams of less-reliable robots can provide higher mission reliability at a lower cost than smaller teams of more-reliable robots.

In [11] we take a more detailed look at the relationship between robot reliability and mission cost for an example planetary rover mission. Robot reliability affects more than just development costs: transportation costs are higher for more-reliable robots (due to increased weight and volume), operational costs are higher (due to mission extensions), and expected mission rewards are higher (due to the increased chance of mission completion).

By combining these costs and rewards into an overall expected mission value, we can examine the cost-reliability tradeoff for a given mission. A plot showing such a cost-reliability analysis is shown below. In this plot we see that the optimal reliability for this particular mission lies in the range of 75-85%, which is considerably lower than the level of reliability targeted by current legacy rover designs.
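One way to set up such an analysis is sketched below. The cost curves and reward value are invented placeholders, not the models or data from [11], so the location of the optimum they produce carries no meaning. The sketch only shows how rising costs and a rising expected reward combine into a curve with an interior maximum.

```python
def expected_mission_value(r, reward, dev_cost, transport_cost, ops_cost):
    """Expected reward minus total cost for a robot of reliability r."""
    total_cost = dev_cost(r) + transport_cost(r) + ops_cost(r)
    return r * reward - total_cost

# Hypothetical cost models (arbitrary units). Each grows with reliability,
# and development cost grows sharply as r approaches 1.
def dev_cost(r):
    return 10.0 / (1.0 - r)

def transport_cost(r):
    return 20.0 + 10.0 * r

def ops_cost(r):
    return 15.0 + 5.0 * r

def best_reliability(reward, candidates):
    """Return the candidate reliability with the highest expected value."""
    return max(candidates,
               key=lambda r: expected_mission_value(
                   r, reward, dev_cost, transport_cost, ops_cost))

if __name__ == "__main__":
    grid = [i / 100 for i in range(50, 100)]   # 0.50 .. 0.99
    r_star = best_reliability(reward=500.0, candidates=grid)
    print(f"value-maximizing reliability on this toy model: {r_star:.2f}")
```

Swapping in mission-specific cost and reward models is the substantive work. The sweep over candidate reliabilities stays the same.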

These results indicate that from a cost-benefits standpoint, it may be better to build rovers of lower reliability, and to accept that some of them will fail before completing their primary missions.

Mission Planning

Existing multirobot planners, if they consider the possibility of robot failure at all, consider it only after the fact—by selecting a backup plan after robot failure occurs.

Our hypothesis is that robot failure probabilities need to be considered a priori in developing the original plan. As an abstract example, it is unwise to assign a robot with a high chance of failure to a mission-critical task located far from other robots.

Our preliminary results indicate that for a simple exploration mission this is indeed the case. Ignoring reliability when generating the initial plan often leaves the surviving robots in a suboptimal position with respect to the backup plans that are followed after a robot fails.

These results will be reported in an upcoming publication.

References

[1] J. Carlson and R. Murphy, “Reliability analysis of mobile robots,” Proc. ICRA, 2003.
[2] J. Carlson, R. Murphy, and A. Nelson, “Follow-up analysis of mobile robot failures,” Proc. ICRA, 2004.
[3] J. Carlson and R. Murphy, “How UGVs physically fail in the field,” IEEE Transactions on Robotics, Vol. 21, No. 3, June 2005, pp. 423-437.
[4] M. Micire, “Analysis of the robotic-assisted search and rescue response to the world trade center disaster,” M.S. thesis, University of South Florida, 2002.
[5] J. Carlson, “Analysis of How Mobile Robots Fail in the Field,” M.S. thesis, University of South Florida, 2004.
[6] C. Bererton and P. Khosla, “An analysis of cooperative repair capabilities in a team of robots,” Proc. ICRA, 2002, pp. 476-482.
[7] S. Stancliff, J. Dolan, and A. Trebi-Ollennu, “Towards a Predictive Model of Mobile Robot Reliability,” CMU tech. report CMU-RI-TR-05-38, August 2005.
[8] S. Stancliff, J. Dolan, and A. Trebi-Ollennu, “Mission Reliability Estimation for Repairable Robot Teams,” Int. Journal of Advanced Robotic Systems, June 2006.
[9] R. Brooks and A. Flynn, “Fast, Cheap and Out of Control: A Robot Invasion of the Solar System,” Journal of the British Interplanetary Society, Oct. 1989, pp. 478-485.
[10] S. Stancliff, J. Dolan, and A. Trebi-Ollennu, “Mission Reliability Estimation for Multirobot Team Design,” Proc. IROS, 2006.
[11] S. Stancliff, J. Dolan, and A. Trebi-Ollennu, “Planning to Fail - Reliability as a Design Parameter for Planetary Rover Missions,” Proc. PerMIS, 2007.

Publications

Planning to Fail - Reliability as a Design Parameter for Planetary Rover Missions
S. Stancliff, J. Dolan, and A. Trebi-Ollennu
Proceedings of the 2007 Workshop on Measuring Performance and Intelligence of Intelligent Systems (PerMIS '07), August, 2007.
Mission Reliability Estimation for Multirobot Team Design
S. Stancliff, J. Dolan, and A. Trebi-Ollennu
Proceedings of the 2006 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS '06), October, 2006.
Mission Reliability Estimation for Repairable Robot Teams
S.B. Stancliff, J. Dolan, and A. Trebi-Ollennu
International Journal of Advanced Robotic Systems, Vol. 3, No. 2, June, 2006, pp. 155 - 164.
Mission Reliability Estimation for Repairable Robot Teams
S.B. Stancliff, J. Dolan, and A. Trebi-Ollennu
Proceedings of the 1st International Workshop on Multi-Agent Robotic Systems - MARS 2005, Peter Sapaty and Joaquim Filipe, eds., INSTICC Press, Portugal, September, 2005, pp. 144 - 151.
Planning to Fail: Mission Design for Modular Repairable Robot Teams
S.B. Stancliff, J. Dolan, and A. Trebi-Ollennu
Proceedings of the 8th International Symposium on Artificial Intelligence, Robotics and Automation in Space (ISAIRAS 2005), B. Battrick, ed., ESA Publications Division SP-603, September, 2005.
Towards a Predictive Model of Mobile Robot Reliability
S.B. Stancliff, J. Dolan, and A. Trebi-Ollennu
Technical report CMU-RI-TR-05-38, Robotics Institute, Carnegie Mellon University, August, 2005.

Links

This project's official RI web page.

Updated 1/3/2008