
Integrated Debugging

Finding the Hex among a Million Robotic Modules

In programming modular robots to create dynamic, three-dimensional representations, the Carnegie Mellon-Intel Claytronics Research Project also anticipates a less enchanting prospect for the programmers whose code will shape the actions of metamorphic robots. By harnessing millions of tiny computers that interpret billions of instructions on parallel threads, claytronics will open the gates to a greater number of bugs that can put a hex on coded instructions, multiplying the opportunities for coded devices to go awry. Moreover, as a unique environment for co-dependent computation, claytronics can also create operations in which new types of bugs develop.

Watch Points

More Places for Bugs to Hide  

This scale of interdependent computing in claytronics creates many more processes where bugs can hide. The sheer number of actuating robots will make claytronic ensembles rife with occasions on which program instructions might produce misalignments among devices or inappropriate actions. Each module will carry its own stream of instructions, often more than one at a time. These threads will run in parallel with similar programs in every other module. From module to module, threads will mingle, and these convergences of code in a mega-threaded realm will dramatically increase the probability of tangles or other subtle, unforeseen interactions among strings of instructions. These are the glitches common to any course of software development, expanded to the scale that claytronics brings to the realm of parallel and distributed computing.

New Varieties of Bugs  

In the processes that drive metamorphic robots, devices that create structures and transform shapes, the potential for unfamiliar types of errors also exists. New processes invite new bugs that scatter a species of confusing redundancy through the execution of code. So claytronics research has also implemented a program to detect and correct errors that create a state of conflict spread across multiple catoms, each of which appears by its individual status to have executed its program correctly.

To address the requirements for debugging to match the scale of computing in an ensemble, claytronics researchers are developing tools and strategies that scale up the bug patrol for massively parallel, widely distributed computation tied to the actuation of robotic devices. DPRSim, the basic simulation program used to model the performance of claytronic ensembles, is also a versatile tool for the integrated debugging of large ensembles of modular robots, one that provides programmers with a rich visual context in which to track bugs.

DPRSim supports strategies for writing program code that incorporates macros to capture complete histories of change to critical groups of variables. Such macros monitor key variables, such as scalar values and object classes, and preserve their runtime histories in a SQL database, from which a programmer can rapidly reconstruct them when it is necessary to track the path of errors. The visualization of the path of error in the simulator's replay of events makes the search for programming errors much easier. In DPRSim, a programmer can scan the execution of individual programming threads from both two-dimensional and three-dimensional perspectives in order to trace errors to root causes.
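The history-capture idea can be sketched roughly as follows. This is a minimal illustration in Python using the standard `sqlite3` module, not DPRSim's actual macros or schema; the `HistoryLog` class, the `history` table, its column names, and the `is_leader` variable are all hypothetical names chosen for the example.

```python
import sqlite3


class HistoryLog:
    """Records every change to watched variables in an in-memory SQLite table.

    Hypothetical helper: it stands in for the history-capturing macros
    described above and is not part of DPRSim.
    """

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE history (tick INTEGER, catom INTEGER, name TEXT, value TEXT)"
        )

    def record(self, tick, catom, name, value):
        # Store the value as its repr so that any simple type fits one column.
        self.db.execute(
            "INSERT INTO history VALUES (?, ?, ?, ?)",
            (tick, catom, name, repr(value)),
        )

    def replay(self, catom, name):
        """Return the full (tick, value) history of one variable on one catom."""
        rows = self.db.execute(
            "SELECT tick, value FROM history "
            "WHERE catom = ? AND name = ? ORDER BY tick",
            (catom, name),
        )
        return rows.fetchall()

    def first_tick_where(self, name, value):
        """Return the earliest tick at which any catom recorded this value."""
        row = self.db.execute(
            "SELECT MIN(tick) FROM history WHERE name = ? AND value = ?",
            (name, repr(value)),
        ).fetchone()
        return row[0]


log = HistoryLog()
log.record(0, 1, "is_leader", False)
log.record(0, 2, "is_leader", False)
log.record(3, 1, "is_leader", True)
log.record(3, 2, "is_leader", True)
```

Because every change is stored rather than a periodic snapshot, a query such as `log.replay(1, "is_leader")` returns the whole sequence of values for that catom, and `log.first_tick_where("is_leader", True)` points to the moment worth replaying.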

The ability to search for program errors from multiple perspectives is a powerful feature of simulation-enhanced debugging. That enhancement becomes all the more powerful because the programmer can review the complete histories of code threads associated with each catom in an ensemble.

Complete historical review greatly improves on the debugging context typically available to programmers, who more often must reconstruct events from snapshots captured at intervals during a program's execution and then attempt to interpolate the missing details. Through the replay of simulation, the history provides the programmer not only with the record of events tied to key variables but also with a visual representation of events tied to individual catoms. Individual catoms can also be highlighted by color to mark locations and given textual labels to carry notes. They can also be isolated visually within a transparent surrounding that eliminates the clutter of catoms superfluous to an investigation.

With claytronics researchers writing programs for simultaneous processing by 50,000 to 500,000 modules, each catom executing one or more threads of code, this retooling of the debugging process has become a necessity, although one that is relevant beyond the applications of claytronics technology. With forensic simulation, a programmer can much more easily and quickly reconstruct the root causes of the coding and algorithmic errors that compromise smooth operations in a mega-threaded environment. The visualization of records permits a dramatic scaling upward of error searches and speeds the recovery of essential information. The process reduces the time of investigations while rewarding focused inquiries with increasing levels of relevant information.

Distributed Watchpoints  

A more challenging error can arise when modules in a group properly execute their individual processes yet do not place the group in the intended global state. In this outcome, the error may be said to be distributed because no individual module has clearly failed to execute its commands properly, yet the group has failed to achieve the desired configuration. The result is a condition that cannot be observed in the local state of one robot, and traditional debugging strategies do not easily detect this variety of error.

Such a condition might exist, for example, when seven modules require one among them to be designated as the leader of future group motion. Two robots might act simultaneously on that requirement and create two leaders, a state that would impair the ability of the group to coordinate its actions with surrounding groups. Traditional debugging tools would probably not detect this new species of bug. They would be more likely to follow the path of programming threads as executed by the individual modules. Such an analysis would determine that the hardware performed as it should and that each module properly executed its thread of programming.
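The two-leader scenario can be reproduced in a few lines. The sketch below is a toy model in Python, not claytronics code: the `Module` class, its `elect_if_no_leader` rule, and the choice of modules 2 and 5 are assumptions made for illustration. It shows how each module can pass its own local check while the group as a whole ends up in an invalid state.

```python
class Module:
    """Toy module with a single piece of local state (hypothetical model)."""

    def __init__(self, module_id):
        self.module_id = module_id
        self.is_leader = False

    def elect_if_no_leader(self, observed_leader_flags):
        """Claim leadership when no leader was visible in the observed snapshot."""
        if not any(observed_leader_flags):
            self.is_leader = True

    def local_check(self):
        # Per-module check: the flag is a well-formed boolean. Nothing in
        # one module's local state reveals that another leader exists.
        return isinstance(self.is_leader, bool)


modules = [Module(i) for i in range(7)]

# Modules 2 and 5 both act on the same stale snapshot, taken before
# either one has written its own flag.
snapshot = [m.is_leader for m in modules]
modules[2].elect_if_no_leader(snapshot)
modules[5].elect_if_no_leader(snapshot)

all_local_checks_pass = all(m.local_check() for m in modules)
leader_count = sum(m.is_leader for m in modules)
```

Each module followed its rule correctly given what it observed, so stepping through either thread on its own shows nothing wrong. Only a check that spans several modules, here `leader_count`, exposes the error.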

Thus, to find bugs whose effects are distributed across the status of several catoms, claytronics researchers have developed Distributed Watchpoints, an algorithm-level approach to detecting and fixing conditions that span the related states of multiple catoms and must be resolved for the ensemble to operate properly. Early work on this class of error has focused on such issues as leader elections among groups of catoms, token passing for permissions to move data through the network, and the smoothing of gradient values spread across catom clusters.

Watch Points designate the nodes that are monitored to determine the validity of distributed conditions. This approach provides a simple and highly descriptive set of rules for evaluating distributed conditions and proves effective in detecting errors missed by more conventional debugging techniques. Development of the approach will continue with additional functions and further validation.
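As a rough illustration of a condition evaluated over several modules at once, the sketch below checks a two-module predicate against every pair of modules within a given number of links of each other. It is a centralized simulation written in Python, not the researchers' watchpoint language or their distributed algorithm; `within_hops`, `check_watchpoint`, the `max_hops` parameter, and the chain topology are all hypothetical.

```python
from collections import deque


def within_hops(neighbors, start, max_hops):
    """Return the modules reachable from start in at most max_hops links."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return set(distance)


def check_watchpoint(neighbors, state, predicate, max_hops):
    """Return sorted pairs (a, b), a < b, within max_hops that trigger predicate."""
    hits = set()
    for a in neighbors:
        for b in within_hops(neighbors, a, max_hops):
            if a < b and predicate(state[a], state[b]):
                hits.add((a, b))
    return sorted(hits)


# Seven modules linked in a chain: 0 - 1 - 2 - 3 - 4 - 5 - 6
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
# Modules 2 and 5 both believe they are the leader.
state = {i: {"is_leader": i in (2, 5)} for i in range(7)}


def two_leaders(a, b):
    """Distributed condition: both modules in the pair claim leadership."""
    return a["is_leader"] and b["is_leader"]


violations = check_watchpoint(neighbors, state, two_leaders, max_hops=6)
nearby_only = check_watchpoint(neighbors, state, two_leaders, max_hops=2)
```

With `max_hops=6` the check spans the whole chain and reports the pair of conflicting leaders. With `max_hops=2` the same condition goes unreported, because modules 2 and 5 are three links apart, which shows why the set of monitored nodes matters as much as the condition itself.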

Publications and Documents

Integrated Debugging of Large Modular Robot Ensembles,
    Benjamin D. Rister, Jason D. Campbell, Padmanabhan Pillai, and Todd C. Mowry. In Proceedings of the IEEE International Conference on Robotics and Automation ICRA '07, April, 2007.
Distributed Watchpoints: Debugging Very Large Ensembles of Robots,
    Michael De Rosa, Seth Copen Goldstein, Peter Lee, Jason D. Campbell, and Padmanabhan Pillai. In Robotics: Science and Systems Workshop on Self-Reconfigurable Modular Robots, August, 2006. See derosa2007-icra07.