Concurrency errors are notoriously difficult to debug. In 15-410, CMU's undergraduate operating systems class, long-running stress tests are the go-to tool for both students and TAs to find subtle race conditions. However, stress tests are unreliable, depending on randomly-occurring timer interrupts to expose bugs. Systematic testing provides a more exhaustive alternative to stress testing. A systematic test, parameterized by a set of preemption points during program execution, aims to force the system under test to execute all possible thread interleavings around those points. Systematic testing's major downside is state space explosion: for a set of preemption points of size n, a system with k runnable threads at each preemption point will have O(n^k) thread interleavings. To mitigate this problem, prior work has explored state space reduction techniques such as dynamic partial order reduction (DPOR), iterative context bounding (ICB), and stable multithreading (SMT).
Despite the success of each of these approaches, each of them is still vulnerable to state space explosion resulting from a too-dense set of preemption points. We are developing an "iterative deepening" technique for automating exploration of the space of possible preemption-point sets itself, built on top of the existing techniques of state space estimation and data race detection. This technique will help a systematic testing framework find and explore the "most meaningful" state spaces given a fixed budget of CPU time, requiring little or no human intervention in configuring preemption points. We plan to test this technique by implementing it in Landslide, a tool for testing student projects in 15-410 (undergraduate operating systems), and releasing it to the class as a test framework in the upcoming semester.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.