CURRENT RESEARCH PROJECTS
FINGERPOINTING: FAILURE DIAGNOSIS, ROOT-CAUSE ANALYSIS
Problem diagnosis (or fingerpointing) involves instrumenting systems to yield meaningful data, detecting errors and/or failures within these systems, and ascertaining their root cause, i.e., the underlying fault. Fingerpointing is difficult because the distributed interactions, protocols and inter-component dependencies in computer systems can cause a problem to change "shape" or manifestation, leading to potential red herrings in problem determination.
We are currently developing a variety of techniques for automated fingerpointing in a number of distributed systems, such as Hadoop, PVFS and Lustre. The aim is to perform online and offline root-cause analyses in order to identify a faulty node/process, diagnose the source of the problem, and report it to the user or administrator through some form of visualization. We have developed techniques for log analysis, black-box performance analysis and hardware performance-counter analysis, all aimed at localizing the origin of the problem in large distributed systems.
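As a rough illustration of black-box localization, the sketch below compares one metric across peer nodes and flags any node that deviates strongly from the peer median. It is not the group's actual algorithm: the function name, the node names, the choice of median absolute deviation (MAD) and the threshold of 3.5 are all assumptions made for this example.

```python
from statistics import median


def flag_outlier_nodes(metrics, threshold=3.5):
    """Return the sorted names of nodes whose metric is far from the peer median.

    metrics: dict mapping a (hypothetical) node name to one black-box
    measurement, e.g. average CPU utilisation over a time window.
    threshold: assumed cut-off on the MAD-based score.
    """
    values = list(metrics.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that one bad node cannot skew.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the peers sit exactly on the median; flag any node that differs.
        return sorted(name for name, v in metrics.items() if v != med)
    # The factor 0.6745 makes the score comparable to a z-score for normal data.
    return sorted(
        name
        for name, v in metrics.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


if __name__ == "__main__":
    cpu = {"node1": 41.0, "node2": 39.5, "node3": 40.2, "node4": 95.0, "node5": 40.8}
    print(flag_outlier_nodes(cpu))
```

The sketch assumes that peer nodes running the same workload behave alike, so a node that departs from its peers is a candidate culprit. A real diagnosis pipeline would combine several metrics and time windows before reporting a node.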
More information: Fingerpointing research group website
Follow us on twitter: @CloudAtCMU
ZERO-DOWNTIME LIVE UPGRADES IN DISTRIBUTED SYSTEMS
Implementing online software upgrades (enabling changes in the behavior, configuration, code, data or topology of an executing distributed application) is challenging in distributed systems. This functionality is essential for enabling the self-regulating, autonomic management and maintenance of enterprise computer systems.
We are addressing the challenges of maintaining the existing (and potentially unknown) dependencies between distributed components and services, handling API evolution, performing upgrades across mutually distrustful administrative domains, transferring state (which may require long-running data migration and conversion tasks executing in parallel with regular requests for the same data), assessing and minimizing the impact of upgrades on the running services while improving the value of the infrastructure according to some well-defined metrics, and tolerating faults during the upgrade process.
Follow us on twitter: @CloudAtCMU
YINZCAM: LARGE-SCALE LIVE MOBILE STREAMING
YinzCam allows fans, right from their seats during the game, to create and view their own instant replays (through short and long rewinds of live video), catch the action from 4-8 different live camera angles (e.g., Follow-Crosby Cam for the Pittsburgh Penguins), get automated instant replays seconds after a play has happened (from all the different camera angles), and get real-time statistics, game rules, player rosters, etc., all on their own wifi-enabled smartphones, and all without violating broadcast rights. YinzCam was deployed as a pilot for the Pittsburgh Penguins, a National Hockey League (NHL) team in the United States, for 40 home games from October 2008 to May 2009 (including the 2009 Stanley Cup playoffs and the Stanley Cup Final). YinzCam is platform-agnostic and was supported on the fans' own smartphones (as a browser-based service, without requiring any software installation on the phone), including the iPhone, the iPod Touch, the BlackBerry Bold, the Nokia N95, the Samsung Omnia, the T-Mobile Android G1, the HTC Touch Pro, and 25-odd different wifi-enabled Windows Mobile phones.
More information: YinzCam website
Follow us on twitter: @yinzcam
MOBILE & SENSOR CLOUD COMPUTING
We are developing multiple mobile cloud-computing middleware platforms to enable new large-scale mobile-cluster applications. Smartphones and other wireless mobile devices are steadily gaining compute power, networking, memory, storage, etc. The aim is to leverage mobile devices as the nodes of a large-scale cloud-computing infrastructure and to provide the middleware for these devices to work together seamlessly, in a peer-to-peer manner, to support a variety of new mobile-cluster applications. Hyrax is a platform that we have developed for large-scale data-intensive computing on mobile devices. Hyrax is based on MapReduce, is derived from Hadoop and runs on the Android platform. Agora is a sensor middleware platform that we have developed to support interoperability and ease-of-use in developing sensor-network applications that encompass a variety of hardware and software architectures.
Follow us on twitter: @CloudAtCMU
FOOTBALL ENGINEERING
Our research group is focused on improving the viewing experience, refereeing, scouting, and sports performance aspects of (American) football through engineering and research. Our approach is to use a synergistic combination of sensors, communication protocols, computer vision, and machine learning techniques to provide enhanced tracking and motion analysis during practice or games. We have developed a smart football that can be used to track the trajectory and landing position of the football in the field of play. We have also developed embedded coaching aids to help running backs, quarterbacks, wide receivers and punt kickers train reproducibly and independently of their coaches. The resulting data can also be used to indicate the performance of an individual player.
More information: Football Engineering website
Follow us on twitter: @SportsTechAtCMU
TRINETRA: ASSISTIVE TECHNOLOGIES FOR THE BLIND & THE DEAF
Trinetra aims to develop cost-effective, smartphone-enabled assistive technologies to provide visually impaired people with greater independence and an enhanced quality of life in their daily activities. The broad objective is to harness the collective capability of diverse networked embedded devices to support location-aware and context-aware applications, including first-responder support, building navigation, retail shopping, smart transportation, etc. To date, we have researched and developed a portable barcode-based solution involving an Internet- and Bluetooth-enabled smartphone to aid grocery shopping at the Carnegie Mellon campus convenience store, Entropy.
We have also extended this to assist both sighted and visually impaired commuters with their transportation and commute-planning needs, using a smartphone to convey notifications of arrivals, departures, etc. In addition, we have developed a phone-based currency identifier for the visually impaired.
More information: Trinetra website
Follow us on twitter: @AssistTechAtCMU
iBURGH: TECHNOLOGIES FOR E-GOVERNMENT
This is a more recent research effort to connect individuals with their government and its functioning through technology. The central idea is to put control into people's hands by allowing them to view their city/state/federal government in action, and to provide them with more direct and ready means to communicate issues of concern to them. Working closely with the Pittsburgh City Council, we have developed the first mobile app for e-government, which allows the residents of Pittsburgh to submit auto-geotagged visual complaints (potholes, graffiti, fallen trees, etc.) from their cellphones directly into their City's 311 system for resolution.
Follow us on twitter: @cityZenMobile
MEAD: Real-Time Fault-Tolerant Middleware
MEAD enhances distributed middleware (CORBA and Java) applications with dependability, including: 1) transparent, yet reconfigurable, fault tolerance at runtime, 2) configuration advice to tailor an application's fault tolerance to its reliability and resource needs, 3) proactive fault tolerance based on failure prediction, 4) resource-aware system adaptation to failures, and 5) enabling distributed, fault-tolerant applications to live realistically with nondeterminism. The significant contributions of MEAD included its analysis of the three-way trade-offs between resources, timeliness and fault tolerance. MEAD was also unique in exploiting compile-time program analysis and run-time dynamic analysis to provide consistent and efficient (albeit lazy) replication, even for nondeterministic multithreaded CORBA/Java applications.
Survivable Distributed Systems -- Vajra, Elephant, Thema, Immune
The Elephant work focuses on live updates of intrusion detection systems, such as Snort. The Thema work focuses on Byzantine-fault tolerance for multi-tier distributed applications based on Web Services. Vajra focuses on benchmarking the survivability of various distributed infrastructures (such as Castro-Liskov BFT, Immune, Fleet, etc.) through fault-injection of benign and malicious failures. Immune was a collaborative research effort with Prof. Kim P. Kihlstrom that led to the development of a survivable infrastructure for CORBA applications. Immune enables CORBA applications to continue operating, despite faults that occur within the system, as well as intrusions or malicious/Byzantine attacks that damage the underlying system. Majority voting on the traffic between replicated CORBA objects, value fault detection, and secure multicast protocols (which employ message digests and digital signatures) are Immune's building blocks.
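To make the idea of majority voting with value-fault detection concrete, here is a minimal sketch. It is not Immune's actual protocol: it omits digital signatures and secure multicast, and the function name, the replica identifiers and the choice of SHA-256 as the digest are assumptions made for this example.

```python
import hashlib
from collections import Counter


def vote(replies):
    """Majority-vote over the replies returned by a set of replicas.

    replies: dict mapping a (hypothetical) replica id to its reply as bytes.
    Returns (winning_reply, suspects), where suspects lists the replicas
    whose reply differs from the majority value (possible value faults).
    Raises ValueError if no reply is backed by a strict majority.
    """
    if not replies:
        raise ValueError("no replies to vote on")
    # Compare fixed-size digests rather than full replies (SHA-256 is an assumption).
    digests = {rid: hashlib.sha256(body).hexdigest() for rid, body in replies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replies) // 2:
        raise ValueError("no majority among replica replies")
    suspects = sorted(rid for rid, d in digests.items() if d != winner)
    value = next(body for rid, body in replies.items() if digests[rid] == winner)
    return value, suspects


if __name__ == "__main__":
    replies = {"r1": b"balance=100", "r2": b"balance=100", "r3": b"balance=999"}
    print(vote(replies))
```

With three replicas, one corrupted reply is outvoted and its sender is reported as a suspect. If the replies split without a strict majority, the sketch refuses to pick a value instead of guessing.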
Eternal: Transparent Fault-Tolerant Middleware
Eternal is a transparent fault-tolerant infrastructure that supports
reliable CORBA/Java applications, without
requiring any modification to the application, to the OS or the
middleware. Eternal provides support for active and
passive replication, overcomes the non-determinism inherent in
multithreaded CORBA/Java applications, and provides for gateways to support
external clients. The key contributions of this research work are the
support for strong replica consistency, the sanitization of
non-deterministic multithreading, and most importantly, the
transparency of the fault tolerance. This transparency frees CORBA
application programmers from worrying about the difficult issues of
reliability, and allows them to focus on their area of expertise: the
application. This also leads to considerable savings in terms of
development time because, as soon as the application logic is ready,
fault tolerance is available to be deployed "out-of-the-box" at
run-time. Eternal provided transparent fault tolerance to different
implementations of CORBA: VisiBroker (Borland), Orbix (Iona), CORBAplus
(Expersoft), TAO (Washington University, St. Louis), e*ORB (Vertel),
omniORB2 (AT&T Laboratories, UK), ORBacus (Object-Oriented Concepts)
and ILU (Xerox PARC). The understanding and
insights gained from Eternal significantly influenced the Fault-Tolerant