Sensor Placement and Information Acquisition (SEPIA)

My research is in observation selection, utilizing powerful tools from decision theory, information theory and sensor networks to create smarter reasoning systems and enable monitoring of spatio-temporal phenomena. In the SEPIA project (Sensor Placement and Information Acquisition), together with my collaborators, I've been developing efficient algorithms for choosing important observations which have strong provable quality guarantees and scale to large problems. We applied our algorithms to several important practical problems such as predicting traffic, building automation, activity recognition,  environmental monitoring and deciding which blogs to read on the Internet.

My collaborators are Carlos Guestrin, Anupam Gupta, Jon Kleinberg, Jure Leskovec, Bilge Mutlu, Ajit Singh, Amarjeet Singh, Vipul Singhvi, William Kaiser, Jeanne VanBriesen, Christos Faloutsos, Brendan McMahan, Natalie Glance.


Near-optimal observation selection using submodularity

Submodularity. We identify a key structural property of many observation selection problems: submodularity, an intuitive diminishing returns property. Exploiting this property allows to obtain efficient approximation algorithms with provable guarantees, as well as drastically speeding up algorithms. Here's a short overview paper on this topic.

In [UAI'05], we show that, for a large class of graphical models, the problem of selecting a set of informative attributes (features) A maximizing the information gain I(U;A) with respect to target variables U, can be approximated efficiently for a large class of graphical models. Using the concept of submodularity, we show that the simple greedy algorithm provides a constant factor (1-1/e) approximation. Moreover, we show that this guarantee is essentially the best we can hope for. We apply our algorithms to select sensors for best predicting the Bay Area traffic. Here, A would be the selected sensors, and U the minimum traffic flow in certain regions.

Spatial monitoring. For monitoring spatial phenomena, such as temperature in a building, which can frequently be modeled using Gaussian Processes, mutual information, I(A;V\A) is an effective criterion for quantifying the informativeness of a sensor placement, avoiding the wasted information effect of the commonly used maximum entropy sampling. In [JMLR'07] we use submodularity to show that the greedy algorithm generates near-optimal placements under this criterion. We also show how we can exploit structure in the kernel to make this algorithm very efficient.

Outbreak detection. In [KDD'07], we study two seemingly very different problems: placing sensors in a municipal water distribution network in order to detect possible contaminations, and selecting informative weblogs to read on the Internet. We show that both problems share essential structure: Information cascades spreading over the blogosphere can be modeled similarly as contaminations spreading through water networks. We show that many objectives, such as optimizing the detection time or likelihood, satisfy submodularity, making the problem amenable for greedy (and hence highly scalable) optimization. We participated in the Battle of the Water Sensor Networks competition, which required to optimize for sensor placements in a real metropolitan-area network with 20,000 nodes. Our approach achieved the highest number of non-dominated solutions. In the blog selection problem, our approach outperforms existing ranking criteria. Here's a link to the CASCADES project page.

Posture recognition. In [UIST'07], we apply our sensor placement methods to instrument a chair with pressure sensors, in order to enable seating posture recognition. Our approach reduces the sensor hardware cost by a factor of 30 compared to previous solutions, while achieving comparable classification accuracy. We test our approach in a user study on a real deployment.


Sensor selection for monitoring traffic
Sensor selection for monitoring traffic in the Bay Area

Information cascades in blogsSelecting which blogs to read in order to catch large information cascades early.

Blog performance metrics
Comparison with existing criteria for blog ranking and selection.

Left: Chair with expensive high-resolution sensor. Right: Chair with small number of near-optimally placed sensors
Sensor placement for seating activity recognition
Demo of pSPIEL at IPSN '06
Demo of pSPIEL at IPSN
pSPIEL achieves near-optimal benefit cost ratio
pSPIEL provides near-optimal information--cost tradeoff
Path planning on lake Fulmor

Informative paths planned for monitoring Lake Fulmor, CA

 

Nonmyopic algorithms


Communication constraints. When placing wireless sensor networks, the sensors should not only be informative, but also communicate well. In this more complex setting, the greedy algorithm performs arbitrarily badly. In [IPSN'06] we present an efficient randomized algorithm, pSPIEL, which near-optimally trades off information and communication cost. We implement our approach on a real deployment of 46 TMote Sky motes, and optimize a deployment of light sensors in the CMU Architecture department. We also demonstrated our algorithm live at NIPS 2005 and IPSN 2006.

Path planning. In [IJCAI'07], we consider the problem, where we want to plan informative paths in order to monitor the algal growth in Lake Fulmor, CA, using robotic boats. In this case, the greedy algorithm performs badly as well. We extend an approximation algorithm by Chekuri and Pal for submodular orienteering to the setting of optimizing paths for multiple robots; we also make the (originally super-polynomial) algorithm practical by devising a spatial partitioning as well as branch & bound schemes. In [AAAI'07] we show how this approach can be extended to the spatio-temporal planning setting.

Robustness. Many observation selection problems require robustness, e.g., against uncertainty in the model parameters. In [NIPS'07] we show, how many such problems can be modeled as selecting a set of observations A which maximizes the minimum over a collection of submodular functions. Examples include robust (nonlinear) experimental design problems, minimizing predictive variance in Gaussian Processes, and adversarial outbreak detection. We develop Saturate, a nonmyopic algorithm which is best possible for this problem (matching lower bounds) under reasonable complexity assumptions.


Active learning


Gaussian processes. The above problems address a priori (open-loop) selection problems, where observations are chosen according to a model, before any measured values are obtained. In some applications, it is feasible to implement a sequential (closed-loop) strategy, which decides on the next observation based on previously obtained values. An important question is, how much better a sequential algorithm can perform compared to the best a priori selection. In [ICML'07], we partially answer this question for the case of sequentially optimizing mutual information in Gaussian Processes. We develop a theoretical bound quantifying the sequential advantage, and use it to guide an exploration-exploitation approach. Our approach has sample complexity guarantees, and performs well on environmental monitoring problems.

Chain models. In [IJCAI'05] we prove that for chain graphical models (such as HMMs), one can efficiently compute both the best subset of observations, and even the (exponentially large) best sequential plan, maximizing arbitrary value of information objectives. We also show that even for tree graphical models (for which efficient inference is tractable), this problem is wildly intractable (NP^PP complete).

In [SenSys'05], we consider a building automation problem, where we want to select  sensors in order to intelligently manage the light controls in a building, satisfying user preferences and at the same time conserve power. We formalize and solve this problem using our approach for maximizing value of information. 

NIMS
Active learning for environmental monitoring (image courtesy of CENS)
Improving precision and recall through expert guided structured classification
Optimal selection of a few labels drastically improves precision and recall in information extraction
Prototype light management system using Active Sensing
Our testbed for intelligent light control using Active Sensing