
SSS Abstracts Fall 2006 

Secure Hierarchical In-network Aggregation for Sensor Networks
Friday, October 13th, 2006 from 12-1 pm in NSH 1507.
In-network aggregation is an essential primitive for performing queries on sensor network data. However, most aggregation algorithms assume that all intermediate nodes are trusted. This is at odds with the standard threat model in sensor network security, which assumes that an attacker may control a fraction of the nodes and cause them to misbehave in an arbitrary (Byzantine) manner.
We present the first viable algorithm for provably secure hierarchical in-network data aggregation. Our algorithm is guaranteed to detect any manipulation of the aggregate by the adversary beyond what is achievable through direct injection of data values at compromised nodes. In other words, the adversary can never gain any advantage from misrepresenting intermediate aggregation computations. Our algorithm incurs only $O(\log^2 n)$ congestion, supports hierarchical aggregator topologies, and retains its resistance against aggregation manipulation in the presence of arbitrary numbers of malicious nodes. The main algorithm performs the sum aggregation securely by first forcing the adversary to commit to its choice of intermediate aggregation results, and then having the sensor nodes independently verify that their contributions to the aggregate are correctly incorporated. We build on this primitive to produce secure median, count and average aggregates.
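The commit-then-verify structure described above can be illustrated with a small sketch. This is a hypothetical two-leaf toy, not the talk's actual protocol or message formats: an aggregator chains intermediate sums into a hash commitment, and each leaf later recomputes the root from its own value plus the off-path (sum, hash) pair to check that its contribution was correctly incorporated.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256, serving as the commitment hash in this sketch."""
    return hashlib.sha256(data).digest()

def leaf_commit(value: int) -> bytes:
    return h(value.to_bytes(8, "big"))

def combine(left_sum: int, left_hash: bytes, right_sum: int, right_hash: bytes):
    """An aggregator sums its children and chains their commitments."""
    total = left_sum + right_sum
    return total, h(total.to_bytes(8, "big") + left_hash + right_hash)

# Two leaf sensors report values 5 and 7; the aggregator commits to the result.
s1, h1 = 5, leaf_commit(5)
s2, h2 = 7, leaf_commit(7)
root_sum, root_hash = combine(s1, h1, s2, h2)

def verify(own_sum, own_hash, sibling_sum, sibling_hash, claimed_sum, claimed_hash):
    """A leaf recomputes the root from its own value and the off-path
    (sum, hash) pair, then checks it against the broadcast commitment."""
    s, c = combine(own_sum, own_hash, sibling_sum, sibling_hash)
    return s == claimed_sum and c == claimed_hash

# Honest aggregation verifies; a tampered sibling sum is detected.
ok = verify(s1, h1, s2, h2, root_sum, root_hash)
tampered = verify(s1, h1, s2 + 1, h2, root_sum, root_hash)
```

Because the adversary must commit to the intermediate results before verification, any manipulation of the chained sums makes some honest leaf's recomputation fail.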
Abstract: Despite the existence of many detailed type safety proofs for small core languages, the full-scale programming languages used for everyday software development have resisted such rigorous analysis. An initial hurdle is the fact that full-scale languages are often defined in terms of compiler implementations or large text documents that are known to contain ambiguities and even errors. Resolving definitional issues is just the beginning: rigorously verifying properties of full-scale languages is a daunting task requiring detailed proofs involving numerous cases. Computer proof assistants are valuable tools in studying formal systems, particularly when human verification would be too error-prone or time-consuming to be practical. This talk will discuss the application of proof assistants to the formalization and verification of programming languages and their metatheory. Computer assistance and a principled approach to language definition make it tractable to formalize and prove properties about large, feature-rich programming languages. The discussion will be motivated by the speaker's work towards a machine-checkable specification and type safety proof for Standard ML using the Twelf proof assistant. This talk is targeted toward a general computer science audience with no prior familiarity with type systems or proof assistants.
Additional Info: Joint work with Karl Crary and Robert Harper. In partial fulfillment of the Speaking Skills Requirement.
Shape analyses, which statically track the shape of data structures in the heap, are often imprecise in their numerical reasoning, whereas numerical static analyses are often largely unaware of the shape of a program's heap. This talk describes a method for combining a fixed shape analysis with any numerical static analysis. The combination is based on the use of counterexamples as a means of communication between the two analyses and the use of "shadow commands" to eliminate references to the heap when sending queries to the numerical analysis.
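As a rough illustration of the "shadow command" idea (a toy version invented for this sketch, not the talk's actual formulation): before a query is sent to a heap-unaware numerical analysis, each heap read is replaced by a nondeterministic assignment, so a simple interval analysis can run soundly without modeling the heap at all.

```python
import math

TOP = (-math.inf, math.inf)   # "unknown value" in the interval domain

def shadow(cmd):
    """Shadow-command translation: a heap read "x := p.f" becomes a
    nondeterministic assignment, hiding the heap from the numerical analysis."""
    if cmd[0] == "load":
        return ("havoc", cmd[1])
    return cmd

def interval_step(env, cmd):
    """One step of a tiny heap-unaware interval analysis."""
    if cmd[0] == "const":        # x := c
        _, x, c = cmd
        env[x] = (c, c)
    elif cmd[0] == "add":        # x := y + c
        _, x, y, c = cmd
        lo, hi = env.get(y, TOP)
        env[x] = (lo + c, hi + c)
    elif cmd[0] == "havoc":      # x := *  (from a shadowed heap read)
        _, x = cmd
        env[x] = TOP
    return env

# i := 0;  n := p.len  (a heap read);  i := i + 1
prog = [("const", "i", 0), ("load", "n", "p", "len"), ("add", "i", "i", 1)]
env = {}
for c in prog:
    env = interval_step(env, shadow(c))
```

After the run, the analysis knows `i` is exactly 1 while the shadowed heap read leaves `n` unconstrained; in the combined analysis, a counterexample involving `n` would be sent back to the shape analysis for refinement.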
Successfully using application programming interfaces (APIs) presents several barriers for novice programmers and professional software engineers alike. To help API designers create APIs that programmers can more easily use, researchers at Microsoft perform usability studies on pre-release versions of APIs. However, these studies are expensive, and APIs are too large and numerous to test fully. My research explores developing general design recommendations to guide API designers by conducting usability studies on broadly applicable API design choices, such as which constructors to provide for a given object. This talk describes my approach and the current results of my ongoing research.
In Partial Fulfillment of the Speaking Requirement
The vast majority of fMRI datasets are analysed using the general linear model approach. This limits the kinds of conclusions that can be drawn and the scope of hypotheses about brain activation and fMRI signal that can be tested. From work in the last few years we know that machine learning classifiers can predict, from fMRI of a subject's brain, which of several cognitive states the subject is in. I will show that we can go further and use them to identify the relationships between brain activation and the cognitive state prediction being made, in ways not feasible or practical with traditional methods. To do this, I will go over several experiments with classifiers and what was learnt from them, while also discussing the technical issues involved. I will also describe our efforts to address the problems of predicting and identifying informative patterns of brain activation simultaneously, via a new algorithm for learning low-dimensional representations of fMRI data.
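The classifier-as-analysis-tool idea can be sketched on synthetic data. The voxel counts, signal strength, and nearest-centroid classifier below are illustrative assumptions, not the methods used in the talk: train a linear classifier to predict the cognitive state, then read off which "voxels" carry the discriminative weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_trials = 50, 40   # invented sizes for the sketch

# Synthetic "fMRI" data: only voxels 0-4 distinguish the two cognitive states.
X = rng.normal(size=(2 * n_trials, n_voxels))
y = np.array([0] * n_trials + [1] * n_trials)
X[y == 1, :5] += 1.5

# Nearest-centroid linear classifier: the weight vector is the difference of
# the class means, so each weight is directly interpretable voxel by voxel.
mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
w = mu1 - mu0
b = -w @ (mu0 + mu1) / 2
pred = (X @ w + b > 0).astype(int)
accuracy = (pred == y).mean()

# "Identification": rank voxels by absolute weight to find the informative ones.
top_voxels = np.argsort(-np.abs(w))[:5]
```

The same weight vector that drives prediction also identifies where the discriminative information lives, which is the step a pure general-linear-model analysis does not provide.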
How do we find patterns in author-keyword associations evolving over time? Or in DataCubes with product-branch-customer sales information? Matrix decompositions, like principal component analysis (PCA) and its variants, are invaluable tools for mining, dimensionality reduction, feature selection, and rule identification in numerous settings such as streaming data, text, graphs, and social networks. However, they handle only two orders, like author and keyword in the above example. We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce the dynamic tensor analysis (DTA) method and its variants. DTA provides a compact summary for high-order and high-dimensional data, and it also reveals the hidden correlations. Algorithmically, we designed DTA very carefully so that it is (a) scalable, (b) space efficient (it does not need to store the past), and (c) fully automatic, with no need for user-defined parameters. Moreover, we propose STA, a streaming tensor analysis method, which provides a fast, streaming approximation to DTA. We implemented all our methods and applied them in two real settings, namely anomaly detection and multi-way latent semantic indexing. We used two real, large datasets, one on network flow data (100 GB over 1 month) and one from DBLP (200 MB over 25 years). Our experiments show that our methods are fast and accurate, and that they find interesting patterns and outliers in the real datasets.
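In spirit, a per-mode update of this kind resembles streaming PCA: maintain an exponentially forgotten covariance for one mode and track its leading eigenvectors, so past data never needs to be stored. A minimal one-mode sketch with invented dimensions and forgetting factor, not the actual DTA algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def update(cov, x, forget=0.98):
    # Exponentially forgotten covariance: old structure decays, new arrives.
    return forget * cov + np.outer(x, x)

d, k = 10, 2
hidden = rng.normal(size=(d, k))       # hidden 2-d subspace generating the stream
cov = np.zeros((d, d))

for _ in range(500):                   # observations arrive one by one
    x = hidden @ rng.normal(size=k) + 0.01 * rng.normal(size=d)
    cov = update(cov, x)

# The leading eigenvectors summarize the stream's dominant correlations.
eigvals, eigvecs = np.linalg.eigh(cov)
U = eigvecs[:, -k:]                    # estimated top-k subspace for this mode

# How well does the estimate capture the hidden generating subspace?
err = np.linalg.norm(hidden - U @ (U.T @ hidden)) / np.linalg.norm(hidden)
```

Only the d-by-d covariance is kept in memory, which is what makes the space cost independent of the stream's length.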
Web contact: sss+www@cs