SSS Abstracts
Spring 2010

Brain Activity Prediction: Cross-Subject Cross-Study Analysis to Improve Prediction

Tuesday, February 16th, 2010 from 12-1 pm in GHC 4303.

Presented by Indrayana Rustandi, CSD

There have been advances in methods that can predict the cognitive states associated with brain activities, and more recently, methods to predict the brain activities that arise given particular cognitive states. Such methods are usually trained separately for each individual subject within a particular study, which might not be ideal because of the low number of data points available for each subject compared to the dimension of the brain activation data. Methods that can incorporate data from multiple subjects and multiple studies would be able to take advantage of an increased number of data points, but to be effective, these methods also need to account for the anatomical and functional variations that exist across subjects and studies. In this talk, I present such a method based on the idea of finding common dimensions or factors present in the brain activations across subjects and studies. The method uses canonical correlation analysis to discover these common dimensions. I also present results from applying this method to the problem of predicting brain activations associated with concrete objects. The method achieves significant improvements in predictive accuracy over the state-of-the-art subject-specific method.
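
The abstract leaves the model details to the talk; purely as an illustration of the core idea, the sketch below applies scikit-learn's off-the-shelf CCA to synthetic data for two hypothetical subjects and recovers projections of their (differently sized) voxel spaces that are maximally correlated. All shapes, sizes, and parameters here are invented for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical setup: the same 60 stimuli shown to two subjects, whose
# activations are recorded over different numbers of voxels.
rng = np.random.default_rng(0)
stim = rng.normal(size=(60, 10))                      # latent stimulus factors
X1 = stim @ rng.normal(size=(10, 500)) + 0.1 * rng.normal(size=(60, 500))
X2 = stim @ rng.normal(size=(10, 450)) + 0.1 * rng.normal(size=(60, 450))

# CCA finds paired projections of the two voxel spaces that are maximally
# correlated -- candidate "common dimensions" into which data from
# multiple subjects can be pooled.
cca = CCA(n_components=5)
Z1, Z2 = cca.fit_transform(X1, X2)
print(np.corrcoef(Z1[:, 0], Z2[:, 0])[0, 1])          # close to 1 here
```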

Joint work with Marcel Just and Tom Mitchell.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


Intuitionistic Theorem Proving with the Polarized Inverse Method

Tuesday, March 16th, 2010 from 12-1 pm in GHC 7501.

Presented by Sean McLaughlin, CSD

Intuitionistic logic is the foundation for a form of reasoning based on proofs known as constructive mathematics. In addition to its traditional uses (e.g. as a foundation for type systems in programming languages) it is used increasingly in domains such as authorization and security. Recent examples include DKAL, an intuitionistic logic used by Microsoft to model authorization and security protocols, and PCAL, used for the statement and enforcement of access control policies.

In this talk I will describe a method of theorem proving in intuitionistic logic called the polarized inverse method. Like resolution for classical logic, the inverse method is a forward search procedure that is applicable to logics satisfying the subformula property and cut elimination. Polarization is an optimization of the inverse method that greatly reduces the search space by taking advantage of recent advances in proof theory such as focusing and the assignment of polarities to subformulas.
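
The abstract stays at this conceptual level, so the following is only a toy illustration of polarity assignment, not the prover's actual machinery: it classifies the connectives of a small propositional fragment and counts the polarity switches (shift operators) that a polarized translation would insert, under an arbitrary choice of atom polarity.

```python
from dataclasses import dataclass
from typing import Union

# Toy AST for a propositional fragment.
@dataclass
class Atom:
    name: str

@dataclass
class Or:       # disjunction: a positive connective
    left: 'Formula'
    right: 'Formula'

@dataclass
class Imp:      # implication: a negative connective
    left: 'Formula'
    right: 'Formula'

Formula = Union[Atom, Or, Imp]

def polarity(f: Formula, atom_sign: str) -> str:
    """Disjunction is positive and implication negative; each atom may
    be assigned either polarity, and that free choice steers search."""
    if isinstance(f, Or):
        return '+'
    if isinstance(f, Imp):
        return '-'
    return atom_sign

def shifts(f: Formula, atom_sign: str) -> int:
    """Count the polarity switches a polarized translation would mark
    with shift operators: fewer shifts mean longer focused phases and,
    typically, a smaller search space."""
    if isinstance(f, Atom):
        return 0
    n = shifts(f.left, atom_sign) + shifts(f.right, atom_sign)
    for child in (f.left, f.right):
        if polarity(child, atom_sign) != polarity(f, atom_sign):
            n += 1
    return n

# Peirce-like example: the atom polarity choice changes the shift count.
f = Imp(Imp(Atom('a'), Atom('b')), Atom('a'))
print(shifts(f, '+'), shifts(f, '-'))   # 3 0
```

The point of the toy is the last line: assigning atoms the opposite polarity can eliminate shifts entirely, the kind of freedom a polarized prover exploits to enlarge focused phases and shrink its search space.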

We give evidence for the efficacy of the polarized inverse method through experiments on standard benchmarks, which show that our implementation is demonstrably better than existing theorem provers for intuitionistic logic. Finally, we will show that the polarized inverse method generalizes to other logics of interest, such as intuitionistic modal logics (e.g. the authorization logics mentioned above) and linear logic.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


Brain Activity Prediction: Cross-Subject Cross-Study Analysis to Improve Prediction

Tuesday, March 23rd, 2010 from 12-1 pm in GHC 6501.

Presented by Indrayana Rustandi, CSD

There have been advances in methods that can predict the cognitive states associated with brain activities, and more recently, methods to predict the brain activities that arise given particular cognitive states. Such methods are usually trained separately for each individual subject within a particular study, which might not be ideal because of the low number of data points available for each subject compared to the dimension of the brain activation data. Methods that can incorporate data from multiple subjects and multiple studies would be able to take advantage of an increased number of data points, but to be effective, these methods also need to account for the anatomical and functional variations that exist across subjects and studies. In this talk, I present such a method based on the idea of finding common dimensions or factors present in the brain activations across subjects and studies. The method uses canonical correlation analysis to discover these common dimensions. I also present results from applying this method to the problem of predicting brain activations associated with concrete objects. The method achieves significant improvements in predictive accuracy over the state-of-the-art subject-specific method.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


Efficient Similarity Estimation for Systems Exploiting Data Redundancy

Friday, March 26th, 2010 from 12-1 pm in GHC 4303.

Presented by Kanat Tangwongsan, CSD

Many modern systems exploit data redundancy to improve efficiency. These systems split data into chunks, generate a content-based identifier for each chunk, and find duplicate chunks by comparing their identifiers. For these systems, chunk size becomes a critical parameter: it trades off potentially improved redundancy detection (smaller chunks) against the increased overhead of representing more chunks.
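
As a concrete (and heavily simplified) illustration of this pipeline, the sketch below chunks a byte string at content-defined boundaries using a toy rolling hash and derives a SHA-1 identifier per chunk; the hash constants and window size are arbitrary choices, not those of any particular system.

```python
import hashlib

def chunk_ids(data: bytes, mask_bits: int = 12, window: int = 48):
    """Content-defined chunking sketch: a rolling hash over the last
    `window` bytes picks chunk boundaries, so a local edit only shifts
    nearby boundaries. Expected chunk size is roughly 2**mask_bits
    bytes: smaller chunks expose more duplicates but produce more
    identifiers to store and compare."""
    P, MOD = 1000003, 1 << 32
    PW = pow(P, window, MOD)
    mask = (1 << mask_bits) - 1
    ids, start, h = [], 0, 0
    for i, byte in enumerate(data):
        out = data[i - window] if i >= window else 0
        h = (h * P + byte - out * PW) % MOD       # roll the window forward
        if (h & mask) == 0 or i == len(data) - 1:
            ids.append(hashlib.sha1(data[start:i + 1]).hexdigest())
            start = i + 1
    return ids

# Duplicate chunks across two files surface as identical identifiers:
#   shared = set(chunk_ids(file_a)) & set(chunk_ids(file_b))
```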

By examining a large data set, we find that similarity between files (a measure of redundancy) increases unpredictably with smaller chunk sizes, even for files of the same type. Existing systems, however, pick one chunk size that is 'good enough' for many cases, because they lack efficient techniques to determine the benefits at other chunk sizes.

In this talk, I will describe a technique, called Multi-Resolution (MR) Handprinting, that can help address this deficiency. The solution involves generating a compact 'digest' for each file. These digests can then be compared to obtain an accurate estimate of achievable benefits at different chunk sizes simultaneously, providing a building block for optimal chunk size selection.
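
The paper defines the actual digest construction; the sketch below, reusing chunk_ids from the previous sketch, conveys the flavor under one plausible reading: sample the k numerically smallest chunk identifiers at each of several chunk-size settings, and compare samples to estimate similarity at every resolution at once.

```python
def handprint(data: bytes, mask_bits_list=(10, 12, 14), k=64):
    """Sketch of a multi-resolution digest: at each chunk-size setting,
    keep only the k numerically smallest chunk identifiers -- a
    deterministic sample, so two files retain overlapping samples
    wherever they share chunks."""
    return {m: set(sorted(chunk_ids(data, mask_bits=m))[:k])
            for m in mask_bits_list}

def similarity_at(hp1, hp2, m):
    """Jaccard-style estimate of chunk-level similarity at resolution m."""
    return len(hp1[m] & hp2[m]) / max(len(hp1[m] | hp2[m]), 1)
```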

We show, through theoretical bounds and an empirical evaluation on a large peer-to-peer and software data set, that MR handprints are compact, requiring roughly 0.15% of the size of the file, and capable of providing accurate similarity estimates (errors below 5%). MR handprints enable near-optimal chunk size selection for peer-to-peer data transfer and storage systems, permitting these systems to exploit additional parallelism and compression.

This is joint work with Himabindu Pucha (IBM), David Andersen (CMU), and Michael Kaminsky (Intel Labs), and is based on work that appeared in the proceedings of IEEE INFOCOM 2010.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


The Open Source Policy Project

Friday, April 2nd, 2010 from 12-1 pm in GHC 4303.

Presented by Ben Towne, ISRI-COS

Every day, policymakers in many contexts are called on to make myriad decisions across a broad spectrum of topic areas, each of which has both near- and far-reaching consequences. In many cases, it is not possible for each policymaker to know all the consequences of his or her decisions, due to finite limits in time and resources, even if the knowledge is present within the legislator's constituency. Individuals often have valuable knowledge they can contribute in one or a small number of subject areas, even if they avoid seeking out policymaking positions due to responsibilities in other areas. Existing work shows that individuals are also willing to share their knowledge and expertise under certain conditions. This is evidenced by projects like Wikipedia, Aardvark, and millions of online discussion boards where experience and advice are exchanged every day.

Ben believes the time is ripe for an Internet-based 'open source policy' tool that allows people to discuss the pros and cons of policy options. Just as Wikipedia is a collection of articles about what is, this site would be a collection of discussions and proposals ('Let's do something') about what could or should be. The main point of the Open Source Policy Project tool is to 'Inform the Debate.' Atomic arguments for and against a proposal, past experience, and a diversity of perspectives can help surface, and resolve, important considerations and conflicting interests, improving the quality of policy decisions. The tool could be used across a wide variety of domains, ranging from politics, to geographically-distributed special interest groups trying to address far-reaching challenges (e.g. sustainability), to engineering teams who must reach decisions as a group (e.g. Health IT, which also requires medical and other professionals, or a lunar lander team). For a tool like the one proposed to succeed, it will need to be user-friendly, searchable, scalable, secure, fault-tolerant, intuitive, and properly introduced, among other requirements. The tool could also be used by researchers to study decision networks, examining the links between discussions.

Others have independently attempted to build such tools, with varying degrees of success. This V-Unit project will survey those tools to learn about their approaches, strengths, weaknesses, and experiences in the particular domains where they were tested. Two of particular interest are the Peer-to-Patent system, which is detailed in a textbook on Wiki government, and the Deliberatorium, which is headed by project advisor Mark Klein from MIT.


Discovering Pathways by Orienting Edges in Protein Interaction Networks

Tuesday, April 6th, 2010 from 12-1 pm in GHC 7501.

Presented by Anthony Gitter, CSD

Modern experimental technologies enable the identification of the upstream receptors that initiate a cellular response as well as the downstream genes and proteins that ultimately react to it. However, one substantial challenge when reconstructing the signaling networks and pathways that connect these sources and targets is that the pathways are directed whereas the protein-protein interactions that compose them are undirected.

In this talk, we formulate signaling pathway prediction as a combinatorial optimization problem in which the goal is to orient an undirected graph such that the known sources and targets are connected by short, high-confidence, directed pathways. We present several approximation algorithms based on search techniques and reductions to satisfiability problems, discuss their theoretical guarantees, and demonstrate their ability to both perform well on simulated data and recover biologically meaningful edge orientations.
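
As a toy rendering of this optimization problem (with invented data, and brute force in place of the talk's algorithms), the sketch below tries every orientation of a small undirected graph and scores each by the confidence of the best length-bounded directed path from each source to its target.

```python
from itertools import product

def best_orientation(edges, pairs, max_len=3):
    """Brute-force sketch: try every orientation of the undirected edges
    and keep the one maximizing the summed confidence of the best
    directed source-to-target path (product of edge confidences,
    length-bounded). Exponential in the number of edges -- toy inputs
    only; the talk's algorithms approximate this via search techniques
    and SAT reductions."""
    def path_score(adj, s, t, depth):
        if s == t:
            return 1.0
        if depth == 0:
            return 0.0
        return max((w * path_score(adj, v, t, depth - 1)
                    for v, w in adj.get(s, [])), default=0.0)

    best, best_score = None, -1.0
    for dirs in product([0, 1], repeat=len(edges)):
        adj = {}
        for (u, v, w), d in zip(edges, dirs):
            a, b = (u, v) if d == 0 else (v, u)
            adj.setdefault(a, []).append((b, w))
        score = sum(path_score(adj, s, t, max_len) for s, t in pairs)
        if score > best_score:
            best, best_score = dirs, score
    return best, best_score

# Toy network with confidences in (0,1]: orient so source 'r' reaches 'g'.
edges = [('r', 'a', 0.9), ('a', 'g', 0.8), ('a', 'b', 0.5)]
print(best_orientation(edges, pairs=[('r', 'g')]))   # ((0, 0, 0), 0.72)
```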

Joint work with Ziv Bar-Joseph, Anupam Gupta, and Judith Klein-Seetharaman.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


A Fast Solver for Quantified Boolean Formulas (QBF)

Tuesday, April 20th, 2010 from 12-1 pm in GHC 7501.

Presented by Will Klieber, CSD

The problem of Quantified Boolean Formulas (QBF) is a natural generalization of the problem of Boolean Satisfiability (SAT). SAT asks whether a propositional formula is satisfiable; in QBF, each variable is explicitly bound by either a universal or an existential quantifier. Many problems in Model Checking and Artificial Intelligence can be naturally expressed in the language of QBF, so there is a great deal of interest in developing QBF solvers that are fast for these types of QBF instances.
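
For readers new to QBF, here is a minimal, naive evaluator that makes the semantics concrete; it simply expands each quantifier by recursion and bears no resemblance to the engineered solver described below.

```python
def solve_qbf(prefix, matrix, assignment=None):
    """Naive expansion-based QBF evaluation: walk the quantifier prefix,
    branching on each variable; 'forall' needs both branches true,
    'exists' needs at least one. `matrix` maps a complete assignment to
    a bool. Exponential -- real solvers layer learning and propagation
    on top of this skeleton."""
    assignment = assignment or {}
    if not prefix:
        return matrix(assignment)
    (quant, var), rest = prefix[0], prefix[1:]
    branches = (solve_qbf(rest, matrix, {**assignment, var: val})
                for val in (False, True))
    return all(branches) if quant == 'forall' else any(branches)

# forall x exists y. (x != y) is true: choose y to be the negation of x.
print(solve_qbf([('forall', 'x'), ('exists', 'y')],
                lambda a: a['x'] != a['y']))   # True
```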

QBF solvers have inherited many of the techniques used by SAT solvers, including the use of conjunctive normal form (CNF). Although CNF works well for SAT solvers, it hinders the work of QBF solvers by impeding the ability to detect and learn from satisfying assignments. In fact, a family of problems whose instances are trivially satisfiable in negation-normal form (NNF) was experimentally found to require exponential time from existing CNF-based solvers.

In this talk, I will present a method for solving QBF instances in non-CNF, non-prenex form. Our method makes two improvements over existing solvers. First, we reformulate clause/cube learning, extending it to non-prenex instances. Second, we introduce a propagation technique using *ghost literals* that exploits the structure of a non-CNF instance in a manner that is symmetric between the universal and existential variables.

This presentation is based on joint work with Samir Sapra, Sean Gao, and Edmund Clarke.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


FAWN: A Fast Array of Wimpy Nodes

Friday, April 23rd, 2010 from 12-1 pm in GHC 4303.

Presented by Vijay Vasudevan, CSD

Datacenters are projected to account for 3% of total US energy consumption by 2011. A significant fraction of this energy goes toward powering high-speed servers used for data-intensive workloads. To help reduce the energy consumed by datacenters, we propose FAWN, an approach to building clusters for low-power data-intensive computing. FAWN couples low-power embedded CPUs with small amounts of local flash storage, and balances computation and I/O capabilities to enable efficient, massively parallel access to data.

In this talk, we present the motivating principles of the FAWN architecture and the design and implementation of FAWN-KV---a consistent, replicated, highly available, and high-performance key-value storage system built on a FAWN prototype. Our design centers around purely log-structured datastores that provide the basis for high performance on flash storage, as well as for replication and consistency obtained using chain replication on a consistent hashing ring. Our evaluation demonstrates that FAWN clusters can provide two to three orders of magnitude more 'queries per Joule' than a disk-based system, and more than twice the efficiency of modern Flash-based high-speed servers.
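
The sketch below is a minimal single-node cartoon of the log-structured design described above, assuming only what the abstract states: writes append sequentially to a log (an access pattern flash serves well) while an in-memory index maps each key to its latest offset. FAWN-KV's compact index structure, log maintenance, consistent hashing ring, and chain replication are all omitted.

```python
import os

class LogStore:
    """Toy log-structured key-value store: put() appends a record to a
    single log file; get() follows the in-memory index to the latest
    record for a key and reads it back."""
    def __init__(self, path):
        self.log = open(path, 'a+b')
        self.index = {}                      # key -> offset of latest record

    def put(self, key: bytes, value: bytes):
        self.log.seek(0, os.SEEK_END)
        offset = self.log.tell()
        header = len(key).to_bytes(4, 'big') + len(value).to_bytes(4, 'big')
        self.log.write(header + key + value)  # sequential append only
        self.log.flush()
        self.index[key] = offset              # point index at the new record

    def get(self, key: bytes):
        if key not in self.index:
            return None
        self.log.seek(self.index[key])
        klen = int.from_bytes(self.log.read(4), 'big')
        vlen = int.from_bytes(self.log.read(4), 'big')
        self.log.seek(klen, os.SEEK_CUR)      # skip the stored key
        return self.log.read(vlen)
```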

This is joint work with David Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee and Lawrence Tan. FAWN was published in the proceedings of ACM SOSP 2009.

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


A Behavioral Theory of the Merger: Dynamics of the Post-Merger Integration Process

Tuesday, April 27th, 2010 from 12-1 pm in GHC 7501.

Presented by Terrill L. Frantz, ISRI-COS

I report on an examination of the effects of prior characteristics of two merging organizations on the levels of knowledge and culture transfer that occur post-merger. The study consists of five computer-simulation experiments that vary detailed characteristics of the organizational structure, aspects of the task composition, and dimensions of culture. These experiments begin to formulate, for the first time, a computational model of post-merger integration that can be built upon to develop a generalizable, full-featured tool for practice and further research.


Connecting the Dots Between News Articles

Tuesday, May 4th, 2010 from 12-1 pm in GHC 7501.

Presented by Dafna Shahaf, CSD

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the ever-larger amount of content published every day. With this much data, it is often easy to miss the big picture.

In this talk, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate.

We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data and demonstrate its effectiveness in helping users understand the news.
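
The talk's formal objective is not spelled out in the abstract; as a rough stand-in, the sketch below treats articles as word sets, scores a chain by its weakest adjacent-pair overlap, and finds the chain maximizing that minimum with a Dijkstra-style maximin search. The example documents are invented, and the talk's actual coherence objective and algorithm are considerably more refined.

```python
import heapq

def jaccard(a, b):
    return len(a & b) / len(a | b)

def strongest_chain(articles, start, end):
    """Maximin search: best[v] is the largest achievable 'weakest link'
    over chains from start to v; a priority queue expands the most
    promising article next, as in Dijkstra's algorithm."""
    best, parent = {start: 1.0}, {}
    heap = [(-1.0, start)]
    while heap:
        neg_strength, u = heapq.heappop(heap)
        if u == end:
            break
        for v in articles:
            if v == u:
                continue
            s = min(-neg_strength, jaccard(articles[u], articles[v]))
            if s > best.get(v, 0.0):
                best[v], parent[v] = s, u
                heapq.heappush(heap, (-s, v))
    if end != start and end not in parent:
        return None, 0.0
    chain, node = [end], end
    while node != start:
        node = parent[node]
        chain.append(node)
    return chain[::-1], best[end]

# Hypothetical articles as bags of words:
docs = {'home prices': {'mortgage', 'housing', 'prices'},
        'bank crisis': {'mortgage', 'banks', 'bailout'},
        'health care': {'bailout', 'congress', 'health'}}
print(strongest_chain(docs, 'home prices', 'health care'))
```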

(Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.)


Web contact: sss+www@cs