Using a communication network entails an inherent privacy risk: packets cross an infrastructure maintained by a number of parties other than the sender and receiver, each of which has the opportunity to observe them as they are processed and forwarded. This poses a risk because packets carry information that users might rather keep private, namely: (1) the packet's source address, which exposes the sender, (2) the destination address, which exposes the recipient, and (3) the body, which can expose various pieces of user data. Beyond the information explicitly carried by the packet, observers can also learn things merely from the fact that a packet happened to be in a certain place at a certain time. All of this information is often divided into two categories: data (the actual message being communicated, like the contents of an email) and metadata (information about the communication, like "A emailed B at 12:07 today").
Fortunately, we have tools, widely used in practice, to protect this information. Unfortunately, they tend to make aggressive tradeoffs, sacrificing other desirable properties for the sake of privacy. For example, to protect data, encryption is widely used; on the Web, for instance, many sites have switched from HTTP to HTTPS. Unfortunately, encryption blinds middleboxes (devices in the network that process traffic beyond basic packet forwarding), which can lead to a loss of functionality, performance, and security. And to protect metadata, anonymous communication systems like Tor reduce accountability by preventing network operators from learning who sent a packet, and they often introduce performance overheads.
These "privacy vs. X" tussles seem fundamental, because providing privacy requires hiding information like source addresses and payloads, while the other properties–performance, accountability, functionality, and security–require exposing that information. How can you do both? In this thesis, we argue first that a practical balance is possible if you carefully control access to packet data and metadata and second that this requires architectural support from the network. We present novel network architectures and protocols for managing some of the tradeoffs described above.
Peter Steenkiste (Chair)
Dave Oran (Network Systems Research & Design, MIT Media Lab)
Adrian Perrig (ETH Zürich)
Today's Internet has become an eyeball economy dominated by applications such as video streaming and VoIP. With most applications relying on user engagement to generate revenue, maintaining high user-perceived Quality of Experience (QoE) has become crucial to ensuring high user engagement. For instance, one short buffering interruption leads to 39% less time spent watching videos and causes significant revenue losses for ad-based video sites. Despite rising expectations of high QoE, existing approaches fall short of achieving the QoE that today's applications need. They either require costly re-architecting of the network core, or they use suboptimal endpoint-based protocols that react to dynamic Internet performance with only limited knowledge of the network.
In this thesis, I present a new approach inspired by the recent success of data-driven techniques in many fields of computing. I will demonstrate that data-driven techniques can improve Internet QoE by utilizing a centralized real-time view of performance across millions of endpoints (clients). I will focus on two fundamental challenges unique to this data-driven approach: the need for expressive models to capture the complex factors affecting QoE, and the need for scalable platforms to make real-time decisions with fresh data from geo-distributed clients. Our solutions address these challenges in practice by integrating several domain-specific insights from networked applications with machine learning algorithms and systems, and they achieve better QoE than many standard machine learning solutions. I will present end-to-end systems that yield substantial QoE improvement and higher user engagement for video streaming and VoIP. Two of my projects, CFA and VIA, have been used in industry by Conviva and Skype, companies that specialize in QoE optimization for video streaming and VoIP, respectively.
Vyas Sekar (Co-Chair)
Hui Zhang (Co-Chair)
Ion Stoica (University of California, Berkeley)
Neural networks are functions used to convert raw data into useful information. They are represented by models with many parameters, which are generally optimized via gradient-based search. However, gradient-based search only tunes the parameters of a given model; choosing the model itself remains an open problem. Models are generally chosen by expert judgment, for computational convenience, or by brute-force search.
I present "nonparametric neural networks", a framework for jointly optimizing both parameters and model. Under this framework, we alternate between local changes to the model and gradient-based parameter tuning. The search is enabled by defining a connected, continuous space over model-parameter pairs, by penalizing models according to their complexity, and by several optimization tricks.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.
The Introductory Course (IC) is an introduction to the Computer Science Department organized for incoming Ph.D. students. Held during the beginning of the Fall Semester, the IC includes research area overviews, a series of short research presentations by CSD faculty, introductions to computing facilities and other department resources, and several parties and activities that allow students, staff and faculty to get to know each other socially.
A cut problem is specified by a graph G and a property X. The goal is to remove a set of edges or vertices (E′ or V′) such that the remaining graph satisfies property X. We investigate a labeling view of several cut problems based on the structural properties of G−E′ (or G−V′) and improve existing results. In particular, we study the multiway cut, multicut, and subset feedback set problems. Our results include a simpler approximation algorithm for multiway cut, a Unique Games hardness of k−ϵ for separating k pairs in directed graphs, and a new LP-based constant-factor approximation algorithm for subset feedback vertex set.
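As a concrete instance of this template, the multiway cut problem can be stated as follows; this is the standard textbook formulation rather than anything specific to the thesis:

```latex
% Multiway cut as an instance of the cut-problem template:
% G = (V, E) with edge weights w, terminals s_1, ..., s_k; the property X is
% "no two terminals lie in the same connected component".
\[
  \min_{E' \subseteq E} \; \sum_{e \in E'} w(e)
  \quad \text{s.t. } s_i \text{ and } s_j \text{ are disconnected in } G - E'
  \text{ for all } i \neq j .
\]
```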
DNA read mapping is an important problem in Bioinformatics. With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of many medical and genetic applications critically depends on computational methods that process this enormous amount of sequence data quickly and accurately. However, due to the repetitive nature of the human genome and the limitations of the sequencing technology, current read mapping methods still fall short of achieving both high performance and high sensitivity.
In this proposal, I break down the DNA read mapping problem into four subproblems: intelligent seed extraction, efficient filtration of incorrect seed locations, high-performance extension, and accurate and efficient read cloud mapping. I provide novel computational techniques for each subproblem, including: 1) a novel seed selection algorithm that optimally divides a read into low-frequency seeds; 2) a novel SIMD-friendly bit-parallel filtering algorithm that quickly estimates whether two strings are highly similar; 3) a generalization of a state-of-the-art approximate string matching algorithm that measures genetic similarity with more realistic metrics; and 4) a novel mapping strategy that exploits the characteristics of a new sequencing technology, read cloud sequencing, to map NGS reads with higher accuracy and efficiency.
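The seed-selection idea in 1) can be sketched as a small dynamic program: partition a read into a fixed number of non-overlapping seeds so that the seeds' total occurrence frequency in the reference is minimized. This is a toy reconstruction, not the thesis algorithm; `freq` is a naive stand-in for a real index lookup, and all names are hypothetical.

```python
def optimal_seeds(read, ref, k, min_len=2):
    """Split `read` into k seeds minimizing summed reference frequency."""
    n = len(read)
    INF = float("inf")

    def freq(s):
        # Stand-in for an index lookup (e.g. an FM-index): count occurrences.
        return sum(ref[i:i + len(s)] == s for i in range(len(ref) - len(s) + 1))

    # best[j][i] = (cost, cut) for covering read[:i] with j seeds
    best = [[(INF, -1)] * (n + 1) for _ in range(k + 1)]
    best[0][0] = (0, -1)
    for j in range(1, k + 1):
        for i in range(j * min_len, n + 1):
            for cut in range((j - 1) * min_len, i - min_len + 1):
                prev = best[j - 1][cut][0]
                if prev < INF:
                    cost = prev + freq(read[cut:i])
                    if cost < best[j][i][0]:
                        best[j][i] = (cost, cut)

    # Backtrack the stored cut positions to recover the chosen seeds.
    seeds, i = [], n
    for j in range(k, 0, -1):
        cut = best[j][i][1]
        seeds.append(read[cut:i])
        i = cut
    return list(reversed(seeds))
```

For example, splitting "AAATTT" into two seeds against a reference that is AT-rich prefers the cut that avoids the high-frequency "AA" seed; a real mapper would replace the quadratic `freq` scan with an O(seed length) index query.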
Carl Kingsford (Chair)
Iman Hajirasouliha (Weill Cornell Medical College)
Bill Bolosky (Microsoft Research)
Copy of Thesis Summary
Our hands are the primary way we interact with the world around us. Recent trends in virtual reality (VR) also reflect the importance of hand-based interaction. Mainstream VR headsets such as the Oculus Rift, HTC Vive, and PlayStation VR all support and encourage the use of hand-tracked controllers.
However, tracking hands is very challenging due to their small size and frequent occlusions. For example, significant portions of the hand can be occluded when a user clasps both hands together or holds an object. For this reason, the aforementioned companies have users hold controllers, which can be tracked more reliably than the hands themselves.
Another problem with hand tracking is that it often adds latency to the system. Networked multiplayer interactions are even harder to deliver without users noticing delays, because network latency compounds the tracking latency. We can work around this problem by predicting future hand motions using gaze.
Eye tracking will be ubiquitous in the next generation of virtual reality (VR) headsets because of its numerous benefits to VR, such as foveated rendering, gaze-based object selection, and virtual social interaction. However, using the well-known importance of hand-eye coordination during object manipulation to predict hand motion remains relatively unexplored.
In this proposal, we propose ways to overcome the current limitations of hand use in VR by addressing the aforementioned challenges. To address the difficulty of hand tracking, we present a way to estimate the full hand pose from a few reliably tracked points on the hand. To address latency in hand tracking and multiplayer interactions, we propose a method that predicts hand pose from gaze, which will be commonly available in next-generation VR headsets. We first motivate our approach by presenting our observations of hand-eye coordination. Then, we present a study on predicting grasp types, showing that gaze is effective in predicting hand interactions. Finally, we end the proposal with plans for completing the hand pose prediction framework.
Nancy Pollard (Chair)
Carol O'Sullivan (Trinity College Dublin)