CSD

All SCS Ph.D. students should have the chance to create the Next Big Thing, to engage openly in intellectual inquiry without worrying about losing their funding. We want them to freely explore. We want them to change the world.

SCS created the CSD50 Graduate Fellowship Endowment Fund to guarantee that each Ph.D. student has the support to fuel their research and exploration while at CMU. Contributions to the fund will help us attract, retain and support the best computer scientists in the world as they create the future.

When you give to the CSD50 Graduate Fellowship Endowment Fund, you not only support the future of computer science, but you can recognize the contributions of individuals with deep SCS roots. We've specifically highlighted four individuals you can honor with your gift: Sharon Burks, A. Nico Habermann, Raj Reddy and Joe Traub. You can make your gift by clicking below and specifying one of these SCS legends in the "Comments" box on the next page.

What does a person’s brain activity look like when they read the word apple? How does it differ from the activity of the same (or even a different) person reading about an airplane? How can we identify parts of the human brain that are active for different semantic concepts? In a seemingly unrelated setting, how can we model and mine knowledge on the web (e.g., subject-verb-object triplets) in order to find hidden emerging patterns? Our proposed answer to both problems (and many more) is to bridge signal processing and large-scale multi-aspect data mining.

Specifically, language in the brain, like many other real-world processes and phenomena, has different aspects, such as the various semantic stimuli of the brain activity (apple or airplane), the particular person whose activity we analyze, and the measurement technique. In the above example, the brain regions with high activation for “apple” will likely differ from the ones for “airplane”. Nevertheless, each aspect of the activity is a signal of the same underlying physical phenomenon: language understanding in the human brain. Taking into account all aspects of brain activity results in more accurate models that can drive scientific discovery (e.g., identifying semantically coherent brain regions).

In addition to the above Neurosemantics application, multi-aspect data appear in numerous settings, such as mining knowledge on the web, where the aspects include entities in a knowledge base, the links between them, and search engine results for those entities; and multi-aspect graph mining, as in multi-view social networks, where we observe people's social interactions over different means of communication and use all views/aspects of that communication to extract communities more accurately.

The main thesis of our work is that many real-world problems, such as the aforementioned, benefit from jointly modeling and analyzing the multi-aspect data associated with the underlying phenomenon we seek to uncover.

In this thesis we develop scalable and interpretable algorithms for mining big multi-aspect data, with emphasis on tensor decomposition. We present algorithmic advances in scaling up and parallelizing tensor decomposition and in assessing the quality of its results in multi-aspect data applications, advances that have enabled the analysis of multi-aspect data using tensors that the state of the art was unable to operate on. Furthermore, we present results on multi-aspect data applications focusing on Neurosemantics and on Social Networks and the Web, demonstrating the effectiveness of modeling and mining multiple aspects of the data. We conclude with our future vision on bridging Signal Processing and Data Science for real-world applications and with concrete future directions for multi-aspect data mining algorithms and applications.
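To make the multi-aspect modeling concrete, below is a minimal, self-contained sketch (not the algorithms developed in the thesis) of a rank-3 CP/PARAFAC decomposition of a three-mode tensor, e.g. stimulus × brain region × person, computed with plain NumPy and alternating least squares; the dimensions and data are purely illustrative.

import numpy as np

def khatri_rao(B, C):
    # Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K x R).
    J, R = B.shape
    K, _ = C.shape
    return np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)

def cp_als(X, rank, n_iter=100, seed=0):
    # Rank-`rank` CP decomposition of a 3-mode tensor X via alternating least squares.
    # Returns factors A, B, C with X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r].
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Toy "brain activity" tensor: 9 stimuli x 50 brain regions x 6 people,
# generated from a known rank-3 structure plus noise.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal(s) for s in ((9, 3), (50, 3), (6, 3)))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0) + 0.01 * rng.standard_normal((9, 50, 6))
A, B, C = cp_als(X, rank=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
print('relative reconstruction error:', np.linalg.norm(X - X_hat) / np.linalg.norm(X))

Each factor matrix then summarizes one aspect: rows of A describe stimuli, rows of B describe brain regions, and rows of C describe people, which is what makes the joint model interpretable across aspects.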

Thesis Committee:
Christos Faloutsos (Chair)
Tom Mitchell
Jeff Schneider
Nikos Sidiropoulos (University of Minnesota)

Many natural forms of communication involve two (or more) players interacting based on a shared context, where the context is huge and imperfectly shared. This large context typically helps compress communication even when the sharing is imperfect. How can we explain this phenomenon mathematically?

In this talk, we will argue that communication complexity gives the right lens to view this class of problems, and this class of problems gives rise to new questions about communication complexity. I will describe some of these questions, some partial answers and some confounding open questions.

Based on joint works with Ilan Komargodski, Pravesh Kothari and Madhu Sudan.

In this talk we present a faster distributed broadcasting primitive for the classical radio network model.
 
The most basic distributed radio network broadcasting primitive - called Decay - dates back to a PODC'87 result of Bar-Yehuda, Goldreich, and Itai. In any radio network with some informed source nodes, running Decay for $O(d \log n + \log^2 n)$ rounds informs all nodes at most $d$ hops away from a source with high probability. Since 1987 this primitive has been the most important building block for implementing many other functionalities in radio networks. The only improvements to this decades-old algorithm are slight variations due to [Czumaj, Rytter; FOCS'03] and [Kowalski, Pelc; PODC'03], which achieve the same functionality in $O(d \log \frac{n}{d} + \log^2 n)$ rounds. To obtain a speedup from this, $d$ and thus also the network diameter need to be near linear, i.e., larger than $n^{1-\epsilon}$.
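For intuition, here is a toy Python simulation of the Decay idea (not the speaker's new primitive), under the standard synchronous radio-model rule that a node receives a message only when exactly one of its neighbors transmits in that round; the graph, phase count and parameter names are illustrative.

import random

def decay_phase_probabilities(log_n):
    # One Decay phase: an informed node transmits in sub-round i with probability 2^{-i}.
    return [2.0 ** -i for i in range(log_n)]

def simulate_decay(adj, sources, phases, log_n, seed=0):
    # adj: undirected graph as {node: set(neighbors)}. A node becomes informed
    # only when exactly one of its informed neighbors transmits (collisions are lost).
    rng = random.Random(seed)
    informed = set(sources)
    for _ in range(phases):
        for p in decay_phase_probabilities(log_n):
            transmitters = {v for v in informed if rng.random() < p}
            newly_informed = set()
            for v in adj:
                if v in informed:
                    continue
                hits = sum(1 for u in adj[v] if u in transmitters)
                if hits == 1:          # a single clear transmission is received
                    newly_informed.add(v)
            informed |= newly_informed
    return informed

# Example: broadcast from node 0 along a 16-node path.
n = 16
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
print(sorted(simulate_decay(adj, sources={0}, phases=40, log_n=4)))

Each phase of $O(\log n)$ rounds advances the message frontier by one hop with constant probability, which is where the $O(d \log n + \log^2 n)$ bound comes from.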

Our new distributed primitive spreads messages for $d$ hops in $O(d \frac{\log n \log \log n}{\log d} + \log^{O(1)} n)$ rounds. This improves over Decay for any super-polylogarithmic $d = \log^{\omega(1)} n$ and achieves a near-optimal $O(d \log \log n)$ running time for $d = n^{\epsilon}$. This also makes progress on an open question of Peleg.

Based on joint work with Bernhard Haeupler.

David Wajc is a second-year PhD student in the computer science department at CMU, advised by Bernhard Haeupler. He is broadly interested in algorithmic aspects of theoretical computer science, and has worked on online algorithms, distributed algorithms, and AGT, among others.

Detection and characterization of potentially harmful radiation sources is a common problem in homeland security, as well as in safeguarding radioactive isotopes used in industrial or medical practice. Any improper or insecure storage of radioactive material may cause substantial harm to humans and to the environment. The task of detecting relatively weak, potentially shielded sources is particularly difficult when facing variability in patterns of background radiation. This is a common circumstance in cluttered urban environments. It is also in these environments that improperly stored radioactive material may inflict the most extensive harm.

In this thesis, we explore two algorithms that are useful in different situations. The first, the G-P detection method, works well with large sensors that can fit in vehicles. Its novelty lies in making Poisson assumptions about the photon counts observed in gamma-ray spectra while making Gaussian assumptions about their rates. The second, List Mode Regression, is tailored to small portable sensors, which produce spectra with low photon counts. Both methods outperform current state-of-the-art algorithms in their respective usage scenarios.
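As a rough illustration of the count-versus-rate modeling distinction (a generic sketch, not the G-P method or List Mode Regression from the thesis), the snippet below treats per-bin background rates as Gaussian-distributed, treats observed photon counts as Poisson, and scores a spectrum with a simple log-likelihood ratio; all rates, templates and numbers are made up.

import numpy as np

rng = np.random.default_rng(0)
n_bins = 64
bg_mean = rng.uniform(5.0, 50.0, size=n_bins)   # hypothetical mean background rate per bin
bg_std = 0.1 * bg_mean                          # hypothetical rate variability

def poisson_loglik(counts, rates):
    rates = np.clip(rates, 1e-9, None)
    # The log(k!) term is omitted; it cancels in the likelihood ratio below.
    return np.sum(counts * np.log(rates) - rates)

def detection_score(counts, source_template, strength):
    # Log-likelihood ratio: background-plus-source vs. background only,
    # using the mean background rate as a plug-in estimate.
    h0 = poisson_loglik(counts, bg_mean)
    h1 = poisson_loglik(counts, bg_mean + strength * source_template)
    return h1 - h0

# Simulate one background-only spectrum and one containing a weak source.
template = np.exp(-0.5 * ((np.arange(n_bins) - 20) / 2.0) ** 2)   # toy photopeak shape
rates = rng.normal(bg_mean, bg_std).clip(min=0)                   # Gaussian rate draw
bg_counts = rng.poisson(rates)                                    # Poisson counts
src_counts = rng.poisson(rates + 3.0 * template)
print(detection_score(bg_counts, template, 3.0), detection_score(src_counts, template, 3.0))

The spectrum containing the added source should score noticeably higher than the background-only one; the point of the sketch is only to show where the Poisson and Gaussian assumptions enter.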

Thesis Committee:
Artur Dubrawski
Srinivasa Narasimhan

Copy of Thesis Document

Your cheering and participation will encourage the students to present their projects in the best possible way.

In the CPS V&V Grand Prix (CPS Verification & Validation Grand Prix), students in the CMU course Foundations of Cyber-Physical Systems will have the opportunity to present their final projects to a panel of experts in CPS who will give them feedback from an industry perspective and insights into the state-of-the-art verification & validation methods used for CPS.  In addition to the opportunity to show off their results to these industry representatives, a few top projects and presentations will be awarded prizes (not to mention the fame and glory that inevitably comes with being a verification rockstar).

Replication is a fundamental technique to guarantee service availability and consistency. However, when a system component may misbehave, for instance due to a software intrusion, coordinating its replicas requires complex distributed fault-tolerant protocols. Such protocols need several executing replicas and many communication steps to work correctly. This results in redundant service executions that are inefficient and typically forces the service to be deterministic in order to achieve consistency (i.e., if correct replicas process the same requests in the same order, they produce the same output).

In this seminar we introduce the concept of hardware-based computation verification in the area of fault-tolerant replication. We show how the technique handles malicious replicas while reducing, linearly in the number of replicas, the computational effort needed to execute a fault-tolerant distributed service. In particular, we design a fully passive replicated system that dramatically reduces fault-tolerance complexity, in terms of protocols, number of messages and processing effort, and that allows non-deterministic operations. We also implement a system prototype based on commodity hardware and on recent research in Trusted Computing (TC). Finally, we show experimentally that our system delivers good performance compared to optimized state-of-the-art systems, even though its main overhead is due to TC technology.
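To illustrate the general idea with a toy sketch (not the system described in the seminar), the Python snippet below has a primary execute requests while a stand-in "trusted component", modeled here as an HMAC key that only trusted hardware would hold, attests to each state update; passive backups verify the attestation and apply the update without re-executing the request. All names and the key handling are hypothetical.

import hashlib
import hmac

TRUSTED_KEY = b'key-held-only-inside-the-trusted-component'   # hypothetical

def attest(request, state_update):
    # Stand-in for the trusted component vouching for an execution result.
    msg = repr((request, state_update)).encode()
    return hmac.new(TRUSTED_KEY, msg, hashlib.sha256).hexdigest()

def verify(request, state_update, tag):
    expected = attest(request, state_update)
    return hmac.compare_digest(expected, tag)

class Primary:
    def __init__(self):
        self.state = {}
    def execute(self, request):
        key, value = request               # e.g. a key-value write
        update = {key: value}              # non-determinism is fine here:
        self.state.update(update)          # only the resulting update is shipped
        return update, attest(request, update)

class PassiveBackup:
    def __init__(self):
        self.state = {}
    def apply(self, request, update, tag):
        # Apply only attested updates; a tampered or forged update is rejected.
        if verify(request, update, tag):
            self.state.update(update)
        return self.state

primary, backup = Primary(), PassiveBackup()
update, tag = primary.execute(('x', 42))
print(backup.apply(('x', 42), update, tag))   # {'x': 42}

Because backups never re-execute requests, the per-request computational cost no longer grows with the number of replicas, which is the effect hardware-based verification is meant to achieve.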

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.
