15-300 Fall '18
Information related to 15-400 projects

Project Suggestions

One of the best sources of project suggestions is the list of projects Tom Cortina sends out to all SCS undergraduates at the beginning of the semester. We encourage you to read that email; many of the projects may not be taken still. Another good idea is to contact professors who presented in class about their research area. These include:

Weina Wang
Leila Wehbe
Heather Miller, programming languages for distributed systems
Justine Sherry, networking
Eunsuk Kang, formal modeling
Nihar Shah - statistical learning and game theory
Bob Murphy - computational biology
Vicent Alevin - human-computer interaction
Ryan O'Donnell - theory
Aarti Singh - machine learning
Jaime Carbonell - language technologies
Bryan Parno - security
Illah Nourbakhsh - robotics
Keenan Crane - graphics
Todd Mowry - systems
Jonathan Aldrich - programming languages and software engineering

Here are some additional projects suggested by SCS Faculty:

Detecting Flaky Tests via Machine Learning

In previous work, we have been able to detect flaky tests, where flaky tests are automated unit tests which pass and fail non-deterministically. The project I am proposing would be to take what we have learned about flaky tests and use machine learning to try and identify flaky tests at a much larger scale. You can learn more about our previous work here: http://www.deflaker.org/ Contact me at Michael Hilton (mhilton@cmu.edu).

Debugging Neural Network Decisions

We would like a responsive, interactive tool for explaining and debugging neural network decisions. The tool would leverage our existing explanation framework (described here) and keras code. Additionally, the student would help us explore domain-specific applications of our prior work, where there may be room for developing new interpretation techniques. There would also likely be an opportunity to publish a case study of the application of this work.

The project would primarily entail systems-building and software design for the tool, however, the student would do all of the following:

write a back-end leveraging our explanation code, that does the necessary caching for responsiveness,
design an interactive front-end for making refined queries on a model's decisions,
gain experience in using the standard deep learning python frameworks,
become familiar with the state-of-the art in transparent ML,
help explore domain-specific applications of influence-directed explanations, and
help develop interpretation techniques useful for understanding a deep model's decisions in various domains.

We expect this project would be well-suited for a student with experience in software engineering, and an interest in applications and research in machine learning. Contact Klas Leino (grad student) and Matt Fredrikson (faculty)

Project Suggestions from CS Department PhD students

Michael Coblenz
I'd love to have an undergrad continue the work I started with Glacier, my
class immutability system for Java. I'd like to build a type inference
system, apply it to the JDK, and then do a corpus study to see how
applicable Glacier is. I'm also looking for students to help with the
Obsidian implementation; a request is already in the queue to go out in
January.

Zack Coker

One common way to evaluate automated program repair is to use test cases -
if all the test cases pass, then the fix matched what the developer wanted.
However, these approaches do not have as much information as someone
working on the project would (for example: which solution out of the space
of possible solutions is the best). It would be interesting to see how
much participants without experience in an application project are able to
infer about the correct fix and what the participants use to make that
inference.

Shayan Doroudi

Using data-driven models to accurately predict how humans learn is a
difficult task, because human learning is sophisticated. When we use a
statistical model of student learning that is overly simple, we can avoid
overfitting, but a number of other issues can arise due to the bias of the
overly simple model. In particular, we have recently shown that depending
on how much data we use from each student, we can learn drastically
different parameters for a particular statistical model of student
learning. In this project, we want to try to find statistical models that
are not susceptible to this issue. I am looking for a student that is
interested in machine learning and is also passionate about student
learning and education.

Pratik Fegade

We find that there is a general lack of a robust compiler infrastructure
for the Java language that impedes the rapid development of static and
dynamic analyses. It would be great if someone could work on bettering the
current tools. The current approach I have to understanding the user's
graph computation is a combination of pattern matching on source code to
look for common patterns, and possibly augmenting that with dynamic
analyses. I believe there are ways to have a more principled approach to
this. Maybe a data driven approach to identifying how people write the
common computations we want to look for and refining the pattern matching,
or even making use of auxiliary information like variable/method names,
code comments to identify computations.

Graham Gobieski

I have a variety of smallish project ideas that potentially a qualified
undergraduate student could work on. These include the following:
1. Investigating capsule networks as a way to reduce the overall footprint
(memory and operations) of a inference model 2. Help write some
intermittent code in order to support audio processing library functions.
3. Investigate how to use non-negative matrix factorization to reduce the
size of fully-connected layers in neural networks. 4. Investigate how
capacitor size effects the fraction of time spent running for an
intermittent device. Do this analysis across a range of capacitor sizes and
capacitor technologies. Projects three and four are probably the most
suitable for an undergraduate, but project one and two might be possible
for a motivated individual.

Isaac Grosof

I'm interested in working with someone on analyzing policies which schedule
jobs with known sizes on multiple processors. Specifically, I'm interested
in the competitive ratio of the Shortest Remaining Processing Time policy
on many processors. I want to transfer the existing worst-case analysis in
this area to the stochastic domain, and improve upon it. Interest in theory
is all that's needed, no prior exposure is necessary.

Rajesh Jayaram

Data stream algorithms have become a huge field of algorithmic research due
to the proliferation of massive data sets which cannot be stored in
memory. We have shown that for certain data streams which do not delete an
unbounded fraction of the items they insert, more space efficient
algorithms for L_0 and L_1 norm related problems are feasible. So far,
improvements for L_2 algorithms have remained elusive, can we make any?

Jay Yoon Lee

Dependency parsing in multilingual setting: Languages have commonality
between each other, especially if they are in the same family of language
(e.g. Germanic). So can we share the parsing rules across language and have
effect of having larger data. In pursuing this project, I hope to see
improvement over single parser especially for low resource (smaller labeled
data) cases using multi-task/continual learning approach.

Benjamin Lengerich

GenAMap (http://genamap.org) is an open-source platform for genomic
association mapping. We are looking for motivated undergraduates to guide
the development of the software. Specifically, we are looking for a student
interested in the design of parallelized machine learning systems to
integrate our genomic algorithms with the open-source Petuum
framework. Knowledge of C++ is required.

Roahn Sawhney

The Javascript based geometry processing framework I developed has scope
for a lot of student projects - e.g., adding optimization, geometry,
machine learning algorithms, developing techniques to efficiently transmit
3D data over a network (ties in with the point cloud project), expanding
the functionality and improving the performance of the linear algebra
module, building creative interfaces for developing 3D geometry pipelines.

Yong Kiam Tan

I am interested in working with any undergrad on CakeML-related projects,
especially verifying new compiler optimizations. Some background in logic,
PL and compilers is needed. The student might need to learn interactive
theorem proving in HOL4, but that can be picked up at the start of the
project. Non-exhaustive list of ideas here: https://cakeml.org/projects.

Di Wang

Rapid Type Analysis (RTA) [1] is a widely used whole-program
interprocedural analysis for OO languages, such as Java. Java projects
often share a large common codebase (eg. JDK). To speed up RTA on large
Java projects, one crucial point is to reduce redundant analyses on the
common codebase. I want to develop a library summarization technique [2,3]
to achieve this goal.

[1] David F. Bacon and Peter F. Sweeney. 1996. Fast static analysis of C++
virtual function calls. In Proceedings of the 11th ACM SIGPLAN conference
on Object-oriented programming, systems, languages, and applications
(OOPSLA '96). ACM, New York, NY, USA, 324-341.
[2] Hao Tang, Di Wang, Yingfei Xiong, Lingming Zhang, Xiaoyin Wang, and Lu
Zhang. 2017. Conditional Dyck-CFL Reachability Analysis for Complete and
Efficient Library Summarization. In Proceedings of the 26th European
Symposium on Programming Languages and Systems - Volume 10201, Hongseok
Yang (Ed.), Vol. 10201. Springer-Verlag New York, Inc., New York, NY, USA,
880-908.
[3] Sulekha Kulkarni, Ravi Mangal, Xin Zhang, and Mayur
Naik. 2016. Accelerating program analyses by cross-program
training. SIGPLAN Not. 51, 10 (October 2016), 359-377.

Ziqi Wang

Students can participate in a related research project by coding up some
novel lock-free and lock-based data structures based on recent publications
on this area. These data structures could serve as a benchmarking baseline
for testing transactional memory implementations and other parallel
systems. Students could gain abundant experience and knowledge by
conducting performance analysis and performance tuning on the data
structure they have coded themselves.

Katherine Ye

There are many possible projects outlined in the Penrose issue tracker: https://github.com/penrose/penrose/issues

For example:
add signed distance functions for arbitrary objects
design a prodirect manipulation capability
design semantics or extensibility mechanisms for Substance and Style
design an optimization-aware GUI
add a domain to Penrose (like group theory)
perform user studies and evaluations of the system

Christopher Yu

The project on spherically inverted convex hulls has some potential
applications to collision testing or other forms of simulation, since it
enables relatively efficient point inclusion tests against non-convex
volumes -- in the limit, all star domains can be efficiently tested. There
are some problems that an undergrad might be able to help with: selecting a
good sampling of inversion centers, for instance, or reducing the number of
half-space tests by simplifying the convex hull in the inverted space.

Back to CS300 home page.

15-300 Fall '18 Information related to 15-400 projects

Project Suggestions

Detecting Flaky Tests via Machine Learning

Debugging Neural Network Decisions

Project Suggestions from CS Department PhD students

15-300 Fall '18
Information related to 15-400 projects