Information related to 15-400 projects

- Weina Wang
- Leila Wehbe
- Heather Miller, programming languages for distributed systems
- Justine Sherry, networking
- Eunsuk Kang, formal modeling
- Nihar Shah - statistical learning and game theory
- Bob Murphy - computational biology
- Vicent Alevin - human-computer interaction
- Ryan O'Donnell - theory
- Aarti Singh - machine learning
- Jaime Carbonell - language technologies
- Bryan Parno - security
- Illah Nourbakhsh - robotics
- Keenan Crane - graphics
- Todd Mowry - systems
- Jonathan Aldrich - programming languages and software engineering

The project would primarily entail systems-building and software design for the tool, however, the student would do all of the following:

- write a back-end leveraging our explanation code, that does the necessary caching for responsiveness,
- design an interactive front-end for making refined queries on a model's decisions,
- gain experience in using the standard deep learning python frameworks,
- become familiar with the state-of-the art in transparent ML,
- help explore domain-specific applications of influence-directed explanations, and
- help develop interpretation techniques useful for understanding a deep model's decisions in various domains.

We expect this project would be well-suited for a student with experience in software engineering, and an interest in applications and research in machine learning. Contact Klas Leino

I'd love to have an undergrad continue the work I started with Glacier, my

class immutability system for Java. I'd like to build a type inference

system, apply it to the JDK, and then do a corpus study to see how

applicable Glacier is. I'm also looking for students to help with the

Obsidian implementation; a request is already in the queue to go out in

January.

Zack Coker

One common way to evaluate automated program repair is to use test cases -

if all the test cases pass, then the fix matched what the developer wanted.

However, these approaches do not have as much information as someone

working on the project would (for example: which solution out of the space

of possible solutions is the best). It would be interesting to see how

much participants without experience in an application project are able to

infer about the correct fix and what the participants use to make that

inference.

Shayan Doroudi

Using data-driven models to accurately predict how humans learn is a

difficult task, because human learning is sophisticated. When we use a

statistical model of student learning that is overly simple, we can avoid

overfitting, but a number of other issues can arise due to the bias of the

overly simple model. In particular, we have recently shown that depending

on how much data we use from each student, we can learn drastically

different parameters for a particular statistical model of student

learning. In this project, we want to try to find statistical models that

are not susceptible to this issue. I am looking for a student that is

interested in machine learning and is also passionate about student

learning and education.

Pratik Fegade

We find that there is a general lack of a robust compiler infrastructure

for the Java language that impedes the rapid development of static and

dynamic analyses. It would be great if someone could work on bettering the

current tools. The current approach I have to understanding the user's

graph computation is a combination of pattern matching on source code to

look for common patterns, and possibly augmenting that with dynamic

analyses. I believe there are ways to have a more principled approach to

this. Maybe a data driven approach to identifying how people write the

common computations we want to look for and refining the pattern matching,

or even making use of auxiliary information like variable/method names,

code comments to identify computations.

Graham Gobieski

I have a variety of smallish project ideas that potentially a qualified

undergraduate student could work on. These include the following:

1. Investigating capsule networks as a way to reduce the overall footprint

(memory and operations) of a inference model 2. Help write some

intermittent code in order to support audio processing library functions.

3. Investigate how to use non-negative matrix factorization to reduce the

size of fully-connected layers in neural networks. 4. Investigate how

capacitor size effects the fraction of time spent running for an

intermittent device. Do this analysis across a range of capacitor sizes and

capacitor technologies. Projects three and four are probably the most

suitable for an undergraduate, but project one and two might be possible

for a motivated individual.

Isaac Grosof

I'm interested in working with someone on analyzing policies which schedule

jobs with known sizes on multiple processors. Specifically, I'm interested

in the competitive ratio of the Shortest Remaining Processing Time policy

on many processors. I want to transfer the existing worst-case analysis in

this area to the stochastic domain, and improve upon it. Interest in theory

is all that's needed, no prior exposure is necessary.

Rajesh Jayaram

Data stream algorithms have become a huge field of algorithmic research due

to the proliferation of massive data sets which cannot be stored in

memory. We have shown that for certain data streams which do not delete an

unbounded fraction of the items they insert, more space efficient

algorithms for L_0 and L_1 norm related problems are feasible. So far,

improvements for L_2 algorithms have remained elusive, can we make any?

Jay Yoon Lee

Dependency parsing in multilingual setting: Languages have commonality

between each other, especially if they are in the same family of language

(e.g. Germanic). So can we share the parsing rules across language and have

effect of having larger data. In pursuing this project, I hope to see

improvement over single parser especially for low resource (smaller labeled

data) cases using multi-task/continual learning approach.

Benjamin Lengerich

GenAMap (http://genamap.org) is an open-source platform for genomic

association mapping. We are looking for motivated undergraduates to guide

the development of the software. Specifically, we are looking for a student

interested in the design of parallelized machine learning systems to

integrate our genomic algorithms with the open-source Petuum

framework. Knowledge of C++ is required.

Roahn Sawhney

The Javascript based geometry processing framework I developed has scope

for a lot of student projects - e.g., adding optimization, geometry,

machine learning algorithms, developing techniques to efficiently transmit

3D data over a network (ties in with the point cloud project), expanding

the functionality and improving the performance of the linear algebra

module, building creative interfaces for developing 3D geometry pipelines.

Yong Kiam Tan

I am interested in working with any undergrad on CakeML-related projects,

especially verifying new compiler optimizations. Some background in logic,

PL and compilers is needed. The student might need to learn interactive

theorem proving in HOL4, but that can be picked up at the start of the

project. Non-exhaustive list of ideas here: https://cakeml.org/projects.

Di Wang

Rapid Type Analysis (RTA) [1] is a widely used whole-program

interprocedural analysis for OO languages, such as Java. Java projects

often share a large common codebase (eg. JDK). To speed up RTA on large

Java projects, one crucial point is to reduce redundant analyses on the

common codebase. I want to develop a library summarization technique [2,3]

to achieve this goal.

[1] David F. Bacon and Peter F. Sweeney. 1996. Fast static analysis of C++

virtual function calls. In Proceedings of the 11th ACM SIGPLAN conference

on Object-oriented programming, systems, languages, and applications

(OOPSLA '96). ACM, New York, NY, USA, 324-341.

[2] Hao Tang, Di Wang, Yingfei Xiong, Lingming Zhang, Xiaoyin Wang, and Lu

Zhang. 2017. Conditional Dyck-CFL Reachability Analysis for Complete and

Efficient Library Summarization. In Proceedings of the 26th European

Symposium on Programming Languages and Systems - Volume 10201, Hongseok

Yang (Ed.), Vol. 10201. Springer-Verlag New York, Inc., New York, NY, USA,

880-908.

[3] Sulekha Kulkarni, Ravi Mangal, Xin Zhang, and Mayur

Naik. 2016. Accelerating program analyses by cross-program

training. SIGPLAN Not. 51, 10 (October 2016), 359-377.

Ziqi Wang

Students can participate in a related research project by coding up some

novel lock-free and lock-based data structures based on recent publications

on this area. These data structures could serve as a benchmarking baseline

for testing transactional memory implementations and other parallel

systems. Students could gain abundant experience and knowledge by

conducting performance analysis and performance tuning on the data

structure they have coded themselves.

Katherine Ye

There are many possible projects outlined in the Penrose issue tracker: https://github.com/penrose/penrose/issues

For example:

add signed distance functions for arbitrary objects

design a prodirect manipulation capability

design semantics or extensibility mechanisms for Substance and Style

design an optimization-aware GUI

add a domain to Penrose (like group theory)

perform user studies and evaluations of the system

Christopher Yu

The project on spherically inverted convex hulls has some potential

applications to collision testing or other forms of simulation, since it

enables relatively efficient point inclusion tests against non-convex

volumes -- in the limit, all star domains can be efficiently tested. There

are some problems that an undergrad might be able to help with: selecting a

good sampling of inversion centers, for instance, or reducing the number of

half-space tests by simplifying the convex hull in the inverted space.

Back to CS300 home page.