15-418/15-618: Parallel Computer Architecture and Programming, Fall 2022:
Your 15-418/15-618 final project gives you the opportunity to dive deeply into a parallel systems
problem of your choosing for the final month of the course. What you attempt for your project
is up to you (subject to approval by your course instructors). We have two requirements for
your project: (i) it should be challenging (you should learn something relevant to the themes of
this class), and (ii) we want your project to be fun—you should be excited to work on it!
- Project Proposal Due: Wednesday, March 23rd, 11:59pm
- Milestone Report Due: Monday, April 11th, 11:59pm
- Final Report and Recording Due: Thursday, May 5th, 11:59pm
Choosing a project
One common way to choose a project is to design a parallel solution to
a problem in an application area that is interesting to you. Projects
you have attempted in other classes are a good source of ideas. For
example, projects in machine learning, AI, graphics, computational
photography, and computer vision often stand to benefit greatly from
parallelization. If you can convince the course staff that a parallel
programming problem in one of these application domains is
sufficiently challenging (that is, the solution to get good speedup is
not obvious to you from the start), it’s likely it will make a good
Other project ideas focus on system design or workload evaluation. For
example, a project might compare the performance of CPU and GPU
implementations of a parallel algorithm, and describe which platform
is better suited for the job. Alternatively you could choose to
evaluate different versions of an algorithm for different
architectures. You could simulate the behavior of code on machines
with different SIMD widths, add a feature to the ISPC compiler (its
implementation is open source), or develop a parallel debugging tool
that helps visualize bottlenecks and performance in parallel programs.
You may implement your project on any parallel platform. The machines
in the GHC labs, the latedays cluster, GPUs, PSC machines, Amazon EC2,
iPhone/iPad/Android SoC, FPGAs, simulators are all possible and
welcome platforms for projects. If you need specific resources, please
let the staff know immediately and we’ll see what we can provide. For
example, many professors have high core count machines (32-64 cores)
for the research groups and it is likely possible to get you access
for your projects.
Coming up with a project is up to you. You can find a list of previous
projects Spring 2012, Spring 2013, Spring 2014, Spring 2015, Fall
2016, Fall 2017, Fall 2018, Fall 2019, and Fall 2020 online.
The purpose of the proposal is two-fold:
- Writing your ideas down forces you to organize your thoughts about your project.
- It gives 15-418/618 course staff the ability to verify your plans are of the right scope given
our expectations (it also gives us the ability to offer suggestions and help).
Please create a public github repo for your
project. Your repo should contain a README.md that contains the
following sections and content:
- TITLE: Please provide the title of your project, followed by the names of all team members.
Teams must be two students, with extremely rare exceptions.
- SUMMARY: Summarize your project in no more than 2-3 sentences. Describe what you plan
to do and what parallel systems you will be working with. Example one-liners include (you
should add a bit more detail):
- We are going to implement an optimized Smoothed Particle
Hydrodynamics fluid solver on the NVIDIA GPUs in the lab.
- We are going port the Go runtime to Blacklight.
- We are going to create optimized implementations of
sparse-matrix multiplication on both GPU and multi-core CPU
platforms, and perform a detailed analysis of both systems’
- We are going to back-engineer the unpublished machine
specifications of the GPU in the tablet my partner just
purchased. 2 • We are going to implement two possible
algorithms for a real-time computer vision application on a
mobile device and measure their energy consumption in the lab.
- BACKGROUND: If your project involves accelerating a
compute-intensive application, describe the application or piece
of the application you are going to implement in more detail. This
description need only be a few paragraphs. It might be helpful to
include a block diagram or pseudocode of the basic idea. An
important detail is what aspects of the problem might benefit from
parallelism and why?
- THE CHALLENGE: Describe why the problem is challenging. What
aspects of the problem might make it difficult to parallelize? In
other words, what to you hope to learn by doing the project?
- Describe the workload: what are the dependencies, what are its
memory access characteristics? (is there locality? is there a high
communication to computation ratio?), is there divergent
- Describe constraints: What are the properties of the
system that make mapping the workload to it challenging?
- RESOURCES: Describe the resources (type of computers, starter
code, etc.) you will use. What code base will you start from? Are
you starting from scratch or using an existing piece of code? Is
there a book or paper that you are using as a reference (if so,
provide a citation)? Are there any other resources you need, but
haven’t figured out how to obtain yet? Could you benefit from
access to any special machines?
- GOALS AND DELIVERABLES: Describe the deliverables or goals of
your project. This is by far the most important section of the
- Separate your goals into what you PLAN TO ACHIEVE ("100%"
-- what you believe you must get done to have a successful
project and get the grade you expect) and an extra goal or two
("125%") that you HOPE TO ACHIEVE if the project goes really
well and you get ahead of schedule, as well as goals in case
the work goes more slowly ("75%"). It may not be possible to
state precise performance goals at this time, but we encourage
you be as precise as possible. If you do state a goal, give
some justification of why you think you can achieve it. (e.g.,
I hope to speed up my starter code 10x, because if I did it
would run in real-time.)
- If applicable, describe the demo you plan to show at the
poster session (Will it be an interactive demo? Will you show
an output of the program that is really neat? Will you show
speedup graphs?). Specifically, what will you show us that
will demonstrate you did a good job?
- If your project is an analysis project, what are you hoping
to learn about the workload or system being studied? What
question(s) do you plan to answer in your analysis?
- Systems project proposals should describe what the system
will be capable of and what performance is hoped to be
- PLATFORM CHOICE: Describe why the platform (computer and/or
language) you have chosen is a good one for your needs. Why does
it make sense to use this parallel system for the workload you
- SCHEDULE: Produce a schedule for your project. Your schedule
should have at least one item to do per week. List what you plan
to get done each week from now until the parallelism competition
in order to meet your project goals. Keep in mind that due to
other classes, you’ll have more time to work some weeks than
others (work that into the schedule). You will need to re-evaluate
your progress at the end of each week and update this schedule
accordingly. Note the intermediate checkpoint deadline is April
11th. In your schedule we encourage you to be precise as precise
as possible. It’s often helpful to work backward in time from your
deliverables and goals, writing down all the little things you’ll
need to do (establish the dependencies).
The milestone exists is to give you a deadline approximately halfway
through the project. The following are suggestions for information to
include in your milestone write-up. Your goal in the writeup is to
assure the course staff (and yourself) that your project is proceeding
as you said it would in your proposal. If it is not, your milestone
writeup should emphasize what has been causing you problems, and
provide an adjusted schedule and adjusted goals. As projects differ,
not all items in the list below are relevant to all projects.
Make sure your project schedule on your main project page is up to
date with work completed so far, and well as with a revised plan of
work for the coming weeks. As by this time you should have a good
understanding of what is required to complete your project, I want to
see a very detailed schedule for the coming weeks. I suggest breaking
time down into half-week increments. Each increment should have at
least one task, and for each task put a person’s name on it.
Add a new section called MILESTONE to your README.md:
- In one to two paragraphs, summarize the work that you have
completed so far. (This should be easy if you have been
maintaining this information on your project page.)
- Describe how you are doing with respect to the goals and
deliverables stated in your proposal. Do you still believe you
will be able to produce all your deliverables? If not, why? What
about the ”nice to haves”? In your milestone writeup we want an
updated list of goals that you plan to hit for the poster session.
- What do you plan to show at the poster session? Will it be a
demo? Will it be a graph?
- Do you have preliminary results at this time? If so, it would
be great to included them in your milestone write-up.
- List the issues that concern you the most. Are there any
remaining unknowns (things you simply don’t know how to solve, or
resource you don’t know how to get) or is it just a matter of
coding and doing the work? If you do not wish to put this
information on a public web site you are welcome to email the
Finally, you should write up your final project report in the style of
a research paper. Please go into detail regarding the analysis that
you did throughout your project, including any initial approaches that
did not work well (and how you diagnosed and fixed their performance
problems), etc. To go into this amount of detail, a typical report
might be roughly 10 pages long (double-spaced, including figures),
although that is just a ballpark estimate (we will not be counting
pages). While you have flexibility in how you structure you report,
we suggest that you include the following sections. (You are also
encouraged to provide more detail if you wish.) Note that some of the
information in your final writeup can be pulled directly from your
proposal if it is still accurate.
- SUMMARY: A short (no more than a paragraph) project summary. If
applicable, the summary should list your project deliverables
(including what you plan to show at the parallelism competition) and
what machines they ran on. Here are some examples:
- We implemented smooth particle hydrodynamics in CUDA on the GPU and in ISPC
on the CPU and compared the performance of the two implementations.
- We parallelized a chess bot. Our 64 core implementation on Blacklight achieves a
40x speedup and won several games on an internet chess server.
- We accelerated image processing operations using the GPU. Given the speed
of our implementation, we demonstrate that a brute-force approach to breaking
CAPTCHAS is effective
- BACKGROUND: Describe the algorithm, application, or system you parallelized in computer science terms. Figures would be really useful here.
- What are the key data structures?
- What are the key operations on these data structures?
- What are the algorithm’s inputs and outputs?
- What is the part that computationally expensive and could benefit from parallelization?
- Break down the workload. Where are the dependencies in the program? How much
parallelism is there? Is it data-parallel? Where is the locality? Is it amenable to
- APPROACH: Tell us how your implementation works. Your description should be sufficiently
detailed to provide the course staff a basic understanding of your approach. Again, it might
be very useful to include a figure here illustrating components of the system and/or their
mapping to parallel hardware.
- Describe the technologies used. What language/APIs? What machines did you target?
- Describe how you mapped the problem to your target parallel machine(s). IMPORTANT: How do the data structures and operations you described in part 2 map to
machine concepts like cores and threads. (or warps, thread blocks, gangs, etc.)
- Did you change the original serial algorithm to enable better mapping to a parallel
- If your project involved many iterations of optimization, please describe this process
as well. What did you try that did not work? How did you arrive at your solution? The notes you’ve been writing throughout your project should be helpful here.
Convince us you worked hard to arrive at a good solution.
- If you started with an existing piece of code, please mention it (and where it came
- RESULTS: How successful were you at achieving your goals? We expect results sections to
differ from project to project, but we expect your evaluation to be very thorough (your
project evaluation is a great way to demonstrate you understood topics from this course).
Here are a few ideas:
- If your project was optimizing an algorithm, please define how you measured performance. Is it wall-clock time? Speedup? An application specific rate? (e.g., moves
per second, images/sec)
- Please also describe your experimental setup. What were the size of the inputs? How
were requests generated?
- Provide graphs of speedup or execute time. Please precisely define the configurations
being compared. Is your baseline single-threaded CPU code? It is an optimized
parallel implementation for a single CPU?
- Recall the importance of problem size. Is it important to report results for different
problem sizes for your project? Do different workloads exhibit different execution
- IMPORTANT: What limited your speedup? Is it a lack of parallelism? (dependencies) Communication or synchronization overhead? Data transfer (memory-bound or
bus transfer bound). Poor SIMD utilization due to divergence? As you try and answer these questions, we strongly prefer that you provide data and measurements to
support your conclusions. If you are merely speculating, please state this explicitly.
Performing a solid analysis of your implementation is a good way to pick up credit
even if your optimization efforts did not yield the performance you were hoping for.
- Deeper analysis: Can you break execution time of your algorithm into a number
of distinct components. What percentage of time is spent in each region? Where is
there room to improve?
- Was your choice of machine target sound? (If you chose a GPU, would a CPU have
been a better choice? Or vice versa.)
- REFERENCES: Please provide a list of references used in the project.
- LIST OF WORK BY EACH STUDENT, AND DISTRIBUTION OF TOTAL CREDIT:
Please list the work performed by each partner. Given that you worked in a
group, how should the total credit for the project be distributed amongst the participants?
(e.g., 50%-50%, 60%-40%, etc.) If you do not feel comfortable placing this information
on a public web page, it is okay to include it only on the version that you submit via
- Your final report should be included in your Git repo as a
separate TeX / Word / etc file + a PDF.
Groups will make a five-minute recording about your project.
IMPORTANT: Make sure that your presentation
covers the essentials of your project (what, why, how) in these five minutes!
Spring 2022 Projects:
- Parallel Lock-Free Algorithms and Synchronization Primitives
- Parallel Machine Learning
- Parallel Game Algorithms
Parallel Chess Engine with OpenMP,
Parallel Multi-player Tetris solver,
Yen Li Laih and Weiwei Lin
Parallel Sudoku Solver,
Yuxin Zhang and Alex Luo
Tanvi Bhargava and Grayson Moyer
Parallel Othello Solver,
Miranda Lin and Ashwin Srinivasan
Parallel Dungeon Generation,
Larry Geng and Aleksander Tarczynski
Yunchieh Janabelle Wu and Goran Goran
- Parallel Graph Algorithms
Parallelizing Network Flow Algorithms,
Michelle Ling and Michelle Zhang
Parallel Cost-Bounded Graph Partitioning,
Abhishek Vijayakumar and Anna Cai
Parallel TSP Solver,
Yujun Qin and Dongxiao Han
Parallel K-means Clustering Algorithm,
Parallel Single Source Shortest Paths Algorithms,
Xinqi Wang and Yuou Lei
Parallel Influence Maximization in Social Networks,
Yuling Wu and Zican Yang
P* - Parallel A*,
Aditya Kannan and Gabriel Lee
Parallelization of A* Search Algorithm,
William Foy and Jacob Zych
Parallel Heuristic Search,
Sachit Goyal and Vamsikrishna Ratnakaram
Parallel Luby's Algorithm,
Arvind Mahankali and Clarissa Xu
Parallelizing Kruskal's Algorithm and Boruvka's Algorithm for Finding Minimum Spanning Trees,
Sarayu Namineni and Ishika Saxena
OMPRRT: OpenMP-based rapidly exploring random trees,
Ravi Dudhagra and Saral Tayal
- Parallel Image Processing and Visualization
Parallel BVH Generation via GPU,
Denise Yang and Jai Madisetty
Poisson Image Editing - A Parallel Implementation,
Jiayi Weng and Zixu Chen
Project Lake: Efficient Water Surface Simulation with MPI and CUDA Parallelism,
Peillin Rao and Zheng Zhong
Parallel Optimization of the Ray-tracing Algorithm,
Yunxuan Xiao and Tianyi Sun
Exploring Parallel T-distributed Stochastic Neighbor Embedding,
Yue Yin and Zhe Chen
Path Tracing with CUDA,
Kiran Khambhla and Ben Cagan
Parallel Ray Tracing in CUDA and OpenMP,
Anjali Thontakudi and Winston Cheung
Parallelizing Image Stitching,
Jesse Chan and James Li
Parallel Pixelated Image Abstraction,
Anthony Meza and Kevin Xie
Parallel Image Blending,
Amanda Xu and Ziming He
CUDA-based Bag-of-words Scene Recognition,
- Parallelilizing Optimization Algorithms
- Parallel Architecture
- Miscellaneous Parallel Algorithms and Frameworks
Parllel Suffix Array Construction and Applications,
Chengji Yu and Sree Revoori
Parallel Huffman Decoding,
Joelle Lim and Srinidhi Kaushik
Parallel N-bodies Simulator,
Nathan Love and Kelly Woicik
High Frequency Parallel Crypto Streaming and Valuation,
Riccardo Santoni and Zongpeng Yu
Parallel Sampling Based Motion Planning using Probabilistic Road Maps (PRMs),
Prithu Pareek and Omkar Savkur
Pinnong Li and Yitao Meng
Barnes Hut algorithm for N-Body problem,
Ziyan Zhang and Xuanyi Li
57-152 Homework Solver,
Bas Yoovidhya and Jonathan Loungani
Parallelizing Gradient Calculations in t-SNE,
Jason Zhang and Vy Nguyen
L-System Forest Generation,
Joshua Mathews and Nolan Mass
KV Storage for Parallel Architecture,
Xinyu Zhang and Zhi Lin
Parallel Lossless Compressors,
Huilin Xiong and Qier Li
Optimizing Parallelization of Boid Simulation,
Claire Ko and Haley Carter
Dynamic Scheduling with Work Stealing - Framework and Analysis of Different Strategies,
Claire Ko and Haley Carter
Parallel sequence alignment algorithm,
Jiayin Ling and Ruolin Zhang
Parallel Matrix Inversion,
Allen Xie and Chi Nguyen
Parallelizing Natural Selection Simulation,
Hridayesh Joshi and William Giraldo