Week | What We Plan To Do | What We Actually Did
Apr 1-7 | Familiarize with the existing codebase; Get the base system building on the Gates machines; set up initial CUDA requirements | Fixed the makefile to build on most Linux machines; Relearned code structure and organization |
Apr 8-14 | Implement Goswami et al. algorithm using CUDA | Built CUDA parts of the makefile; Launched a CUDA kernel with two renderers in place; Implemented benchmarking and comparison code for deep analysis; Restructured simulation code to allow for CUDA and sequential renderers
Apr 15-21 | Bugfixing of CUDA implementation; implement faster neighbor finding for sequential code | Implemented bucket creation and implemented the C part of the CUDA launch |
Mon 4/23-Thu 4/26 | Implement the kernel generation and density computation functions - Robbie | Kernel generation implemented; Spent time troubleshooting bucket generation |
Fri 4/27-Sun 4/29 | Implement the force computation function and have a fully working CUDA implementation - Robbie | Implemented force and density functions; CUDA code complete but still with
Mon 4/30-Thu 5/3 | Implement an O(log n) neighbor-finding algorithm for the sequential implementation - Christian | At the suggestion of Kayvon, implemented an ISPC version of the solver; Began writeup
Fri 5/4-Sun 5/6 | Bugfix all prior implementations; begin writing analysis, describing work - Robbie | Worked on writeup and timings; generated better demonstration starting conditions |
Mon 5/7-Thu 5/10 | Gather timings and finish writing results and analysis - Christian | Fixed all remaining code bugs and refactored to improve performance across all three implementations; Finished writing the paper and analysis
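The schedule above mentions bucket creation and faster neighbor finding. As a rough illustration only, the sketch below shows one common way to bucket particles into a uniform grid and query nearby particles. It is not the Goswami et al. scheme and not the project's code: `Vec3`, `buildBuckets`, `neighborsWithin`, and the cell size `h` are all hypothetical names chosen for this example, and cell coordinates are assumed to fit in 21 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical particle position type for this sketch.
struct Vec3 { float x, y, z; };

using CellKey = std::int64_t;
using Buckets = std::unordered_map<CellKey, std::vector<int>>;

// Map a coordinate to the index of the grid cell (of width h) containing it.
inline int cellCoord(float v, float h) {
    return static_cast<int>(std::floor(v / h));
}

// Pack three cell indices into one key (assumes each lies in [-2^20, 2^20)).
inline CellKey packKey(int cx, int cy, int cz) {
    const std::int64_t bias = std::int64_t{1} << 20;
    return ((cx + bias) << 42) | ((cy + bias) << 21) | (cz + bias);
}

// Place every particle index into the bucket for its grid cell.
Buckets buildBuckets(const std::vector<Vec3>& pos, float h) {
    assert(h > 0.0f);
    Buckets buckets;
    for (int i = 0; i < static_cast<int>(pos.size()); ++i) {
        const Vec3& p = pos[i];
        buckets[packKey(cellCoord(p.x, h), cellCoord(p.y, h), cellCoord(p.z, h))]
            .push_back(i);
    }
    return buckets;
}

// Return indices of particles strictly closer than h to particle i,
// checking only the 27 cells around particle i's own cell.
std::vector<int> neighborsWithin(int i, const std::vector<Vec3>& pos,
                                 const Buckets& buckets, float h) {
    std::vector<int> result;
    const Vec3& p = pos[i];
    const int cx = cellCoord(p.x, h), cy = cellCoord(p.y, h), cz = cellCoord(p.z, h);
    for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                auto it = buckets.find(packKey(cx + dx, cy + dy, cz + dz));
                if (it == buckets.end()) continue;
                for (int j : it->second) {
                    if (j == i) continue;  // skip the particle itself
                    const float ex = pos[j].x - p.x;
                    const float ey = pos[j].y - p.y;
                    const float ez = pos[j].z - p.z;
                    if (ex * ex + ey * ey + ez * ez < h * h) result.push_back(j);
                }
            }
    return result;
}
```

With the cell width equal to the interaction radius, each query touches a fixed number of cells instead of scanning every particle, which is the usual motivation for bucketing.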
Getting a CUDA file to build. For now, it's just the CUDA file from the circle renderer.
At this point, we are still investigating how to dump frames for benchmarking without spawning a new window. We want this capability so that we can measure and compare runtimes accurately and without the overhead of displaying each frame.
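One possible shape for windowless benchmarking is to time a fixed number of simulation steps directly, with no display involved. The following is a minimal sketch under that assumption; `benchmarkFrames`, `BenchmarkResult`, and the `stepFrame` callback are hypothetical names and not part of the existing codebase.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical result record for this sketch.
struct BenchmarkResult {
    int frames;      // number of frames that were run
    double totalMs;  // wall-clock time for all frames, in milliseconds
    double avgMs() const { return totalMs / frames; }
};

// Run stepFrame() the requested number of times and measure wall-clock time.
// No window is created; the callback is expected to advance the simulation
// (and optionally write frame data) on its own.
BenchmarkResult benchmarkFrames(int frames, const std::function<void()>& stepFrame) {
    assert(frames > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        stepFrame();
    }
    const auto end = Clock::now();
    const double ms = std::chrono::duration<double, std::milli>(end - start).count();
    return BenchmarkResult{frames, ms};
}
```

For a CUDA path, kernel launches are asynchronous, so the callback would need to synchronize with the device (for example via `cudaDeviceSynchronize()`) before returning; otherwise the timer would mostly measure launch overhead.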
Most of the work went into re-architecting the base code to allow for multiple renderers and neighbor-finding algorithms. It is now possible to support arbitrary renderers and simulators and to select them at runtime via the command line.
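To illustrate what runtime selection can look like, here is a minimal sketch assuming a common base interface and a small factory. The `Simulator` interface, the two backend classes, the `makeSimulator` factory, and the `--simulator` flag are all hypothetical and are not taken from the actual code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical common interface that every simulator backend implements.
struct Simulator {
    virtual ~Simulator() = default;
    virtual std::string name() const = 0;
    virtual void step(float dt) = 0;
};

struct SequentialSimulator : Simulator {
    std::string name() const override { return "sequential"; }
    void step(float dt) override {
        assert(dt > 0.0f);
        // The CPU update loop would go here.
    }
};

struct CudaSimulator : Simulator {
    std::string name() const override { return "cuda"; }
    void step(float dt) override {
        assert(dt > 0.0f);
        // The host-side kernel launch would go here.
    }
};

// Factory: map a backend name to a concrete simulator instance.
std::unique_ptr<Simulator> makeSimulator(const std::string& kind) {
    if (kind == "sequential") return std::make_unique<SequentialSimulator>();
    if (kind == "cuda") return std::make_unique<CudaSimulator>();
    throw std::invalid_argument("unknown simulator: " + kind);
}

// Scan the arguments for "--simulator <kind>"; default to the sequential backend.
std::string parseSimulatorFlag(int argc, const char* const* argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::string(argv[i]) == "--simulator") return argv[i + 1];
    }
    return "sequential";
}
```

A `main` function would then call `makeSimulator(parseSimulatorFlag(argc, argv))` and drive the returned object through the shared interface; a renderer could be selected the same way with a second flag.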
We made significant progress in understanding the algorithm we will implement. We plan to start implementing the CUDA render path in the next few days.