 
  Principles of Programming Group
Carnegie Mellon University, Computer Science Department
PoP Seminar Talk
The Gibbon Compiler: Efficient data representation and program correctness, pick two
    Chaitanya Koparkar,
    PhD Student at Indiana University
    Monday, 30 January, 2023; 3:30pm
    Zoom
    Host: Bob Harper
    
Abstract
Gibbon is a compiler for a small (strict) subset of Haskell. It overturns a basic assumption in programming language implementation—the assumption that every program value of unknown size must be “boxed” into a pointer. Gibbon explores an alternate approach where even irregular data takes a dense, mostly pointer-free form in memory. It focuses on recursive sum types and transforms functions on them to operate on dense representations. This fits with how modern CPU architectures work, resulting in significant performance improvements. Tree traversals in Gibbon are usually an order of magnitude faster than GHC. Gibbon values can also be processed directly from disk, much like GHC’s Compact Normal Form or Cap ‘n Proto in C++, but are more compact and efficient to use. Finally, Gibbon also allows reading and writing mostly pointer-free values in parallel, providing the benefits of dense data formats and parallelism.
Gibbon’s compilation strategy is based on a type-safe location calculus that formalizes this notion of programming with dense pointer-free data. Gibbon features location inference that automatically transforms programs written in a high-level language to the location calculus for compilation. Thus, users can get the performance benefits of efficient data representations without having to write error-prone low-level code.
In this talk, we’ll summarize the work on Gibbon thus far, and ongoing development.
Bio
Chaitanya Koparkar is a Ph.D. candidate at Indiana University where he studies programming languages, compilers, and parallel programming. His research focuses on developing programming language theory and implementation strategies that enable a compiler to improve the run time performance of a program in an automatic and safe manner.