Researchers have argued for decades that functional programming can greatly simplify writing parallel programs, for example by controlling side effects and avoiding race conditions. However, parallel functional programs have historically performed poorly in comparison to their imperative counterparts. The primary reason is that functional programs allocate memory at a high rate, and that rate only grows with parallelism, causing traditional memory management techniques to buckle under the increased demand. In this talk, I identify a memory property called disentanglement and show that it emerges naturally in race-free parallel programs. To exploit disentanglement for improved efficiency, I describe how to integrate memory management with thread scheduling, ultimately enabling each processor to allocate memory and perform garbage collection (GC) independently and in parallel. This approach yields a provably efficient GC policy: for any race-free program whose sequential execution requires R* live memory, execution on P processors with GC requires O(P R*) memory in total and only O(P R*) additional work. We implemented these techniques in the MPL compiler, and our experimental results show good performance and scalability. With respect to a sequential baseline, MPL uses only 1-5x additional memory while obtaining 12-51x speedup on 70 cores. In a small sorting competition, MPL is 50% faster than Go and nearly 2x faster than Java while using comparable space.