Researchers have argued for decades that functional programming simplifies parallel programming, in particular by helping programmers avoid difficult concurrency bugs arising from destructive in-place updates. However, parallel functional programs have historically underperformed in comparison to parallel programs written in lower-level languages. The difficulty is that functional languages have high demand for memory, and this demand only grows with parallelism, causing traditional parallel memory management techniques to buckle under the increased pressure.

In this thesis, we identify a memory property called *disentanglement* and develop automatic memory management techniques which exploit disentanglement for improved efficiency and scalability. Disentanglement has broad applicability: (1) it can be guaranteed by construction in functional programs; (2) it is implied by *determinacy-race-freedom*, a classic correctness condition; and (3) it allows for general reads and writes to memory as long as concurrent threads do not acquire references to each other's allocated data. Additionally, entanglement (i.e., violations of disentanglement) can be detected dynamically during program execution with nearly zero overhead in practice.

To exploit disentanglement for improved efficiency, we partition memory into a tree of heaps, mirroring the dynamic nesting of parallel tasks, which allows allocations and garbage collections to proceed independently and in parallel. We develop multiple garbage collection algorithms in this setting.

There are two significant implementation efforts presented in this thesis. The first is the MPL ("maple") compiler for Parallel ML, which extends the Standard ML functional programming language with support for (nested) fork-join parallelism. In MPL, we implement all of our memory management techniques based on disentanglement. The second is the Parallel ML Benchmark Suite, which provides implementations of sophisticated parallel algorithms for a variety of problem domains, ported from state-of-the-art C/C++ benchmark suites. All of these benchmarks are disentangled, which further evidences the wide applicability of our approach.

In multiple empirical evaluations, we show that MPL outperforms modern implementations of both functional and imperative languages. Additionally, we show that MPL is competitive with low-level, memory-unsafe languages such as C++, in terms of both space and time. These results demonstrate that, through disentanglement, parallel functional programming can be efficient and scalable.
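To make the programming model concrete, the following is a minimal sketch of nested fork-join parallelism in Parallel ML, assuming MPL's `ForkJoin.par` primitive of type `(unit -> 'a) * (unit -> 'b) -> 'a * 'b`; the `tabulate` function and its grain size of 1000 are hypothetical and chosen only for illustration. Each branch of `par` allocates its own intermediate lists and never acquires a reference to data allocated concurrently by the other branch, so the program is disentangled by construction.

```sml
(* A hypothetical parallel tabulation: builds [f lo, ..., f (hi-1)].
 * Assumes MPL's ForkJoin.par : (unit -> 'a) * (unit -> 'b) -> 'a * 'b. *)
fun tabulate (f: int -> 'a) (lo: int, hi: int) : 'a list =
  if hi - lo <= 1000 then
    (* Small ranges are computed sequentially. *)
    List.tabulate (hi - lo, fn i => f (lo + i))
  else
    let
      val mid = lo + (hi - lo) div 2
      (* Fork two tasks; each task allocates its own data, and neither
       * task references data allocated concurrently by the other. *)
      val (left, right) =
        ForkJoin.par (fn () => tabulate f (lo, mid),
                      fn () => tabulate f (mid, hi))
    in
      (* Results are combined only after both tasks have joined. *)
      left @ right
    end
```

Under the heap-tree design sketched above, each forked task would allocate into its own heap, so the two branches can allocate and be garbage-collected independently; the cross-task references created by `left @ right` arise only after the join.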