Processing Complex Queries in a Distributed Database
Everyone knows distributed systems are hard. At MongoDB we want to make it easy to express complex queries and extract insights from your data, but we also need to be able to scale to enormous data sets. To help you scale, we support a deployment which partitions the data amongst multiple machines, but a distributed system complicates even simple queries. Efficiently grouping together results can involve non-trivial communication amongst several machines. For example, the distributed query implementation must coordinate with the auto-balancer to ensure that it sees a consistent view of the data across multiple nodes. In this talk, we'll explore the strategies MongoDB uses to find the matching results and coordinate any requested transformations.