Staged Database Systems

Goals

As database servers handle more and more requests on ever-larger databases, optimizing resource usage (both hardware and software) becomes more difficult. Constantly evolving state-of-the-art hardware and the need for increased functionality present a new set of bottlenecks to overcome.

The main philosophy of the Staged DB design is to group concurrent tasks around each software and hardware resource in the system. We do not propose a major overhaul of existing DBMS software. Rather, a small number of targeted changes can encapsulate existing functions into micro-servers (stages), each ready to process a group of tasks.
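
To make the idea of a stage concrete, here is a minimal sketch, assuming a stage is simply a queue in front of an existing, unchanged function that drains its queued tasks back to back as one group. The class name Stage, its methods, the max_batch parameter, and the toy "parse" and "execute" handlers are all hypothetical illustrations, not the Staged DB implementation.

```python
from collections import deque
from typing import Any, Callable, Optional


class Stage:
    """Hypothetical micro-server wrapping one existing DBMS function.

    Tasks that need this function wait in the stage's queue and are then
    executed consecutively as a group, so the function's code and data
    structures are reused across the group instead of being reloaded
    separately for each query.
    """

    def __init__(self, name: str, handler: Callable[[Any], Any],
                 next_stage: Optional["Stage"] = None) -> None:
        self.name = name
        self.handler = handler        # the existing, unmodified function
        self.next_stage = next_stage  # where results are forwarded, if anywhere
        self.queue = deque()          # tasks waiting for this stage

    def enqueue(self, task: Any) -> None:
        self.queue.append(task)

    def run_batch(self, max_batch: int) -> list:
        """Run up to max_batch queued tasks back to back, forward each
        result to the next stage (if any), and return the results."""
        results = []
        while self.queue and len(results) < max_batch:
            result = self.handler(self.queue.popleft())
            if self.next_stage is not None:
                self.next_stage.enqueue(result)
            results.append(result)
        return results


# Two toy stages: "parse" normalizes query text, "execute" pretends to run it.
execute = Stage("execute", lambda q: f"ran: {q}")
parse = Stage("parse", lambda sql: sql.strip().lower(), next_stage=execute)

for sql in ["SELECT a FROM t ", " SELECT b FROM t"]:
    parse.enqueue(sql)

parse.run_batch(max_batch=8)           # both queries parsed as one group
print(execute.run_batch(max_batch=8))  # ['ran: select a from t', 'ran: select b from t']
```

The wrapped function itself is untouched; only the scheduling around it changes, which mirrors the claim that a few targeted changes suffice. How a real system decides batch sizes or when to switch between stages is not covered by this sketch.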

By better matching tasks with resources, we can attack bottlenecks and optimize resource usage in ways that are not feasible with traditional DBMS designs. Our goals are:

Improve instruction and data cache performance in OLTP

When running OLTP workloads, instruction-related delays in the memory subsystem account for 25 to 40% of total execution time. Unlike data cache misses, instruction cache misses cannot be overlapped with out-of-order execution, and instruction caches cannot simply be made larger, because the slower access time of a larger cache directly affects processor speed. The challenge is to alleviate instruction-related delays without increasing the cache size.
We propose Steps, a technique that minimizes instruction cache misses in OLTP workloads by multiplexing concurrent transactions and exploiting their common code paths. One transaction paves the cache with instructions, while close followers enjoy nearly miss-free execution. Steps reduces instruction cache misses by up to 96.7% for each additional concurrent transaction and, at the same time, eliminates up to 64% of mispredicted branches by feeding the CPU a repeating execution pattern.
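
The effect of "one transaction paves the cache, followers hit" can be illustrated with a toy simulation. This is a minimal sketch under loudly simplifying assumptions, not the Steps implementation: the instruction cache is modeled as fully associative with LRU replacement, every transaction follows the identical code path, and the scheduler switches to the next transaction after each cache-sized segment of code. The names LRUCache, run_sequential, and run_steps_style are hypothetical.

```python
from collections import OrderedDict
from typing import List


class LRUCache:
    """Toy fully associative instruction cache with LRU replacement.

    Capacity is counted in cache blocks; this is a simplification, not a
    model of any real processor's L1 instruction cache.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks = OrderedDict()  # block id -> None, least recently used first
        self.misses = 0

    def access(self, block: int) -> None:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # hit: mark as most recently used
            return
        self.misses += 1
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


def run_sequential(num_txns: int, code_path: List[int], capacity: int) -> int:
    """Baseline: each transaction runs its whole code path, then the next starts."""
    cache = LRUCache(capacity)
    for _ in range(num_txns):
        for block in code_path:
            cache.access(block)
    return cache.misses


def run_steps_style(num_txns: int, code_path: List[int], capacity: int) -> int:
    """Multiplexed: every transaction executes one cache-sized segment of the
    shared code path before the group moves on to the next segment."""
    cache = LRUCache(capacity)
    for start in range(0, len(code_path), capacity):
        segment = code_path[start:start + capacity]
        for _ in range(num_txns):  # switch to the next transaction in the group
            for block in segment:
                cache.access(block)
    return cache.misses


path = list(range(64))  # 64 instruction blocks, 4x the cache size
print(run_sequential(8, path, capacity=16))   # 512: every access misses
print(run_steps_style(8, path, capacity=16))  # 64: only the leading transaction misses
```

In the baseline, a code path larger than the cache evicts itself on every pass, so misses grow linearly with the number of transactions; in the multiplexed schedule only the first transaction of each segment misses. Real transactions share only part of their code paths and run on set-associative caches, so this idealized model should not be read as a prediction of actual miss reductions.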

The next goal is to provide the tools and methodology to automate the application of Steps for improving instruction cache performance in commercial DBMSs.

In multi-processor systems running OLTP workloads, the single most important bottleneck is data cache coherence traffic. We are currently exploring ways to apply Steps to minimize data cache coherence misses.

Re-engineer relational engines to optimize resource usage in large-scale installations

Relational execution engines typically treat concurrent queries as independent tasks, evaluating every plan in isolation. Reusing in-memory data pages across queries is left to the buffer pool manager, which can only set policy and cannot actively participate in query evaluation. Reusing common computations across concurrent queries requires materializing views and assumes prior knowledge of the workload. The challenge is to exploit every opportunity to reuse both data and computation across concurrent queries transparently, without introducing additional costs or requiring prior knowledge.
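
One simple form of the data reuse described above is a shared scan, in which a single pass over a table serves several concurrent queries. The sketch below is a hypothetical illustration, not the mechanism this project proposes: it assumes the queries are all simple filters over the same table and arrive together, and it counts page reads as a stand-in for buffer pool and I/O traffic. The functions scan_isolated and scan_shared and the Row and Predicate aliases are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def scan_isolated(pages: List[List[Row]],
                  queries: Dict[str, Predicate]) -> Tuple[Dict[str, List[Row]], int]:
    """Conventional evaluation: every query scans the whole table on its own."""
    results: Dict[str, List[Row]] = {name: [] for name in queries}
    page_reads = 0
    for name, pred in queries.items():
        for page in pages:
            page_reads += 1
            results[name].extend(row for row in page if pred(row))
    return results, page_reads


def scan_shared(pages: List[List[Row]],
                queries: Dict[str, Predicate]) -> Tuple[Dict[str, List[Row]], int]:
    """Shared scan: each page is read once and offered to every query."""
    results: Dict[str, List[Row]] = {name: [] for name in queries}
    page_reads = 0
    for page in pages:
        page_reads += 1
        for name, pred in queries.items():
            results[name].extend(row for row in page if pred(row))
    return results, page_reads


# A 100-row toy table stored as 10 pages of 10 rows each.
pages = [[{"id": i, "price": i % 7} for i in range(p * 10, p * 10 + 10)]
         for p in range(10)]
queries = {
    "cheap": lambda r: r["price"] < 2,
    "even": lambda r: r["id"] % 2 == 0,
    "all": lambda r: True,
}

iso_results, iso_reads = scan_isolated(pages, queries)
shared_results, shared_reads = scan_shared(pages, queries)
print(iso_reads, shared_reads)          # 30 10
print(iso_results == shared_results)    # True
```

Both strategies return identical answers, but the shared scan reads each page once instead of once per query. This sketch covers only data reuse among queries that start together; reusing intermediate computation, or attaching a query that arrives while a scan is already in progress, would need additional machinery not shown here.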

The next goal is to deploy the staged system in multi-processor environments and develop query scheduling algorithms to minimize response time while maintaining high resource utilization.

Future Plans

Apply the Staged DB design in distributed DBMS environments.
Develop auto-tuning tools for Staged DBMS.
