Date: Mon, 04 Nov 1996 23:50:00 GMT
Server: NCSA/1.5
Content-type: text/html
Last-modified: Wed, 07 Feb 1996 19:01:30 GMT
Content-length: 3428
Research
Brief description of work
Words are flowing out like endless rain into a paper cup,
They slither while they pass, they slip away across the universe
Query Result Size Estimation
Query optimizers make certain assumptions about the distribution of data in
the database in order to estimate the selectivities of various query operators.
For complex (multi-join) queries, errors in these estimates tend to propagate
exponentially, possibly resulting in suboptimal plans being chosen. This problem
has become serious even in well-established commerical optimizers. Some form of
histograms that approximate data distributions has thus become necessary. We
have identified the optimal histograms for estimating result sizes of queries
with multiple joins and arbitrary selections. These histograms are formed based
on statistics collected from the database, and the resources to do this, mainly
catalog space and collection time, are very limited in a commerical environment.
To address this, we have also identified a version of the above histograms that
are near optimal and can be constructed very efficiently.
Applications
Query Optimization, Query Design (User Interfaces).
Query Cost Estimation
An equally critical issue is the estimation of query costs. We are currently
designing a unified data structure, which would capture information about
physical clustering of data on disk pages in addition to data distributions.
The structure is as compact as a histogram and will estimate both sizes and
costs of widely varied operators (e.g., unclustered index scans, or intermediate
joins on RIDs). We are studying the optimality and practicality of this
structure, hoping that the results will make a variety of estimation algorithms
unnecessary, while providing better estimates of both result sizes and query
costs.
Applications
Query Optimization, Query Design (User Interfaces), Physical Database Design, Load Balancing.
Other Problems
Other problems that we investigated include estimation for parallel join load
balancing, result sizes of additional operators, parallel query optimization.
Implementations
- Designed and implemented efficient algorithms for computing and using histograms
for size estimation in the DB2-6000 optimizer.
- Designed and implemented the size estimation module for the Precis Query Profiler,
used by the Lockheed Martin Missiles and Space Company.
If you would like to get a copy of the postscript of any of the publications e-mailed to you, please fill in the form below and click on the button.
Please, include your e-mail address and the document name in the mail. Thanks.