Fractals for Spatial and Temporal Databases
Keywords
Fractals, data mining, power laws, temporal and spatial data
Project Award Information
-
Award Number: IIS-9910606
-
Duration: 18 months, starting 9/15/99.
-
Title: Fractals for Spatial and Temporal Databases
Project Summary
The goal of this project is to use concepts from fractals to solve long-standing
database problems in query optimization, spatial databases, and temporal
databases. Moreover, it introduces the powerful tools of fractals, chaos
and non-linear systems for additional applications, like forecasting, time-sequence
data mining, buffering and prefetching, for large database applications.
Goals, Objectives, and Targeted Activities
We are focusing on two settings: (a) spatial data, where the goal
is to find power laws that describe real datasets
(b) temporal data, where we try to model bursty time sequences, that
are not modeled well by the traditional Poisson arrivals.
Indication of Success
We have already discovered power laws in multiple settings:
-
in spatial data, the spatial join selectivity across two datasets seems
to follow a power law [Faloutsos00]
-
in temporal data, unpublished work shows that bursty time sequences can
be modeled well using the so-called 'binary multifractals', or '80-20'
distributions.
-
in general, the fractal dimension was shown to determine the performance
of nearest-neighbor searches, illustrating that the so-called 'dimensionality
curse' is over-pessimistic [Korn01]
-
in selectivity estimations for real spatial datasets with areas (eg.,
a collection of lakes, islands etc) [Proietti00]
-
in the analysis of space-filling curves, like the Hilbert curve [Moon01]
Moreover, we have used fractals and power laws to develop a novel dimensionality
reduction method, which is fast, scalable and able to detect non-linear
correlations. [Traina00]
Long range results: This is an exploratory project, whose aim
is to show that power laws and fractals are the correct tools to use for
data mining and pattern discovery in large spatial and temporal datasets.
This is in contrast to the textbook approaches, which use the uniformity
and independence assumptions, and the Gaussian and Poisson distributions;
although easy to study, these assumptions are clearly unrealistic for an
overwhelming majority of real datasets.
Project Impact
-
Human Resources: A Ph.D. candidate, Mr. Spiros Papadimitriou, is
working on the project
-
Education and curriculum development: Several lectures on fractals
and power laws are incorporated in a new course, Multimedia
databases and data mining (15-826/10-603) which is a required
course in the CALD Masters program at CMU (CALD = Center for Automated
Learning and Discovery)
-
Industry -- collaborations, transfer of technology, patents.
-
collaborations with Dr. Flip Korn (AT&T Research) and Dr. Bernd-Uwe
Pagel (SAP) on dimensionality reduction.
-
Released software: FracDim http://www.cs.cmu.edu/~christos/SRC/FracDim-20001026.tar.gz
: A package in Perl, that computes fractal dimensions efficiently.
Project References
The following refereed publications mention the NSF support, since
March 2000:
-
[Faloutsos00] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and
Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD
2000, May 2000
-
[Korn01] Flip Korn, Bernd-Uwe Pagel and Christos Faloutsos, On the `Dimensionality
Curse' and the `Self-Similarity Blessing' IEEE-TKDE, vol. 13,
no. 1, pp. 96-111, Jan./Feb. 2001.
-
[Moon01] Bongki Moon, H.V. Jagadish, Christos Faloutsos and Joel
Salz Analysis of the Clustering Properties of the Hilbert Space-Filling
Curve IEEE Trans. on Knowledge and Data Engineering (IEEE-TKDE), vol. 13,
no. 1, pp. 124-141, Jan./Feb. 2001.
-
[Proietti00] Guido Proietti and Christos Faloutsos Analysis of
Range Queries and Self-Spatial Join Queries on Real Region
Datasets Stored Using and R-Tree IEEE TKDE, 12, 5, Sept./Oct. 2000,
pp 751-762.
-
[Traina00] Caetano Traina Jr., Agma Traina, Leejay
Wu and Christos Faloutsos Fast feature selection using fractal
dimension XV Brazilian Symposium on Databases
(SBBD), Paraiba, Brazil, October 2000.
Area Background
The project requires familiarity with spatial, temporal and multimedia
databases, as well as with fractals.
Area References:
-
Manfred Schroeder, Fractals, Chaos, Power Laws: Minutes from an Infinite
Paradise, W.H. Freeman and Company, New York, 1991.
-
Christos Faloutsos, Searching multimedia databases by content, Kluwer
Academic Publishers, Norwell, MA, 1996.
GPRA performance criteria
Discoveries at and across the frontiers of science and engineering:
Fractals span multiple disciplines, like storage systems design, performance
evaluation, spatial/geographical databases, financial time sequences, and
many more.
Connections between discoveries and their use in the service of society:
Power laws can detect patterns among spatial distributions, like hospitals,
residences and disease outbreaks, to help with urban planning - and this
is just one of the myriad applications.
A diverse globally-oriented workforce: Our efforts are inherently
cross-disciplinary, straddling databases, statistics, machine learning,
storage systems, networks and statistics.