Recently, the need to integrate specialized data management systems
such as geographic information systems (GIS) and multimedia
systems has gained importance. A wide variety of
data sets is available in specialized repositories, and
users would like to access and manipulate these data sets in
a uniform way. Additionally, it is desirable to make the
specialized functionality provided by the individual
repositories available to the user application through a
homogeneous interface. At the UCLA Data Mining Laboratory, we are
developing geoPOM (geoscientific Persistent Object Manager), a
heterogeneous geoscientific object system which provides a
homogeneous interface to heterogeneous spatial data repositories.
geoPOM provides an object-oriented spatial data model for
the definition of user-defined spatial object types.
Internally, geoPOM maps user-defined spatial object types
to different specialized spatial data repositories, and
employs their storage, search and spatial query capabilities.
In this paper, we focus on the goals, problems and approach
taken in geoPOM towards defining the spatial functionality
of the heterogeneous geoscientific object system available
to the user, as well as the mapping of the spatial object
model to the diverse semantic and functional characteristics
of the heterogeneous spatial data repositories.
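To make the mapping idea concrete, the following Python fragment is a minimal, purely illustrative sketch; the class and method names (SpatialRepository, ObjectManager, region_query) are our own, not geoPOM's actual interfaces. It shows one way a uniform spatial interface can route user-defined object types to whichever specialized repository has been registered for them.

```python
# Hypothetical sketch (not geoPOM's actual API): a uniform spatial-repository
# interface and a simple object manager that maps user-defined spatial object
# types to whichever specialized repository has been registered for them.

from abc import ABC, abstractmethod


class SpatialRepository(ABC):
    """Uniform facade over one specialized spatial data repository."""

    @abstractmethod
    def store(self, obj_type, obj): ...

    @abstractmethod
    def region_query(self, obj_type, bbox): ...


class GISRepository(SpatialRepository):
    """Toy in-memory stand-in for a GIS back end."""

    def __init__(self):
        self._objects = {}          # obj_type -> list of objects

    def store(self, obj_type, obj):
        self._objects.setdefault(obj_type, []).append(obj)

    def region_query(self, obj_type, bbox):
        xmin, ymin, xmax, ymax = bbox
        return [o for o in self._objects.get(obj_type, [])
                if xmin <= o["x"] <= xmax and ymin <= o["y"] <= ymax]


class ObjectManager:
    """Routes each user-defined spatial object type to its repository."""

    def __init__(self):
        self._mapping = {}          # obj_type -> SpatialRepository

    def register(self, obj_type, repo):
        self._mapping[obj_type] = repo

    def store(self, obj_type, obj):
        self._mapping[obj_type].store(obj_type, obj)

    def region_query(self, obj_type, bbox):
        # The caller sees one interface; the mapping decides which
        # repository's search capability actually answers the query.
        return self._mapping[obj_type].region_query(obj_type, bbox)


if __name__ == "__main__":
    mgr = ObjectManager()
    mgr.register("well_site", GISRepository())
    mgr.store("well_site", {"x": 3.0, "y": 4.0, "name": "W-1"})
    print(mgr.region_query("well_site", (0, 0, 10, 10)))
```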
A wide variety of communication-intensive applications
run on networks and platforms of greatly varying characteristics.
This implies the need for application level communication
optimization, which is the optimization of network communication
by exploiting application semantics as well as network and compute
node characteristics.
In this paper, we propose FALCON, a flexible object-oriented framework
for application level communication optimization that allows
complementary network communication optimization techniques to be
combined in the form of matching stack layers at the endpoints of a
communication channel.
Each stack layer is composed of a pair of matching modules, executed
at the sender and receiver endpoints, respectively.
To exploit application knowledge that is only available to one of the
communicating peers, the framework allows an executable stack layer
module to be supplied by either of the communication peers and provides
for safe transport to and execution at the other end of the channel.
Automatic rule-based optimization techniques similar to those used
for extensible database query optimization are developed to
optimize the communication channel stacks based on the characteristics
of available stack layers, required properties of the channel, and an
application-dependent cost model.
In addition to providing architectural support for optimizing network
communication, FALCON can also be used to introduce support for
new communication and computing paradigms in high-level distributed
computing environments. For example, while OMG's CORBA distributed
object management architecture adopts remote method invocation as its
primary communication mechanism, data streaming and service migration
can be easily accommodated within the FALCON framework.
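As a rough illustration of the matching-layer idea, the sketch below composes a communication channel out of sender/receiver module pairs; all names (Layer, Channel, CompressionLayer, MarshalingLayer) are hypothetical rather than FALCON's API, and standard-library compression stands in for a real optimization technique.

```python
# Illustrative sketch only (hypothetical names, not FALCON's API):
# a channel is a stack of layers; each layer is a matching pair of
# sender/receiver modules, applied in reverse order on the receiving side.

import json
import zlib


class Layer:
    """One stack layer: a matching (send, receive) module pair."""
    def send(self, data):
        return data
    def receive(self, data):
        return data


class MarshalingLayer(Layer):
    """Application-level marshaling of structured data into bytes."""
    def send(self, obj):
        return json.dumps(obj).encode("utf-8")
    def receive(self, data):
        return json.loads(data.decode("utf-8"))


class CompressionLayer(Layer):
    """Sender module compresses; the matching receiver module decompresses."""
    def send(self, data):
        return zlib.compress(data)
    def receive(self, data):
        return zlib.decompress(data)


class Channel:
    """Applies the layer stack top-down on send, bottom-up on receive."""
    def __init__(self, layers):
        self.layers = layers

    def send(self, payload):
        for layer in self.layers:
            payload = layer.send(payload)
        return payload                      # bytes that go on the wire

    def receive(self, wire_bytes):
        for layer in reversed(self.layers):
            wire_bytes = layer.receive(wire_bytes)
        return wire_bytes


if __name__ == "__main__":
    channel = Channel([MarshalingLayer(), CompressionLayer()])
    wire = channel.send({"field": "temperature", "values": [1, 2, 3]})
    print(channel.receive(wire))            # round-trips the original object
```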
Geoscience studies produce data from various observations,
experiments, and simulations at an enormous rate.
Exploratory data mining extracts "content information" from
massive geoscientific datasets to derive knowledge and provide
a compact summary of the data.
In this paper, we discuss how database query processing and
distributed object management techniques can be used to facilitate
geoscientific data mining and analysis.
Some special requirements of large scale geoscientific data mining
that are addressed include geoscientific data modeling, parallel query
processing, and heterogeneous distributed data access.
Geoscience studies produce data from various observations,
experiments, and simulations at an enormous rate. In this paper, we
present an overview of the Conquest parallel scientific query
processing system that we are developing at UCLA to tackle some of the
scientific data management problems presented by the proliferation of
geographic applications, data formats, and storage systems, as well as
the complexity of data and the computational requirements of
scientific queries. In particular, our goal is to provide the
necessary combination of expressiveness, ease of use, flexibility, and
efficiency to effectively support the analysis of complex
spatio-temporal geoscientific datasets maintained by heterogeneous
data sources in many different formats. Conquest's data model is rich
yet conceptually simple; it captures some important structural and
semantic properties common to geoscientific data which influence the
choice of query processing strategies, and is flexible enough to serve
as the canonical model for a wide variety of scientific and
non-scientific data. Conquest supports a graphical dataflow
programming environment in which scientists can interactively
manipulate and visualize scientific data. The extensible parallel
query execution server supports several forms of inter- and intra-operator
parallelism through the use of various support operators. To provide
a convenient heterogeneous distributed scientific data processing
environment to scientists, the system supports a set of interfaces to
a variety of scientific data sources including several external data
formats and database servers. We also report some early experiences
with benchmarking the performance of the system.
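A minimal sketch of the dataflow style of query composition appears below; the operator names (scan, select, apply) are illustrative only and do not reflect Conquest's actual operator set or its parallel execution machinery.

```python
# Minimal dataflow sketch (hypothetical operator names, not Conquest's):
# operators are composable stages that stream records from a source through
# transformations, the style of query a graphical dataflow editor would build.

def scan(records):
    """Source operator: streams records from an in-memory 'dataset'."""
    for rec in records:
        yield rec

def select(stream, predicate):
    """Filter operator: keeps records satisfying the predicate."""
    for rec in stream:
        if predicate(rec):
            yield rec

def apply(stream, fn):
    """Transform operator: applies a user-defined function to each record."""
    for rec in stream:
        yield fn(rec)

if __name__ == "__main__":
    dataset = [{"lat": 34.0, "slp": 1008.0}, {"lat": -60.0, "slp": 972.5}]
    # Pipeline: scan -> select low-pressure records -> convert hPa to Pa.
    pipeline = apply(select(scan(dataset), lambda r: r["slp"] < 1000.0),
                     lambda r: {**r, "slp_pa": r["slp"] * 100.0})
    for rec in pipeline:
        print(rec)
```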
Motivated by the premise that heterogeneity of software applications and
hardware systems is here to stay, we are developing
OASIS, a flexible, extensible, and seamless environment for scientific data
analysis, knowledge discovery, visualization, and collaboration. In this
paper we discuss our OASIS design goals and present the system architecture
and major components of our prototype environment.
The important scientific challenge of understanding global climate
change is one that clearly requires the application of knowledge
discovery and data mining techniques on a massive scale. Advances in
parallel supercomputing technology, enabling high-resolution modeling,
as well as in sensor technology, allowing data capture on an
unprecedented scale, conspire to overwhelm present-day analysis
approaches. We present here early experiences with a prototype
exploratory data analysis environment, CONQUEST, designed to provide
content-based access to such massive scientific datasets. CONQUEST
(CONtent-based Querying in Space and Time) employs a combination of
workstations and massively parallel processors (MPPs) to mine
geophysical datasets possessing a prominent temporal component. It is
designed to enable complex multi-modal interactive querying and
knowledge discovery, while simultaneously coping with the
extraordinary computational demands posed by the scope of the datasets
involved. After outlining a working prototype, we concentrate here on
the description of several associated feature extraction algorithms
implemented on MPP platforms, together with some typical results.
Exploratory data mining and analysis requires an extensible environment
which provides facilities for the user-friendly expression and rapid
execution of "scientific queries". In this paper we present the
Conquest environment and illustrate its use for exploratory data analysis
and data mining of spatio-temporal phenomena from geophysical datasets.
The output of simulations (e.g., global circulation models)
can run into terabytes.
The computational cost as well as the cost of storing and retrieving
model data can be quite high. Recently there have been some efforts
to develop on-line visualization capabilities that can be used, for
example, to monitor whether the model is behaving properly.
There are, however, many other uses for on-line data analysis, including
feature extraction, computational steering of the model, and
controlled saving of model output (e.g., more frequent samples of
state information under certain conditions).
Each of these applications is a potential client of model output data.
We present a software architecture which stresses modularity and
flexibility and supports a variety of clients. Some preliminary
performance numbers are given from a prototype implementation.
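The client-oriented idea can be sketched as a simple fan-out of model output to registered analysis callbacks. The sketch below is an illustrative toy, not the architecture's actual interfaces, and the pressure threshold used for "controlled saving" is an invented example.

```python
# Rough sketch of the idea (hypothetical names): model output is pushed
# through a dispatcher, and each analysis client registers a callback, so
# monitoring, feature extraction, or selective saving can run on-line.

class OutputDispatcher:
    """Fans each model time step out to every registered client."""
    def __init__(self):
        self._clients = []

    def register(self, client):
        self._clients.append(client)

    def publish(self, step, fields):
        for client in self._clients:
            client(step, fields)


def monitor(step, fields):
    # On-line monitoring: a cheap sanity check that the model is behaving.
    print(f"step {step}: slp range {min(fields['slp'])}..{max(fields['slp'])}")


def save_if_interesting(step, fields):
    # Controlled saving: keep output only when a condition holds,
    # e.g. an unusually deep low anywhere in the sea-level-pressure field.
    if min(fields["slp"]) < 980.0:
        print(f"step {step}: saving snapshot (deep low detected)")


if __name__ == "__main__":
    dispatcher = OutputDispatcher()
    dispatcher.register(monitor)
    dispatcher.register(save_if_interesting)
    dispatcher.publish(0, {"slp": [1012.0, 1005.5, 998.2]})
    dispatcher.publish(1, {"slp": [1010.1, 975.3, 1001.0]})
```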
Geoscience studies produce data from various observations,
experiments, and simulations at an enormous rate. At UCLA, we are
developing the Conquest parallel scientific query processing system
to effectively support the analysis of complex spatio-temporal
geoscientific datasets maintained by heterogeneous data sources in
many different formats.
In this paper, we describe the Conquest data modeling framework which
captures some of the important semantics of geoscientific data and
common processing paradigms. The central concept behind the
Conquest data model is that of the field, which is the association of
geometric cells in a coordinate space with dependent variable values.
Cells with different semantics and structures can model a large
variety of traditional and scientific datasets. The properties of
cells also influence the choice of data storage, indexing, and query
optimization strategies. In addition to providing a flexible data
model to serve as the canonical model for a wide variety of scientific
and non-scientific data, it is important for a system to
be able to efficiently retrieve and operate on scientific data.
As a result, we define an algebra for the Conquest data
model in which queries and operations against scientific data can be
conveniently expressed. In addition, we present an overview of the
Conquest architecture, and some interesting query processing issues
related to parallelism, extensibility and heterogeneity.
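The field concept can be illustrated with a small sketch; the names and operators below (Cell, Field, restrict, map) are our own shorthand rather than the Conquest algebra itself, and point cells on a latitude/longitude grid stand in for the more general cell structures the model supports.

```python
# Illustrative-only sketch of the field idea (names are ours, not Conquest's):
# a field associates geometric cells in a coordinate space with dependent
# variable values, and algebra-style operators map fields to fields.

from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Cell:
    """A point cell on a lat/lon grid; other cell shapes would carry geometry."""
    lat: float
    lon: float


class Field:
    """Association of cells with dependent variable values."""
    def __init__(self, values: Dict[Cell, float]):
        self.values = values

    def restrict(self, predicate):
        """Operator: sub-field over the cells satisfying a spatial predicate."""
        return Field({c: v for c, v in self.values.items() if predicate(c)})

    def map(self, fn):
        """Operator: apply fn to every dependent value, keeping the cells."""
        return Field({c: fn(v) for c, v in self.values.items()})


if __name__ == "__main__":
    slp = Field({Cell(34.0, -118.0): 1012.3, Cell(-60.0, 30.0): 968.7})
    southern = slp.restrict(lambda c: c.lat < 0.0)       # spatial selection
    in_pascals = southern.map(lambda hpa: hpa * 100.0)   # unit conversion
    print(in_pascals.values)
```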
Recent advances in fine- and coarse-grained supercomputers have enabled
scientists to create models which, in the past, were computationally
intractable. These advances have also provided scientists with the
opportunity to greatly improve the success of their simulation results
by allowing for finer model resolutions. Similarly, advances in sensor technology have led to instrument suites capable of
capturing high spatial resolution, multi-spectral data at very high
rates (e.g., Earth Observer Satellites (EOS) are expected to generate
a terabyte of data per day). Unfortunately, software environments for
storage, retrieval, analysis, interpretation and visualization of
scientific information have not kept pace with their hardware counterparts.
We have developed a prototype system called QUEST to provide
content-based query access to massive datasets. QUEST employs
workstations as well as teraFLOP computers to analyze geoscience data
in order to produce spatial-temporal features that are used as
high-level indexes into terabyte datasets. Examples are presented of
the use of QUEST for the content-based access of Global Circulation
Model (GCM) datasets.
A major challenge facing geophysical science today is the unavailability
of high-level analysis tools with which to study the massive amount of
data produced by sensors or long simulations of climate models.
As part of a NASA HPCC Grand Challenge effort [Mun92], we have developed
a prototype environment called QUEST to provide content-based query
access to massive datasets used in geophysical applications. QUEST
employs workstations as well as massively parallel processors to produce
spatio-temporal features that are used as high-level indexes into
terabyte datasets. This paper discusses our continued development of
the QUEST environment.
A major challenge facing geophysical science today is the unavailability
of high-level analysis tools with which to study the massive amount of
data produced by sensors or long
simulations of climate models.
We have developed a prototype system called QUEST to provide content-based
access to massive datasets. QUEST employs workstations as well as teraFLOP
computers to analyze geoscience data to produce spatial-temporal
features that can be used as high-level indexes.
Our first application area is global change climate modeling.
In the initial prototype, the first features extracted are cyclone
trajectories from the output of multi-year climate simulations produced
by a General Circulation Model.
We present an algorithm for cyclone extraction and
illustrate the use of cyclone indexes to access subsets
of GCM data for further analysis and visualization.
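The flavor of such feature extraction can be conveyed with a heavily simplified sketch: detect local sea-level-pressure minima in each time slice, then link nearby minima across consecutive slices into candidate trajectories. The thresholds and neighbor tests below are illustrative assumptions, not the algorithm as published, which also runs in parallel on MPPs.

```python
# Heavily simplified sketch of cyclone-trajectory extraction (illustrative
# thresholds, not the QUEST algorithm as published): step 1 finds local
# sea-level-pressure minima in each time slice, step 2 links minima in
# consecutive slices into candidate trajectories.

def local_minima(grid, threshold=1000.0):
    """Return (i, j) cells below threshold and below all four neighbors."""
    rows, cols = len(grid), len(grid[0])
    minima = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            p = grid[i][j]
            if p < threshold and all(
                p < grid[i + di][j + dj]
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
            ):
                minima.append((i, j))
    return minima


def link(tracks, minima, max_dist=2):
    """Extend each track with the nearest new minimum, or start a new track."""
    unused = set(minima)
    for track in tracks:
        i0, j0 = track[-1]
        candidates = [m for m in unused
                      if abs(m[0] - i0) + abs(m[1] - j0) <= max_dist]
        if candidates:
            best = min(candidates, key=lambda m: abs(m[0] - i0) + abs(m[1] - j0))
            track.append(best)
            unused.discard(best)
    tracks.extend([[m] for m in unused])
    return tracks


if __name__ == "__main__":
    # Two tiny 4x4 pressure slices with one low moving east by one grid cell.
    t0 = [[1012] * 4, [1012, 990, 1012, 1012], [1012] * 4, [1012] * 4]
    t1 = [[1012] * 4, [1012, 1012, 992, 1012], [1012] * 4, [1012] * 4]
    tracks = []
    for grid in (t0, t1):
        tracks = link(tracks, local_minima(grid))
    print(tracks)   # [[(1, 1), (1, 2)]] -> one cyclone-like trajectory
```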
S. Nittel, J. Yang, and R.R. Muntz,
"Mapping a Common Geoscientific Object Model to Heterogeneous Spatial
Data Repositories",
The 4th ACM International Workshop on Advances in Geographic
Information Systems,
Rockville, Maryland, Nov. 1996.
E.C. Shek, R.R. Muntz, and L. Fillion,
"The Design of the FALCON Framework for Application Level Communication
Optimization",
UCLA CSD Technical Report #960039, Nov. 1996.
E. Mesrobian, R.R. Muntz, E.C. Shek, S. Nittel, M. LaRouche,
M. Kriguer, C.R. Mechoso, J.D. Farrara, P. Stolorz, and H. Nakamura,
"Mining Geophysical Data for Knowledge",
IEEE Expert, 11(5):34-44,
Oct. 1996.
Exploratory data mining and analysis requires a computing environment
which provides facilities for the user-friendly expression and rapid
execution of "scientific queries".
OASIS exploits emerging distributed object management technologies
to present a flexible, extensible, and seamless
environment for scientific data analysis,
knowledge discovery, visualization, and collaboration.
In this article we illustrate the use of OASIS
for exploratory data analysis and data mining
of spatio-temporal phenomena from large geophysical datasets.
E. Mesrobian, R.R. Muntz, E.C. Shek, S. Nittel, M. Kriguer,
and M. LaRouche,
"OASIS: An EOSDIS Science Computing Facility",
International Symposium on Optical Science, Engineering, and
Instrumentation, Conference on Earth Observing System,
Denver, Colorado, Aug. 1996.
In the course of global change studies, a scientist would often like
to efficiently store, retrieve, analyze and interpret selected data
sets from a large collection of scientific information scattered
across heterogeneous computational environments and Earth Observing
System data repositories, and to share the gleaned information with
other scientific communities. To facilitate these activities, we
have developed OASIS, a flexible, extensible, and seamless environment
for scientific data analysis, knowledge discovery, visualization, and
collaboration.
E.C. Shek, R.R. Muntz, E. Mesrobian, and K. Ng,
"Scalable Exploratory Data Mining of Distributed Geoscientific Data",
Second International Conference on Knowledge Discovery and Data Mining,
Portland, Oregon, Aug. 1996.
E.C. Shek, E. Mesrobian, and R.R. Muntz,
"On Heterogeneous Distributed Geoscientific Query Processing",
Sixth International Workshop on Research Issues in Data Engineering:
Interoperability of Nontraditional Database Systems,
New Orleans, Louisiana, Feb. 1996.
E. Mesrobian, R.R. Muntz, E. Shek, S. Nittel, M. LaRouche, and M.
Krieger,
"OASIS: An Open Architecture Scientific Information System",
Sixth International Workshop on Research Issues in Data Engineering:
Interoperability of Nontraditional Database Systems,
New Orleans, Louisiana, Feb. 1996.
E.C. Shek, E. Mesrobian, and R.R. Muntz,
"Optimization of Access to Heterogeneous Data Repositories
in a Geoscientific Query Processing System",
Science Information Systems Interoperability Conference,
College Park, MD, Nov. 1995.
E. Mesrobian, R.R. Muntz, M. LaRouche, S. Nittel, E. Shek, and M.
Krieger,
"OASIS: An Open Architecture Scientific Information System",
Science Information Systems Interoperability Conference,
College Park, MD, Nov. 1995.
P. Stolorz, E. Mesrobian, R.R. Muntz, E.C. Shek, J.R. Santos, J. Yi, K. Ng,
S.Y. Chien, H. Nakamura, C.R. Mechoso, and J.D. Farrara,
"Fast Spatio-Temporal Data Mining of Large Geophysical Datasets",
The First International Conference on Knowledge Discovery and Data
Mining,
Montreal, Quebec, Canada, Aug. 1995.
R.R. Muntz, E. Mesrobian, C.R. Mechoso, D. McCleese, R. Haskins,
R. Zurek, and T. Barnett,
"Integrating Distributed Object Management into EOS",
Geo Info Systems, 5(5):58-59, May 1995.
E. Mesrobian, R.R. Muntz, E.C. Shek, J.R. Santos, J. Yi, K. Ng,
S.Y. Chien, C.R. Mechoso, J.D. Farrara, P. Stolorz, and H. Nakamura,
"Exploratory Data Mining and Analysis Using Conquest",
IEEE Pacific Rim Conference on Communications, Computers, Visualization,
and Signal Processing,
Victoria, British Columbia, Canada, May 1995.
E. Mesrobian, R.R. Muntz, E.C. Shek, C.R. Mechoso, J.D. Farrara,
J.A. Sphar, and P. Stolorz,
"Real Time Data Mining, Management, and Visualization of GCM Output",
Supercomputing 94 Poster,
Washington, DC, Nov. 1994.
E.C. Shek, and R.R. Muntz,
"The Conquest Modeling Framework for Geoscientific Data",
UCLA CSD Technical Report #940039, Oct. 1994.
E. Mesrobian, E.C. Shek, R.R. Muntz, and W. Cheng,
"Quest: An Environment for Content-Based Access to Geoscienitfic Datasets",
1994 International Geoscience and Remote Sensing Symposium,
Pasadena, CA, Aug. 1994.
E. Mesrobian, R.R. Muntz, E.C. Shek, C.R. Mechoso, J.D. Farrara,
and P. Stolorz,
"QUEST: Content-based Access to Geophysical Databases",
AAAI Workshop on AI Technologies in Environmental Applications,
Seattle, WA, Jul-Aug 1994.
E. Mesrobian, R.R. Muntz, J.R. Santos, E.C. Shek,
C.R. Mechoso, J.D. Farrara, and P. Stolorz,
"Extracting Spatio-Temporal Patterns from Geoscience Datasets",
IEEE Workshop on Visualization and Machine Vision,
Seattle, WA, Jun. 1994.
Created: 5/12/95; Last Updated: 11/13/96