Methods for Querying Compressed Wavefields
Julio López

Department of Electrical and Computer Engineering (ECE)
Carnegie Mellon University, Pittsburgh, PA

Abstract

Wavefield datasets are becoming increasingly large due to improvements in simulation techniques and advances in computer systems, and larger storage capacities enable scientists to store more data. For example, state-of-the-art ground-motion numerical solvers produce terabyte-size datasets per simulation. Operating on these datasets is extremely challenging because of decades of declining normalized storage performance, in terms of both access latency and throughput. Data access and transfer rates have not kept pace with the increase in storage capacity or CPU performance, i.e., the ratios (seek time / disk capacity) and (transfer bandwidth / disk capacity) have decreased. As dataset sizes increase, it takes much longer to access the data on disk. We present new mechanisms that allow querying and processing large wavefields in the compressed domain (i.e., directly in their compressed representation). These mechanisms combine well-known spatial-indexing techniques with novel compressed representations in order to reduce bandwidth requirements when moving data from storage to main memory. The compression technique uses a frequency-domain representation to take advantage of the temporal redundancy found in wave propagation data, coupled with a new representation based on boundary integral equations that takes advantage of the data's spatial coherence. This approach transforms a large I/O problem into a massively parallel, CPU-intensive computation. Common queries to these datasets result in difficult-to-handle I/O workloads with semi-random access patterns. In the proposed representation, I/O access patterns exhibit longer sequential runs. The decompression stage places heavy demands on the CPU, but it can be performed in parallel and is well suited to emerging many-core processors. We evaluate our approach in the context of post-processing datasets produced by the CMU Quake project.

BibTeX entry

@phdthesis	{ lopez-phd-thesis,
  author	= "Julio Lopez",
  title		= "Methods for Querying Compressed Wavefields",
  school	= "Department of Electrical and Computer Engineering,
		   Carnegie Mellon University",
  month		= "May",
  year		= 2007,
  url		= "\url{http://www.cs.cmu.edu/~jclopez/jclopez-phd-thesis.pdf}"
}