Light-weight In-situ Analysis with Frugal Resource Usage
In this talk, Qing presents Parallel Logging DB (PLDB), a new in-situ analysis technique for indexing data within DeltaFS. Designed as a scalable, serverless file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. PLDB is a novel extension to the DeltaFS data plane that enables in-situ indexing of massive amounts of data written simultaneously to a single DeltaFS directory, across an arbitrarily large number of files. PLDB achieves this through a compaction-free indexing mechanism for reordering and indexing data, and a log-structured storage layout that packs small writes into large log objects, all while using compute node resources frugally. We demonstrate the efficiency of PLDB using VPIC, a widely used simulation code developed at Los Alamos National Lab that scales to trillions of particles. Using DeltaFS, we modify VPIC to create one file per particle under a special directory, and each file receives the writes of that particle's output data. Dynamically indexing the directory's underlying storage with PLDB yields a 5,000x speedup in VPIC particle trajectory queries, which require reading all data for a single particle. This speedup grows with simulation scale, while the overhead stays fixed at 3% of available memory and 8% of final storage.
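To make the idea of packing small writes into large log objects concrete, here is a minimal single-process sketch, assuming a simple in-memory model. The class name `LogStructuredIndex`, the `object_size` threshold, and the `key -> [(object_id, offset, length)]` index are hypothetical illustrations, not PLDB's actual API or design; the sketch omits PLDB's data reordering and any distribution across compute nodes. It shows why no compaction is needed for this access pattern: sealed log objects are never rewritten, and a per-key index lets a query fetch only one key's records.

```python
from collections import defaultdict


class LogStructuredIndex:
    """Hypothetical sketch: pack small per-key writes into large immutable
    log objects and keep a key -> locations index (no compaction step)."""

    def __init__(self, object_size=64):
        self.object_size = object_size  # flush threshold in bytes (illustrative)
        self.objects = []               # sealed, never-rewritten log objects
        self.buffer = bytearray()       # in-memory buffer for the next object
        self.pending = []               # (key, offset, length) for buffered records
        self.index = defaultdict(list)  # key -> [(object_id, offset, length)]

    def append(self, key, data: bytes):
        # Record where this small write lands inside the buffer, then pack it.
        self.pending.append((key, len(self.buffer), len(data)))
        self.buffer += data
        if len(self.buffer) >= self.object_size:
            self.flush()

    def flush(self):
        # Seal the buffer as one large log object and publish its index entries.
        if not self.buffer:
            return
        obj_id = len(self.objects)
        self.objects.append(bytes(self.buffer))
        for key, off, length in self.pending:
            self.index[key].append((obj_id, off, length))
        self.buffer = bytearray()
        self.pending = []

    def read_all(self, key):
        # Flush first so buffered records are visible, then read only this key.
        self.flush()
        return [self.objects[o][off:off + n] for o, off, n in self.index[key]]


# Usage: two hypothetical "particles" each write one small record per timestep.
log = LogStructuredIndex(object_size=16)
for step in range(3):
    for particle in ("p0", "p1"):
        log.append(particle, f"{particle}@{step};".encode())
print(log.read_all("p1"))  # [b'p1@0;', b'p1@1;', b'p1@2;']
```

The trade-off shown here is that writes stay sequential and large, while the index grows with the number of records instead of being merged; a trajectory-style query touches only the offsets recorded for one key.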
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement