Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reduce bandwidth involves resizing or compressing data prior to training. We introduce a way to dynamically reduce the overhead of fetching and transporting data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities, all without adding to the total dataset size. We show that the amount of compression a dataset can tolerate depends on the training task at hand. We then show that PCRs can enable tasks to readily access appropriate levels of compression at runtime, resulting in a 2x speedup in training time on average.
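The core idea, storing each record as a sequence of ordered progressive scans so a reader can stop after the first k and transfer fewer bytes, can be sketched in plain Python. This is a minimal illustration of the layout concept only; the record structure, scan names, and the `fidelity` parameter below are assumptions for this sketch, not the actual PCR format.

```python
# Sketch of a progressive record layout: a record's bytes are split into
# ordered "scans" (as in progressive image codecs). Reading only the
# first k scans yields a lower-fidelity view of the record while
# fetching a prefix of its bytes, which is how a progressive layout can
# trade fidelity for bandwidth without storing multiple copies.
# All names and structure here are illustrative, not the PCR format.

def pack_record(scans):
    """Concatenate scan chunks; return byte offsets and the full blob."""
    offsets, blob = [], b""
    for chunk in scans:
        offsets.append(len(blob))
        blob += chunk
    return offsets, blob

def read_at_fidelity(offsets, blob, fidelity):
    """Return only the prefix of bytes covering the first `fidelity` scans."""
    end = offsets[fidelity] if fidelity < len(offsets) else len(blob)
    return blob[:end]

# Three hypothetical scans of increasing detail for one record.
scans = [b"coarse", b"medium", b"fine-detail"]
offsets, blob = pack_record(scans)

low = read_at_fidelity(offsets, blob, 1)   # coarse view: fewer bytes read
full = read_at_fidelity(offsets, blob, 3)  # full-fidelity view: all bytes
```

A training task that tolerates heavy compression would read only the short prefix (`low`), while a task needing full fidelity reads the whole record; the on-disk size is the same either way.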
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.
Remote Participation Enabled. See announcement for registration details.