In this talk we address a dynamic variant of the classic NP-complete bin packing problem. In the bin packing problem we are presented with items of various sizes, s_1,s_2,...,s_n<=1, which we are required to pack into unit-sized bins, where the objective is to minimize the number of bins used. Bin packing can be used to model the problem of data backup, while attempting to minimize the number of disks used to backup the data. This problem is unfortunately known to be NP complete, and so it is unlikely that there is an efficient algorithm to compute an optimal solution to this problem. To make matters worse, the data we wish to back up is ever changing — over time files are deleted, while new files are created. In order to remain competitive with the optimal solution, this requires us to move file backups between disks. One way of addressing changes in data would be to recompute a solution to the resulting bin packing instance from scratch, but this would result in a high number of disk writes, incurring a high energy cost and possible wear and tear of the disks.

The problem of backing up dynamic data can be modeled as an instance of the dynamic bin packing problem, where items of different sizes are added and deleted over time, and our task is to minimize the number of unit-sized bins used to pack the items not yet deleted, while moving as few items as possible. In this work, we address this problem, and present optimal tradeoffs between number of bins used and number of items repacked (as well as natural extensions of the latter measure).

Joint work with Anupam Gupta (CMU), Guru Guruganesh (CMU) and Amit Kumar (IIT Delhi).

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.