GCOld: a benchmark to stress old-generation collection

david.detlefs@sun.com

Introduction

In the course of developing garbage collectors in Java[tm] Virtual Machine (JVM) implementations, we have noticed several characteristics of server applications that have large heaps. Often, an application will have a mix of allocated objects with quite different characteristic lifetimes. The weak generational hypothesis is often true: most allocated data is short-lived. But the case for the strong generational hypothesis, which asserts that younger objects are more likely to be garbage than older objects over all age ranges, is less clear. We often find that server applications interact with users on time scales of minutes (think of visiting a web site), and allocate data at the beginning of an interaction that persists for its duration. If the durations of such interactions are sufficiently similar, then the overall lifetime behavior of such data is "FIFO": the oldest data is the most likely to be garbage.

The GCOld benchmark is a rudimentary attempt to model a range of applications with these general object lifetime characteristics.

Description

A run of GCOld consists of an initialization phase and a steady state. In our measurements we generally disregard the initialization phase and concentrate on the steady state. The program maintains an array of pointers to heads of binary trees, each a megabyte in size. The initialization phase consists of allocating the binary trees and initializing the array.
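The initialization phase can be pictured with a minimal sketch like the one below. It is not GCOld's actual source: the names TreeNode, TreeHeap, makeTree, countNodes, and initialize are hypothetical, and the tree height is an assumption. At roughly 32 bytes per node (which varies by JVM), a full tree of height 15 has 32767 nodes and occupies about a megabyte.

```java
// Illustrative sketch only; class and method names are hypothetical,
// not taken from the GCOld source.
final class TreeNode {
    TreeNode left, right;
    int val;
}

final class TreeHeap {
    // Builds a full binary tree with the given number of levels
    // (height 0 yields an empty tree, height 1 a single node).
    static TreeNode makeTree(int height) {
        if (height == 0) {
            return null;
        }
        TreeNode n = new TreeNode();
        n.left = makeTree(height - 1);
        n.right = makeTree(height - 1);
        return n;
    }

    // Counts the nodes reachable from n.
    static int countNodes(TreeNode n) {
        return n == null ? 0 : 1 + countNodes(n.left) + countNodes(n.right);
    }

    // Initialization phase: allocate treeCount trees and store their
    // roots in an array, which keeps every tree reachable.
    static TreeNode[] initialize(int treeCount, int height) {
        TreeNode[] trees = new TreeNode[treeCount];
        for (int i = 0; i < treeCount; i++) {
            trees[i] = makeTree(height);
        }
        return trees;
    }
}
```

Under these assumptions, TreeHeap.initialize(n, 15) would set up about n megabytes of live data.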

The steady state consists of a number of steps. Each step:

The ratios between how much of each activity is done in each step, and the number of steps, are controlled by command-line arguments, described below.

The "pointer mutation" work is included because the performance of some GC algorithms, or components thereof (e.g., generational card scanning, some forms of concurrent collection), is strongly affected by the rate at which the mutator writes to old-generation objects. Each unit of pointer mutation consists of randomly choosing a pair of reachable binary trees and a path into those trees. The two subtrees at that path are swapped, so the size of each tree stays the same and the steady-state live data remains constant.
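One unit of pointer mutation might look like the following sketch. It assumes both trees are full binary trees of the same height, so the subtrees found at the same path have the same size. The names Node, PointerMutation, makeTree, size, and swapSubtrees are hypothetical and do not come from the GCOld source.

```java
import java.util.Random;

// Illustrative sketch only; names are hypothetical, not from GCOld.
final class Node {
    Node left, right;
}

final class PointerMutation {
    // Builds a full binary tree with the given number of levels.
    static Node makeTree(int height) {
        if (height == 0) {
            return null;
        }
        Node n = new Node();
        n.left = makeTree(height - 1);
        n.right = makeTree(height - 1);
        return n;
    }

    // Counts the nodes reachable from n.
    static int size(Node n) {
        return n == null ? 0 : 1 + size(n.left) + size(n.right);
    }

    // Swaps the subtrees found at one random path in trees a and b.
    // Assumes both are full trees with `height` levels, height >= 2.
    static void swapSubtrees(Node a, Node b, int height, Random rng) {
        // Number of edges from the root to the swapped subtree's root,
        // chosen in [1, height - 1] so the subtree is never empty.
        int pathLen = 1 + rng.nextInt(height - 1);
        Node pa = a;
        Node pb = b;
        // Follow the same random turns in both trees to reach the
        // parents of the two subtrees.
        for (int i = 0; i < pathLen - 1; i++) {
            boolean goLeft = rng.nextBoolean();
            pa = goLeft ? pa.left : pa.right;
            pb = goLeft ? pb.left : pb.right;
        }
        // The last turn selects which child pointers to exchange.
        // These are the writes to old-generation objects.
        if (rng.nextBoolean()) {
            Node tmp = pa.left;
            pa.left = pb.left;
            pb.left = tmp;
        } else {
            Node tmp = pa.right;
            pa.right = pb.right;
            pb.right = tmp;
        }
    }
}
```

Because the swapped subtrees sit at the same depth in equally tall full trees, each tree keeps its node count, which is what keeps the live data constant.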

Arguments

An invocation of GCOld has the form:
  java GCOld live-data-size work short/long-ratio pointer-mut-rate steps
where

Authors

This benchmark was originally developed by Dave Detlefs. Matthias Jacob found some bugs, and Will Clinger did an extensive rewrite (with several bug fixes), leading to the 1.0 version.