Loadable-module Framework for Linux CGroups (15-412)
Process control groups ("cgroups") are a part of the core Linux kernel that offers a mechanism for grouping processes together and controlling various aspects of their behaviour via several subsystems. A cgroup hierarchy is a tree of cgroups with zero or more subsystems attached to it. Each hierarchy presents itself to userland as a virtual filesystem (of type "cgroup") in which each directory is a single cgroup, containing various control files (some belonging to subsystems, and some for regular cgroup management) and descendant cgroups as subdirectories. The set of attached subsystems is specified at hierarchy creation time, which is also filesystem mount time. (An example subsystem is cpuset, which restricts which processors and memory nodes tasks may use.)
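The directory layout described above can be pictured with a small userland model. This is only an illustrative sketch, not kernel code: the names Hierarchy and Cgroup are hypothetical, and the control file names (tasks, notify_on_release, cpuset.cpus, cpuset.mems, net_cls.classid) are assumed from the kernel's cgroup documentation rather than taken from this page.

```python
from dataclasses import dataclass, field

# Management files present in every cgroup directory
# (assumed subset, for illustration only).
CORE_FILES = ["tasks", "notify_on_release"]

# Control files contributed by each subsystem
# (assumed subset, for illustration only).
SUBSYS_FILES = {
    "cpuset": ["cpuset.cpus", "cpuset.mems"],
    "net_cls": ["net_cls.classid"],
}


def _parts(path):
    """Split '/a/b/' into ['a', 'b']; '/' becomes an empty list."""
    return [p for p in path.split("/") if p]


@dataclass
class Cgroup:
    name: str
    children: dict = field(default_factory=dict)


class Hierarchy:
    """A tree of cgroups whose subsystem set is fixed at creation ("mount") time."""

    def __init__(self, subsystems):
        self.subsystems = tuple(subsystems)
        self.root = Cgroup("")

    def mkdir(self, path):
        """Create the cgroup at `path`, along with any missing ancestors."""
        node = self.root
        for part in _parts(path):
            node = node.children.setdefault(part, Cgroup(part))
        return node

    def listing(self, path="/"):
        """Return what a directory listing of the cgroup at `path` would show."""
        node = self.root
        for part in _parts(path):
            node = node.children[part]
        files = list(CORE_FILES)
        for ss in self.subsystems:
            files.extend(SUBSYS_FILES[ss])
        return files + sorted(name + "/" for name in node.children)


h = Hierarchy(["cpuset"])
h.mkdir("batch/nightly")
print(h.listing("/batch"))
```

Every directory in the model shows the same control files, because the subsystem set belongs to the hierarchy as a whole rather than to any individual cgroup.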
The goal of this project is to support addition and removal of subsystems at run-time as kernel modules. After a modprobe or insmod, the subsystem is registered in the cgroup infrastructure and can be mounted and used with a hierarchy.
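The intended lifecycle (load the module, mount a hierarchy that uses the subsystem, later unmount and unload) can be sketched as a toy registry. This is a userland model under stated assumptions, not the kernel implementation: the class and method names are hypothetical, and the rules that a subsystem belongs to at most one hierarchy at a time and cannot be unloaded while mounted are assumptions of the model.

```python
class SubsystemBusy(Exception):
    """Raised when a subsystem is still attached to a mounted hierarchy."""


class CgroupRegistry:
    """Toy model of run-time subsystem registration (hypothetical names)."""

    def __init__(self):
        # subsystem name -> mount point of the hierarchy using it, or None
        self._subsys = {}

    def load(self, name):
        """Model of modprobe/insmod: register the subsystem."""
        if name in self._subsys:
            raise ValueError(f"subsystem {name!r} is already registered")
        self._subsys[name] = None

    def mount(self, mountpoint, names):
        """Model of mounting a hierarchy with the given subsystems attached."""
        for name in names:
            if name not in self._subsys:
                raise LookupError(f"subsystem {name!r} is not registered")
            if self._subsys[name] is not None:
                raise SubsystemBusy(f"{name!r} is already attached to a hierarchy")
        for name in names:
            self._subsys[name] = mountpoint

    def umount(self, mountpoint):
        """Model of unmounting: detach every subsystem of that hierarchy."""
        for name, where in self._subsys.items():
            if where == mountpoint:
                self._subsys[name] = None

    def unload(self, name):
        """Model of rmmod: refuse while the subsystem is in use (assumed rule)."""
        if self._subsys[name] is not None:
            raise SubsystemBusy(f"{name!r} is in use at {self._subsys[name]}")
        del self._subsys[name]

    def registered(self):
        return sorted(self._subsys)


reg = CgroupRegistry()
reg.load("net_cls")                    # subsystem becomes available
reg.mount("/mnt/cgroup", ["net_cls"])  # hierarchy created at mount time
reg.umount("/mnt/cgroup")
reg.unload("net_cls")                  # allowed again once unmounted
```

Mounting a hierarchy that names an unregistered subsystem fails in this model, which corresponds to trying to mount before the module has been loaded.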
Most subsystems hook directly into other parts of the kernel with explicit function calls; cpuset, for example, is something the scheduler constantly interacts with. Some subsystems, however, offer functionality in a more passive way: net_cls, the network classifier, enables classification by ID of networking activity by tasks. Its subsystem (called "net_cls" to cgroups, and "cls_cgroup" to the rest of the kernel) offers a single control file per cgroup, which sets the class ID of tasks in that cgroup. Actual control of packet activity based on the class ID can be achieved in userland independently of the classifier itself. Such subsystems are ideal for modularization, and in fact cls_cgroup would already have been buildable as a module if cgroups hadn't been missing the necessary infrastructure to handle dynamically-loaded subsystems.
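To make the class ID concrete, here is a minimal sketch of how such a value could be packed and unpacked. It assumes the 0xAAAABBBB layout that the kernel documentation describes for the net_cls.classid control file (AAAA is the major handle number, BBBB the minor); the function names are hypothetical and are not part of any kernel or library API.

```python
def make_classid(major, minor):
    """Pack a traffic-control handle major:minor into one 32-bit class ID."""
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return (major << 16) | minor


def split_classid(classid):
    """Unpack a 32-bit class ID into its (major, minor) handle numbers."""
    if not 0 <= classid <= 0xFFFFFFFF:
        raise ValueError("class ID must fit in 32 bits")
    return classid >> 16, classid & 0xFFFF


def format_handle(classid):
    """Render a class ID as a hexadecimal 'major:minor' handle string."""
    major, minor = split_classid(classid)
    return f"{major:x}:{minor:x}"


classid = make_classid(0x10, 0x1)
print(hex(classid), format_handle(classid))
```

Under that assumed layout, the handle 10:1 corresponds to the value 0x100001, which is the kind of number that would be written to the control file of a cgroup.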
While this project was in progress, other developers added a block I/O controller subsystem (called "blkio"). During discussion of the fourth patch series it was suggested that blkio was a suitable candidate for becoming a module. Unlike net_cls, blkio depends on a substantial body of external code, namely the CFQ I/O scheduler ("cfq-iosched"). Luckily cfq-iosched is a dynamically loadable module, so now blkio can be built that way as well.
Project development was done in a git checkout of the mmotm ("mm of the moment") source tree. I used stg on top of git to manage my changes as a patch series, and tested functionality by building the kernel to run as User-Mode Linux (sometimes with gdb attached). Finally, each iteration of the polished patch series was sent to LKML, the Linux kernel mailing list, along with a few other cgroup-interested parties, for review, feedback, and approval.
The patch series was accepted into the -mm tree on 2010-01-06.
Two dependent patches for the block I/O subsystem were accepted into the -mm tree on 2010-01-13.
Modular cgroup subsystems made it into the mainline kernel as of Linux 2.6.34-rc2 (2010-03-20).
A few other smaller changes were pursued in the course of this project. A draft patch to implement subsystem dependencies, a speculative feature that may or may not prove useful, was put together, but rejected on the principle of "don't accept extraneous features until it's clear that somebody wants to use them". Additionally, a small infrastructure invariant violation was found, reported, and worked around with warning signs.
blkio, the work of other developers, has successfully taken advantage of this framework (see above).
[Last modified Tuesday March 23, 2010]