AFS Cache Manager Configuration

/etc/openafs

0) The cacheBlocks is always configured, though on some platforms the
   startup scripts attempt to select an appropriate value based on the
   amount of available disk space.  I see no reason to change this.

1) The chunkSize is the maximum size of any given cache chunk, and is
   also the size of data transfers done on the wire.  It should probably
   be set such that a substantial fraction of files will fit in a single
   chunk, for efficient wire transfers (recall that you get streaming
   only _within_ a FetchData RPC).  However, if you have a particularly
   small cache you want smaller chunks, to avoid having a few chunks use
   the entire cache.

   The current default chunksize is 16 (64K).  I agree that this is
   probably a couple of orders of magnitude too small for today.
   Instead I'd suggest a default in the range of 18-20, subject to
   the restriction that the chunk size should not be larger than
   (cacheBlocks/1000) or (cacheBlocks/cacheFiles), whichever is larger.
   Note that if cacheFiles is computed, it will satisfy this rule.
   In no case should we use a computed chunk size smaller than 8K.

2) The cacheFiles is the number of distinct data chunks which can be
   in the cache at once.  This is both the number of VNNNN files in the
   cache directory and the number of entries in the CacheItems file.

   The default behavior is to compute the number of files such that
   (a) There is at least one file for every 10 cache blocks.
   (b) There are at least enough files for the cache to be full of
       chunks which are 2/3 the maximum chunk size.
   (c) There are at least 100 files.

   This is then clipped to insure that we don't have more files than
   the number of blocks that will be available in the cache after the
   CacheItems and VolumeItems files.  The CellItems file is not currently
   taken into account.

   For rule (b) to have any effect, the chunk size would have to be less
   than 14 (16K).  The only way this can happen is if the chunk size is
   hand-configured to be very small, or if the cache is very small.  And
   for rule (c) to have any effect, you'd have to have a cache smaller
   than 1MB.  So yes, for reasonable values of cacheBlocks and chunkSize,
   rule (a) will dominate and you'll get cacheBlocks/10 files.

   However, I'm not convinced that these rules are still right for today.
   The right number of files is essentially a function of the cache size
   and the expected average chunk size.  Now, if we choose a large value
   for chunkSize as suggested above, then most chunks will contain whole
   files, and the average chunk size will be dominated by the average
   file size.  I think we can expect the average file size to be more or
   less a constant, and this is what rule (a) is intended to accomplish.
   However, I doubt the average file these days is as small as 10K.  In
   fact, a quick scan over the contents of my cell shows an average file
   size of 50K, to the extent to which volume header data is valid.
   So, I'd set the rule (a) limit to something conservative, like 32.

   On the other hand, if the chunk size is a small value, then rule (b)
   kicks in, making sure we have room for partially-full chunks even
   when the maximum chunk size is quite small.  We can probably leave
   this rule alone.

   Finally, I'd suggest increasing the rule c limit to 1000 files,
   rather than only 100.  I'm sorry, but 100 just seems tiny today.

   Also, we should adjust the max-files computation to take into account
   the expected size of the CellItems file (mine is about 4K).

3) OK, so much for the disk cache; now on to the in-memory structures.
   The CacheItems file is used to store an index of all the cache files,
   containing the FID, DV, offset, and size of each file in the cache,
   along with some additional information.  It is structured as an array,
   with one entry (about 40 bytes) for each cache file.  Rather than keep
   all of this data in memory or keep searching it on disk, the cache
   manager keeps a subset of this data in memory, in dcache entries.
   The dCacheSize is the number of these entries that are kept in memory.

   The default dCacheSize is currently half the number of cache files,
   but not less than 300 and not more than 2000.  I agree this range is
   probably too small.  Something in the range of 2000-10000 would seem
   reasonable; however, it should _never_ be larger than cacheFiles.

   The dCacheSize setting should approximate the size of the workstation's
   working set of chunks.  If the chunk size is large, this is close to
   the number of files whose contents (not metadata) are in the working
   set.  If the chunk size is very small, then it's probably some multiple
   of that number, though it likely gets complex.

   Unfortunately, I don't know a good way to guess what the size of a
   random machine's working set is going to be.  So we're probably back
   to using some property of the cache (cacheBlocks or cacheFiles) as an
   approximation.  The existing code uses cacheFiles/2, which might be
   a little over-agressive, but I suspect that cacheFiles/10 is on the
   low side.  Let's keep it at cacheFiles/2 for now.

4) The vcache stores metadata about files in AFS.  Any time we need to
   get information about a file that is not in the vcache, we must make
   an RPC to the fileserver.  So, you don't want the vcache to be too
   small, since that would result in lots of extra RPC's and considerable
   performance loss.  The ideal vcache size approximates the size of the
   workstation's working set of AFS files, including files for which we
   only care about metadata.

   It is worth noting that on most platforms, vcache entries contain
   vnodes, but these are _not_ drawn from the system vnode pool.  So, the
   size of the system vnode pool has little bearing on the vcache size.
   Even on those platforms where AFS uses vnodes from the system pool, it
   is important to remember that vcache entries cache information obtained
   via fileserver RPC's, and so throwing them away is somewhat costly.
   When possible, such platforms should be structured such that it is
   possible to have vcache entries without associated vnodes, so that we
   are not obligated to limit the vcache size or tie up a substantial
   fraction of the system vnode pool.

   The default vcache size is set to 300, which is probably way too small.
   Unfortunately, I still don't know a good way to approximate the size of
   a workstation's working set.  However, the problem is similar to the
   problem of sizing the dcache, so I'll propose making them dependent
   on each other, based on the chunk size:

   - chunkSize < 13:    cacheStatEntries = dCacheSize / 4
   - chunkSize < 16:    cacheStatEntries = dCacheSize
   - chunkSize > 16:    cacheStatEntries = dCacheSize * 1.5

   Further, if cacheStatEntries is configured and dCacheSize is not,
   then perhaps we should set dCacheSize based on these formulas rather
   than on cacheFiles, since the configured setting is more likely to
   reflect the user's impression of the working set size and the amount
   of memory available to devode to AFS.

5) The volume cache stores cached information about volumes, including
   name-to-ID mappings, which volumes have RO clones, and where they are
   located.  The size of the volume cache should approximate the size of
   the workstation's working set of volumes.  Entries in this cache are
   updated every 2 hours whether they need it or not, so unless you have
   a busy machine accessing lots of volumes at once, a pretty small
   number will probably be fine.

   Even though it was set for the small memory sizes of the 1980's, the
   default value of 50 is probably sufficient for single-user systems.
   For a larger multi-user system, a larger value might be appropriate.
   I'm going to go out on a limb here and guess that such a system ought
   to have something like 3-5 volcache entries per active user, bearing
   in mind that some of these will be used to cover shared volumes and
   things used by the system.

   It seems appropriate to make sure the default is sufficient for both
   single-user workstations and multi-user systems with a small number of
   active users.  To that end, I'll propose bumping the default number
   of volume cache entries to, say, 200.

Filename	Source
/etc/openafs/CellServDB.Transarc	worldwide database from IBM (defunct)
/etc/openafs/CellServDB.GCO	worldwide database from GRAND.CENTRAL.ORG
/etc/openafs/CellServDB.CMUCS	site-global, platform-independent entries
/etc/openafs/CellServDB.group	project-global entries
/etc/openafs/CellServDB.local	entries local to this machine

Filename	Source
/etc/openafs/CellAlias.global	site-global, platform-independent aliases
/etc/openafs/CellAlias.group	project-global aliases
/etc/openafs/CellAlias.local	aliases local to this machine

if chunksize is...	set stat cache size to...
2^13 or less	dcache / 4
2^14 or 2^15	dcache
2^16 or more	dcache * 1.5

Parameter	Suggested Value	Notes
-blocks	90% of cache partition	set in cacheinfo file
-chunksize	1MB	specified as a power of 2
-files	blocks/32	not more than 1000000
-dcache	files/2	2000 <= dcache <= 10000 </td>
-stat	dcache * 1.5
-volumes	200

AFS Cache Manager Configuration

Files

CellServDB

CellAlias

Quirks

Computing Cache Tuning Parameters

`-blocks` (Cache Blocks)

`-chunksize` (Cache Chunk Size)

`-files` (Number of Cache Chunks)

`-dcache` (Number of Data Cache Entries)

`-stat` (Number of Vnode Stat Cache Entries)

`-volumes` (Number of Volume Cache Entries)

Summary

Rationale

AFS Cache Manager Configuration

Files

CellServDB

CellAlias

Quirks

Computing Cache Tuning Parameters

-blocks (Cache Blocks)

-chunksize (Cache Chunk Size)

-files (Number of Cache Chunks)

-dcache (Number of Data Cache Entries)

-stat (Number of Vnode Stat Cache Entries)

-volumes (Number of Volume Cache Entries)

Summary

Rationale

`-blocks` (Cache Blocks)

`-chunksize` (Cache Chunk Size)

`-files` (Number of Cache Chunks)

`-dcache` (Number of Data Cache Entries)

`-stat` (Number of Vnode Stat Cache Entries)

`-volumes` (Number of Volume Cache Entries)