\begin{figure}
\centerline{\epsfxsize=6.5in \epsfbox{MHFS.eps}}
\caption{Macintosh Hierarchical File System Data Structures:  The highlighted
catalog record represents a single file.  The highlighted extent
record holds additional extent descriptors of the file's data fork
beyond the three in the catalog record.}
\label{MHFS}
\end{figure}


\section{The Macintosh Hierarchical File System}
\label{description}

  All Macintosh storage devices except floppy disks are partitioned
into one or more volumes.\cite{IMF} Volumes can contain four kinds of
items: files, directories, directory threads, and file threads. Each
item is described by a {\em catalog record}, which is analogous to a
Unix inode.\cite{FFS} Catalog records are organized in the on-disk
{\em catalog B-Tree}.  Directory contents are derived from searching
the catalog B-Tree.  Only a file can occupy space outside of its
catalog record.

  A Macintosh ``file'' contains two components, or {\em forks}.  The
{\em resource fork} is an indexed file containing code segments, menu
items, dialog boxes, etc.  The {\em data fork} has the ``stream of
bytes'' semantics of a Unix file's contents.  When we refer to a file,
we mean the data fork portion of the ``file''.

  Each fork is comprised of one or more {\em extents} or contiguous
runs of blocks. An {\em extent descriptor} encodes an extent's
starting block and length into a 32 bit quantity. The first {\em
extent record} (three extent descriptors) of each fork is a part of
the file's catalog record. Any further extent records are kept in the
{\em extents overflow B-Tree}.

  In addition to file and B-Tree extents, a volume also contains two
boot blocks, a volume information block, and a free-space bitmap.
There is a remarkable amount of redundancy in the on-disk data
structures, which improves crash recovery.

  While not strictly a part of the file system, it should be noted that
several catalog record fields are reserved for the exclusive use of
{\em Finder}, a program which handles user access to the file system
and automatically maintains associations between applications and data
files.  Thus, MacFS must also maintain this Finder info.

Figure \ref{MHFS} graphically summarizes MHFS, the Macintosh
Hierarchical File System.


\subsection{Naming and the Catalog B-Tree}

  Every file and directory on an MHFS volume has an identification
number, similar to an inode number in the Unix file system. However, a
file or directory is named by its {\em parent's} identification number
and the file or directory's filename, which is a 32 character string
that can contain nulls.  This combination is the search key to the
volume's catalog B-Tree file.

  The catalog B-Tree differs from a traditional B-Tree structure in
that all the nodes at each level of the B-Tree are linked together to
form a doubly linked list, and all of the records are in the leaf
nodes.  These variations permit accessing many items in the same
directory by traversing the leaves using the linked list.  Strictly
speaking, the MHFS B-Trees are a variant of B+-Trees \cite{Comer},
although Apple's technical documentation calls them B*-Trees.  For the
purposes of this paper, it is only important that they are B-Trees and
we refer to them as such.

  Each directory, including the root directory, contains its directory
thread, which has the empty filename.  The directory thread record
contains the name of the directory, and the id of the {\em parent} of
the directory. Similarly, file threads contain the name of a file and
the id of the directory they are in.  While every directory must
contain a directory thread, file threads are very uncommon.  In fact,
both are examples of MHFS' redundancy -- for undamaged trees, threads
are not strictly necessary.

  Both file and directory records contain 32 bytes of information used
by Finder.  Although the contents of the Finder information fields are
outside of the purvue of the file system, placing valid and useful
information in these fields in necessary for transparent migration of
data to a Macintosh.  Most importantly, the creator name and file type
information {\em must} be correct because some Macintosh applications
will only see ``correctly'' typed/created files.

  The first three extent descriptors for the catalog B-Tree file are
kept in the volume information block.  If the catalog B-Tree file
grows beyond three extents, the remaining extent descriptors are kept
in the extents overflow file.
 
\subsection{The Extents Overflow B-Tree and Finding a File's Extents}

  A file's catalog record contains the first three extent descriptors
of both its resource and data forks.  Many Macintosh files, even large
ones, have three or fewer extents. Any extents beyond the first three
are kept in the extents overflow B-Tree file. This B-Tree has the same
structure as the catalog B-Tree, except that it holds extent records
(three extent descriptors) and is searched by file ID, fork ID, and
logical block within fork. Finally, it cannot grow beyond three
extents. Those three extents are described by an extent record in the
volume information block.

%  Given a logical block number within a file, finding the
%corresponding physical block on the disk consists of examining the
%extent record in the file's catalog record and successively summing
%extent lengths to create logical offsets to compare against the
%requested logical block number. If the requested logical block number
%is not in the first three extents, a query is made to the extents
%overflow B-Tree, which returns a new extent record and starting
%logical offset.


