\section{MHFS Design vs. MacFS Implementation}

  The \B\ structure allows for relatively fast lookups (the tree has a
maximum of 8 levels), which leads to fast {\tt stat} operations
because, as described in section \ref{description}, all the file information
is stored in the \C.  This is particularly useful on the native
Macintosh because path traversal involves stating every file at each
directory level \footnote{For example, traversing the path {\bf a:b:c}
in either a dialog box or Finder involves opening {\bf a}, stating all
its items, having the user select {\bf b}, then stating all the items
in {\bf b} \ldots}.  However, inserting and deleting filesystem
metadata can be very expensive.  There are several other issues that
arose in our implementation.

\subsection{Dealing with Extents }

A large part of the B-Tree code is needed to handle extremely uncommon
cases which could have been elliminated if the MHFS design were less
flexible.  Complexity is added by the fact that each \B\ has an
internal, non-fixed size and noncontiguous bitmap spread among the
regular nodes. This bitmap records the allocation status of the \B\
nodes. This extra code is involved only if the number of files is
extremely large (30,000) because the first portion of the bitmap is
stored at a known location.  If the number of B-Tree records were
bounded, the complete bitmap could be stored at a known location.

  The feature that is responsible for most of the extra complexity of
the B-Tree code is that both trees are files, which can dynamically
grow and are not necessarily contiguous. The first three extent
descriptors of both B-Trees are stored in the volume information
block.  Any further extent descriptors are stored in the \E.  This
feature is not documented so we tried to experiment with a native
Macintosh file system. We noticed that the \C\ grows relatively
quickly because of the high size of a catalog record (5 records in a
node on average) while the \E\ grows very slowly because of a general
low fragmentation and because the extent record is small (40 records
in a node on average).  Our experiments have shown that the additional
extents for \C\ are stored in the \E.  We were unable to sufficiently
fragment a native Macintosh volume to cause the \E to grow beyond
three extents.  Therefore, in our implementation we refuse to grow the
extents tree beyond the three extents that can be stored in the volume
information block.  Growing the \E beyond three extents and storing
the additional extent descriptors in the \E\ itself would lead to
circularity and possible deadlock.

  Storing the extents of the \C\ in \E\ can lead to very expensive
metadata accesses even in the case of a simple lookup. This can happen
when the path from the root to the target leaf contains nodes located
in different extents that must be searched in the \E. To alleviate the
problem we currently cache the \C's extent descriptors.

\subsection{Making MacFS Re-entrant}

  One particular challenge is implementing a re-entrant Macintosh file
system. We note that Apple's implementation is not
reentrant.

  The ideal synchronization algorithm would be fine-grained -- at the
\B\ node level -- assuring that only conflicting threads are blocked.
Such an algorithm would be very difficult to implement due to the fact
that one cannot figure out at the beginning of an operation which
nodes along the path will be altered during the operation. In the case
when the root itself will be modified, all the concurrent operations
on that \B\ must be locked.

  We are in the process of implementing a multiple readers/one writer
exclusion algorithm that blocks the entire \B\ when one writer needs
to alter the tree but allows for multiple threads performing searches.
This solution should perform well because insertions and deletions in
the \C\ occur only when creating or deleting files and are much less
frequent that lookup operations.  The \E\ is not usually on the critical
path, so this will not affect \E\ operations.

  Our conclusion is that the designers of the Macintosh file system
considered the flexibility of the file system data structures the most
important issue. We feel that with simpler and possibly fixed size and
position data structures the file system would perform better and
would also allow for better reentrant implementations.

