\begin{figure}
\centerline{\epsfxsize=5in \epsfbox{swstruct.eps}}
\caption{MacFS Software Structure - The shaded modules contain
         platform-specific code}
\label{swstruct}
\end{figure}


\section{MacFS Software Structure}
\label{implementation}

  We decided early on that MacFS should be easily portable to
different operating systems and hardware. This resulted in a design
which has only two platform-dependent module, {\em read/write device
support}, which exports procedures for reading and writing partition
relative sectors and reading absolute sectors from a device, and {\em
mount support}, which has the concept of a platform-dependent device
name.  We have implemented two sets of these modules, one using Unix
I/O calls (open/read/write/close) and device naming (/dev/...), the
other using Unix kernel buffer operations and internal kernel device
names.

  Above the read/write device support module, MacFS consists of a
large amount of shared code. Small modules include {\em partition
support} and {\em volume bitmap support}. The major shared code
modules are {\em B-Tree support}, {\em catalog support}, and {\em file
read/write support}.  The mount, catalog, and file read/write support
modules collectively export a base interface, the {\em m} interface,
which contains functions to do all typical file system operations in a
non platform-specific way.

  On top of this, we have built two interfaces. First, the {\em mac}
API is an interface with Unix-style file naming and manipulation
functions.  When MacFS is compiled into a user level library, it is
the mac API which is exported. We have written mac-based versions of
the standard Unix utilities ls, cp, cat, cd, mkdir, rmdir, rm, and mv
to demonstrate how the mac functions are used. The mac library and the
example utilities should be directly portable to any platform which
has a Unix-like I/O library.

  Second, we implemented an experimental VFS interface on top of the m
interface so that Macintosh volumes can be directly mounted on Unix
systems as if they were Unix file systems. We feel that the m interface
is sufficiently general that mountable file systems for other operating
systems' foreign file system interfaces, such as OS/2 IFS, can be
constructed on top of it.

The MacFS software structure is graphically summarized in figure
\ref{swstruct}.

\subsection{Common Data Types}

  Throughout MacFS, a {\em mount record} represents a volume. In our
initial design, it only contained basic information such as partition
offsets, but we soon discovered that converting between unaligned, big
endian data on a Macintosh disk and aligned, possibly little endian
data in memory in order to change such frequently accessed structures
such as the volume information block consumed far too much time.
Therefore we now keep the volume information block, the free space
bitmap, the headers for the extents and catalog B-Trees, and an extent
record cache for the catalog B-Tree in a volume's mount record.

  A {\em catalog record} represents a file, directory, or thread.  It
has the same fields as the on-disk catalog record, but adds lock,
reference count, and validity fields.  Opening a file instantiates its
catalog record in memory.  The in-memory catalog record is then used
for all read/write operations on the file and is finally flushed back
to the disk on a close call.

  Reading, writing, growing, and truncating a file builds an in-memory
{\em extent descriptor list} that holds the extent descriptors
necessary for the request.

\subsection{Read/Write Device, Partition and Mount Support}

  An early design choice for the read/write device support module was
to support multiple sector reads and writes.  With the late addition
of partitioning support, these functions were modified so all sector
requests were made partition relative.  Further, a read absolute
sector function was added for the use of the partition support module.
Finally, an unbuffered write function, used at sync time, was added.

  The partition support module simply sets the offset necessary to
make all reads and writes partition relative.  Surprisingly, a
Macintosh device can have a wide variety and number of partitions,
including device driver partitions.

  The mount support module provides mount, unmount, and sync
functionality.  Mounting a device consists of opening the device,
setting the partition offset, and reading in the volume information
block, free space bitmap, and the headers for the extents and catalog
B-Trees. Syncing writes back those data structures using unbuffered
writes.  Unmounting syncs, closes the device, and marks the mount
record as invalid.

% btree support
\input{btree}

\subsection{Catalog Support}

  The catalog support module exports functions for: creating and
deleting files and directories, moving and renaming files and
directories, getting a list of the contents of a directory, opening
and closing files and directories, and changing Finder information on
open files.  Rollback is used to recover from errors in all functions. 

\subsubsection{Naming and Catalog Records}

In the catalog support module, files and directories are identified by
the volume's mount record, the catalog record of their parent
directory, and their own name. For example, creating a file requires a
mount record, the catalog record of the directory the file will be
placed in, and the name for the file.  The catalog record of the newly
created file is returned.

Extending the naming conventions of MHFS (section \ref{description})
by using catalog records instead of IDs was necessary to simplify the
VFS level implementation of MacFS and should make writing installable
file systems for other operating systems simpler.  Catalog records are
the closest MHFS analogy to a Unix inode.  For some m functions, a
catalog ID must be used because the parent of the root folder has no
catalog record, yet it must be possible to open, close, and list the
contents of the root folder.

\subsubsection{Directory Contents}

A function which returns a list of the names and catalog records of
all the items in a directory is provided. This is a necessary support
function because, unlike the Unix file system, MHFS directories are not
files that contain lists of subitems. Altough we support ``opening''
directories, the semantics of opening a directory is to return its
catalog record, which contains no information about the contents.  To
obtain a directory list, repeated catalog B-Tree searches are done
with descending filenames until the directory's thread record is
found.

\subsection{File Read/Write Support}

Files are identified in the file read/write support module by their
catalog record and the mount record of the volume they are on. This is
all that is necessary because the functions in this module only modify
the in-memory catalog record of an open file and the extents B-Tree.
The module provides functions to read and write files and to truncate
the length of files.

\subsubsection{Read/Write Requests and Extent Descriptor Lists}

Any read or write request first generates an extent descriptor list 
enumerating the extents necessary to complete the request. For example, 
suppose a file has two extents of 512 byte blocks, the first, 10(2) 
starting at block 10 and extending for two blocks, and the second, 50(10) 
starting at block 50 and extending for 10 blocks. A read request for 1000 
bytes starting at byte 1000 will result in the extent descriptor list 
$\{11(1),50(2)\}$ being ``knitted.'' The ``block extents'' represented by the 
elements of an extent descriptor list are translated to ``sector extents'' 
for the ultimate calls to {\em readsects()}. 

\subsubsection{Growing a File}

When a file is created, a single extent of the volume's default
``clump size'' is allocated for it. When a given write will extend
beyond current physical length of a file, {\em mgrowfile()} is called
to expand the file.  First, a search and update of the volume
free-space bitmap is performed, building an extent descriptor list
detailing the newly allocated space. The search starts at a per-volume
``start of next allocation search'' kept in the volume's information
block. Then, the new extent descriptors are incorporated into the
file's extent records. Any open slots in the catalog record's extent
record are first filled. This is followed by a foray into the extents
overflow B-tree, updating and/or adding new extent records for the
file.

In all cases, new extent descriptors are coalesced into existing
extent descriptors, if possible.  For example, suppose a file has a
single extent record $\{10(2), 50(10), 0(0)\}$, and a bitmap search
resulted in the extent descriptor list $\{60(20), 100(1), 110(10)\}$.
mgrowfile() will update the extent record to $\{10(2), 50(30),
100(1)\}$ and add a second extent record, $\{110(10), 0(0), 0(0)\}$ to
the extents B-Tree.  Extent coalescing is instrumental to fast file
access in MacFS.

We feel that extent coalescing could be done more often if each file 
catalog record had a ``start of next allocation search'' field. The 
per-volume field seems to be a holdover from when the Macintosh was a 
one-application-at-a-time, one-document-at-a-time machine. Even in 
simple situations where two files are being grown in a interleaved 
fashion, extent coalescing becomes very unlikely on the Macintosh and in 
the current implementation of MacFS.  
