\documentstyle[10pt]{article}

\def\B{B-Tree}
\def\C{catalog \B}
\def\E{extents overflow \B}



\def\tm{$^{\scriptscriptstyle\tt TM}$}

\setlength{\topmargin}{-0.3in}
\setlength{\textheight}{8.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\textwidth}{6.5in}

\title{MacFS -- A Portable Macintosh Filesystem for Mach \\
         -- Extended Abstract --}
\author{Peter A. Dinda  \and 
        George C. Necula \and 
        Morgan Price \\
 \\
School of Computer Science, Carnegie Mellon University \\
Pittsburgh, PA 15213-3891, U.S.A.
}

\begin{document}

\maketitle

\begin{abstract}


Driven by the need to transparently transfer data between Macintoshes
and Mach workstations, we have written a Macintosh filesystem for Mach
3.0. In addition, we have written a filesystem library which is
portable to a variety of operating systems and platforms.  We have
discovered that the Macintosh filesystem design is not well suited to
reentrancy and its complex data structures lead to slow
implementations in multiprogrammed environments. We also discuss the
difficulties in mapping Unix semantics into Macintosh filesystem
operations.  Our performance measurements show that our implementation
performs reads and writes only slightly slower than FFS, but FFS can
perform create and delete operations far faster.  Finally, we note
that the native Macintosh implementation can perform large read and write
operations far faster that either our implementation or FFS.  

\end{abstract}

 
\section{Introduction}
 
\subsection{Why a Macintosh Filesystem For Mach?}
 
Given the vast number of Apple Macintosh machines, transparent file
interchangeability between Macs and other computers has become
necessary.  However, although there has been some development of
utilities to access Mac filesystems from MS-DOS
\cite{MACETTE,MACSEE}, there has been surprisingly little work on making Mac
filesystems accessible from Unix, much less making them mountable. We
are aware of no existing mountable Mac filesystem for workstations.
Even systems such as Mac Mach and Apple's own A/UX, which provide
Macintosh emulation, do not provide this feature \cite{MACMACH,AUX}.
 
Although Unix Appletalk client and server software exists \cite{CAP},
we feel that ``sneaker-net,'' using both Mac high-density floppies and external
SCSI hard disks, will remain commonplace.
 
Existing utilities to access Mac filesystems often destroy or fail to create
the per-file information which the Mac OS uses to simplify user
access to files. This results in the ``foreign file syndrome,'' where
files without this information require special handling and may
not be accessible to certain programs. Further, existing utilities
either ignore the resource portion of Mac files or destroy it,
wreaking havoc when those files are moved back to a Mac.  Finally,
today's Macintosh emulators, such as Executor \cite{EXECUTOR}, require
dedicating partitions or entire disks to the Mac filesystem, which wastes
space and is inconvenient.
 
 
\subsection{Goals}
 
MacFS is a portable Macintosh Hierarchical Filesystem (MHFS) that
exports a complete programming interface.  We used this interface to
build a portable user level library for accessing Macintosh volumes
and a Unix VFS \cite{VNODE}, which allows Macintosh volumes to be
mounted as if they were native Unix volumes.  The Unix VFS is run
under Mach 3.0's BSD 4.3 Unix server.  The MacFS interface is
sufficient to build installable filesystems for operating systems that
support foreign filesystems.

Macintosh files have two sections, a data fork, which is akin to the
``stream of bytes'' definition of a Unix file, and a resource fork,
which is discussed in section \ref{description}.  Our first goal was
to permit access to the data forks of Macintosh files, while keeping
the resource fork constant. The user level library meets this goal by
exporting a set of calls that mimic the Unix I/O interface, while our
VFS directly supports standard Unix I/O calls.
 
Our second goal was to avoid the ``foreign file syndrome'' that often
occurs with when moving foreign files to the Mac or when moving a Mac
file through a foreign filesystem and back to the Mac.  We met this
goal by maintaining Finder information (see section \ref{description})
on existing files.  For files created under MacFS, we supply
appropriate default Finder information and provide a mechanism for
changing it.

 

\section{The Macintosh Hierarchical File System}
\label{description}

  All Macintosh storage devices except floppy disks are partitioned
into one or more volumes.  Volumes can contain four kinds of items:
files, directories, directory threads, and file threads. Each item is
described by a {\em catalog record}, which is analogous to a Unix
inode.  Catalog records are organized in the on-disk {\em catalog
B-Tree}.  Directory contents are derived from searching the catalog
B-Tree.  The catalog B-Tree data structure has some optimizations that
make stating all the files in a directory fast.  Only a file can
occupy space outside of its catalog record.

  A Macintosh ``file'' contains two components, or {\em forks}.  The
{\em resource fork} is an indexed file containing items such as code
segments, menu items and dialog boxes.  The {\em data fork} has the
``stream of bytes'' semantics of a Unix file's contents.  When we
refer to a file, we mean the data fork portion of the ``file''.

  Each fork is comprised of one or more {\em extents} or contiguous
runs of blocks. An {\em extent descriptor} encodes an extent's
starting block and length into a 32 bit quantity. The first {\em
extent record} (three extent descriptors) of each fork is a part of
the file's catalog record. Any further extent records are kept in the
{\em extents overflow B-Tree}.

  In addition to file and B-Tree extents, a volume also contains two
boot blocks, a volume information block, and a free-space bitmap.

Several catalog record fields are reserved for the exclusive use of
{\em Finder}, a program which handles user access to the file system
and automatically maintains associations between applications and data
files.  Thus, MacFS must also mantain this Finder information.

\section{MacFS Software Structure}

  We decided early on that MacFS should be easily portable to
different operating systems and hardware. This resulted in a design
which has only two operating system dependent modules: {\em read/write
device support}, which exports procedures for reading and writing
partition relative sectors, and reading absolute sectors from a
device; and {\em mount support}, for mounting and unmounting volumes.
We have implemented two sets of modules, one using standard Unix I/O
calls, the other using Unix kernel buffer and VFS mount operations.

  Above these modules, MacFS consists of a large amount of shared
code. Small modules include {\em partition support} and {\em volume
bitmap support}. The major shared code modules are {\em B-Tree
support}, {\em catalog support}, and {\em file read/write support}.
The mount, catalog, and file read/write support modules collectively
export a base interface, the {\em m} interface, which contains
functions to do all typical file system operations.

  On top of m , we have built two interfaces. First, the {\em mac}
API presents a Unix style interface. When MacFS is compiled into a
user level library, it is the mac API which is exported. We have
written mac-based versions of the standard Unix utilities ls, cp, cat,
cd, mkdir, rmdir, rm, and mv to demonstrate how the mac functions are
used. The mac library and the example utilities should be directly
portable to any platform which has a Unix-like I/O library.

  Second, we implemented a VFS interface on top of the {\em m}
interface so that Macintosh volumes can be directly mounted on Unix
systems as if they were Unix filesystems. We feel that the {\em m}
interface is sufficiently general that mountable filesystems for other
operating systems' foreign filesystem interfaces, such as an OS/2 IFS,
can be constructed on top of it.

\subsection{The VFS Interface}

  Besides memory management and mutual exclusion (discussed below),
most of the effort in implementing the vnode interface was the details
of emulating UNIX semantics.  DosFS \cite{DOSFS} dealt with similar
issues when implementing a MS-DOS filesystem for Mach.  For example,
the MacHFS metadata does not contain protection bits, so a volume has
one set of protection bits shared by all files, set at mount time.
There is nothing exactly like a Unix inode, so deleting a currently
open file involves renaming into a special directory.  Finally, the
vnode interface does not allow us to take advantage of the MHFS \C's
capability of listing a directory and stat-ing all of its files in one
pass.


\section{MHFS Design vs MacFS Implementation}

  The \B\ structure allows for relatively fast lookups (the tree has a
maximum of 8 levels), which leads to fast {\tt stat} operations
because, as described in section \ref{description}, all the file information
is stored in the \C.  This is particularly useful on the native
Macintosh because path traversal involves stating every file at each
directory level \footnote{For example, traversing the path {\bf a:b:c}
in either a dialog box or Finder involves opening {\bf a}, stat-ing all
its items, having the user select {\bf b}, then stating all the items
in {\bf b} \ldots}.  However, inserting and deleting filesystem
metadata can be very expensive.  There are several other issues that
arose in our implementation, described in the following sections.

\subsection{Dealing with Extents }

A large part of the B-Tree code is needed to handle extremely uncommon
cases which could have been eliminated if the MHFS design were less
flexible.  Complexity is added by the fact that each \B\ has an
internal, non-fixed size and noncontiguous allocation bitmap spread
among the regular nodes. If the number of B-Tree records were bounded,
the complete bitmap could be stored at a known location.

  The feature that is responsible for most of the extra complexity of
the B-Tree code is that both trees are files, which can dynamically
grow and are not necessarily contiguous.  The first three extent
descriptors of both B-Trees are stored in the volume information
block.  Any further extent descriptors for the \C\ are stored in the
\E.  Growing the \E\ beyond three extents and storing the additional
extent descriptors in the \E\ itself would lead to circularity and
possible deadlock.  Therefore, in our implementation we refuse to grow
the extents tree beyond three extents.

  Storing the extents of the \C\ in the \E\ can lead to very expensive
metadata accesses even in the case of a simple lookup. To alleviate the
problem we currently cache the extent descriptors.


\subsection{Making MacFS Re-entrant}

  One particular challenge is implementing a re-entrant Macintosh file
system. We note that Apple's implementation is not reentrant.

  The ideal synchronization algorithm would be fine-grained -- at the
\B\ node level -- assuring that only conflicting threads are blocked.
Such an algorithm would be very difficult to implement due to the fact
that one cannot figure out at the beginning of an operation which
nodes along the path will be altered during the operation. In the case
when the root itself will be modified, all the concurrent operations
on that \B\ must be locked.

  Our conclusion is that the designers of the Macintosh file system
considered the flexibility of the file system data structures the most
important issue. We feel that with simpler and possibly fixed size and
position data structures the file system would perform better and
would also allow for better reentrant implementations.



\section{Evaluation of Implementation}

  To evaluate MacFS, the issues of functionality and performance
must be addressed.  The user-level tests involved both floppies and a
240 MB SCSI disk; the VFS-level tests involved only the SCSI disk.

\subsection{Functionality testing}

  We used the Macintosh Finder and the Macintosh Norton Utilites 2.0
to test that disks modified by our user-level and VFS-level MacFS
implementations remain Macintosh compatible.  Although we are able to
copy any type of file, all new files created have the Finder Info of
text files.

\subsection{Performance}

  To test our performance, we created five simple benchmarks.

\begin{itemize}
\item Create: Create a large number of files.

\item Delete: Delete a large number of files.

\item Write: Create, write, and close one 5000000-byte file with
request sizes of 500, 5000, and 50000 bytes.

\item Overwrite: Open, Overwrite and close one 5000000-byte file using the same
range of block sizes.

\item Read: Read one 5000000-byte file using the same range of block sizes.

\end{itemize}

  All reads and writes are sequential.  The benchmarks were run on a
240 MB Hitachi DK312C disk on a DECstation 5000/120 with a 24 MHz MIPS
R3000 CPU and on the same disk attached to a Macintosh 145B wtih a 25
MHz MC68030 CPU.  On the Macintosh, the benchmarks were run using the
native OS and filesystem, with APS SCSI driver 2.7.  On the
workstation, we ran the benchmarks using the user-level calls and with
Unix calls on a mounted filesystem.  All of these tests used a 4K
logical block size (the size produced by initializing the disk on a
Macintosh).  Finally, for a comparison with standard workstation
filesystems, we produced an FFS volume with a 4K logical block size on
the same disk and ran the same benchmarks.

  In table \ref{Performance} we give the measured performance figures.  The
numbers for the Create test is in files per second; the numbers
for the Write, Overwrite, and Read test are in bytes per second.  The
difference between the Write and Overwrite tests is that the Write
test includes overhead for growing the size of a file. 

\begin{table}
\begin{center}
\begin{tabular}[h]{|l|r|r||r|r|r||r|r|r||r|r|r|}
\hline
& \multicolumn{1}{c|}{}
& \multicolumn{1}{c||}{}
& \multicolumn{3}{c||}{500 bytes/request}
& \multicolumn{3}{c||}{5000 bytes/request}
& \multicolumn{3}{c|}{50000 bytes/request} \\
\hline Platform & Create & Delete
& Write & Over. & Read
& Write & Over. & Read
& Write & Over. & Read \\
\hline Macintosh & 7.5 & -
& 26.5  & 27.0  & 28.7
& 118.2  & 119.2  & 124.2
& 527.0 & 532.9  & 548.9 \\
\hline MacFS User & 24.6 & 23.2
 & 83.0  & 71.1 & 173.6
 & 147.8  & 182.4 & 303.8
 & 211.8  & 169.7 & 303.2 \\
\hline MacFS VFS & 42.4 & 34.0
& 21.1  & - & -
& 24.0  & - & -
& 28.9  & 27.0  & 85.2  \\
\hline Unix FFS & 111.5 & 571.7
& 265.7  & 150.0 & 254.7
& 373.7  & 139.7 & 406.2
& 410.1  & 388.8  & 398.7  \\
\hline  & files/sec & files/sec
 & KB/s  & KB/s & KB/s
 & KB/s  & KB/s & KB/s
 & KB/s & KB/s & KB/s \\
\hline
\end{tabular}
\end{center}
\caption{MacFS Performance}
\label{Performance}
\end{table}

%
% This is the data for the rodime disk
%
%%\begin{table}
%\begin{center}
%\begin{tabular}[h]{|l|r||r|r|r||r|r|r||r|r|r|}
%\hline
%& \multicolumn{1}{c||}{}
%& \multicolumn{3}{c||}{500 bytes/request}
%& \multicolumn{3}{c||}{5000 bytes/request}
%& \multicolumn{3}{c|}{50000 bytes/request} \\
%\hline Platform & Create
%& Write & Over. & Read
%& Write & Over. & Read
%& Write & Over. & Read \\
%\hline Macintosh & 7.5?
%& 27.2  & 27.6  & 150.2 
%& 172.1  & 172.9  & 443.1 
%& 546.4  & 561.8  & 842.7  \\
%\hline Unix FFS & 230?
%& 254.4  & - & -
%& 442.7  & - & -
%& 344.7  & 355.7  & 292.0  \\
%\hline Unix MacFS & ?
%& 21.1  & - & -
%& 24.0  & - & -
%& 28.9  & 27.0  & 85.2  \\
%\hline MacFS User level & 46.7?
% & 83.4  & - & -
% & 203.0  & - & -
% & 322.9  & - & - \\
%\hline  & files/sec
% & KB/s  & KB/s & KB/s
% & KB/s  & KB/s & KB/s
% & KB/s & KB/s & KB/s \\
%\hline
%\end{tabular}
%\end{center}
%\caption{MacFS Performance}
%\label{Performance}
%\end{table}

  The VFS implementation described is preliminary, and as yet we
cannot explain the performance disparity between our in-kernel and
user-level implementations.

  Also surprising is the poor relative performance of the Macintosh
compared to our user-level system in creating files. This may relate
to initializing extra data for the Finder.  The FFS is much faster for
these tasks, perhaps because of \C overhead.

  The fact that read-write speeds do not increase monotonically with
request size is surprising.  Our user-level implementation compares
surprisingly well to the Macintosh for 500 and 5000-bytes requests,
and the Macintosh performs even better the FFS for 50000-byte
requests.

\section{Conclusions}

 
 The Macintosh filesystem stores the metadata in complex data
structures making fine-grained synchronization and thus reentrancy
difficult to implement efficiently. Furthermore, Unix semantics do
not allow the user to take advantage of the Macintosh's fast combined
directory listing and stat-ing, or the convenience of the resource
fork.

 We think that these complex data structures and the fact that our
implementation has to deal with endianity and alignment are the most
important overheads when creating and deleting small files.

\begin{thebibliography}{MacMach}




\bibitem[Mac-Ette]{MACETTE} {\em MAC-ETTE}. A shareware 
   MS-DOS program for accessing MAC high-density floppies, Paul E.
   Thomson, 1992.

\bibitem[MacSee]{MACSEE} {\em MacSee 3.1}. A shareware MS Windows 
   program for reading and writing MAC HD floppies, Mac format CD ROMs,
   and Mac-formatted Syquest cartridges, REEVEsoft, 1993.

\bibitem[MacMach]{MACMACH} Several variants of Mach for the Macintosh exist,
   including the CMU project, and Mach Ten.

\bibitem[A/UX]{AUX} The Macintosh A/UX Operating System Release 3.0. 
   In {\em IEEE Computer}, 26(2):103--106, February 1993.

\bibitem[CAP]{CAP} {\em Columbia Appletalk Protocols for Unix}.  A
   freely dis\-tri\-butable Appleshare server for Unix, Columbia
University of New York.

\bibitem[Executor]{EXECUTOR} Executor. In {\em MacWeek}, September 24,
   1991, p16.  

\bibitem[IMF]{IM4} {\em Inside Macintosh: Files}, Apple Computer, 
   1992, p23--76.

\bibitem[DosFs]{DOSFS} A. Forin and G. Malan. {\em An MS-DOS
   Filesystem for Unix}. To appear in {\em Proceedings of the
   Winter 1994 USENIX Conference}, January 1994.   

\bibitem[Vnode]{VNODE} S.R. Kleiman. {\em Vnodes: An Architecture for
Multiple File System Types in Sun Unix} in {\em USENIX Association:
Summer Conference Proceedings}, Atlanta, 1986.

\end{thebibliography}

\end{document}
