File Systems and FUSE

File systems provide an abstraction, both to the user of a computer system and to the programmer. They present a uniform hierarchical view of data, even though this data may actually be spread across various areas of the disk, or may even be spread across multiple disks or multiple computer systems.

A file system is composed of a set of files and directories. Files store the actual data while directories are lists of files and other directories. If a directory A contains another directory B, we say that B is a subdirectory of A. A file (or directory) is identified by its path. A path is a sequence of strings separated by "/". The last string may be a file or directory, while the other strings must all be directories. An example of a path is "/usr/include/stdio.h".
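To make the structure of a path concrete, here is a minimal sketch that splits an absolute path into its component strings. The function name split_path and the two size limits are assumptions made for this illustration; they are not part of FUSE or of the lab code.

```c
#include <assert.h>
#include <string.h>

#define MAX_COMPONENTS 32   /* illustrative limit, chosen for this sketch */
#define MAX_NAME_LEN   256  /* illustrative limit, chosen for this sketch */

/*
 * Split an absolute path such as "/usr/include/stdio.h" into its
 * components ("usr", "include", "stdio.h"). Returns the number of
 * components, or -1 if the path is not absolute or exceeds a limit.
 */
int split_path(const char *path, char parts[MAX_COMPONENTS][MAX_NAME_LEN])
{
    assert(path != NULL);
    if (path[0] != '/')
        return -1;                       /* only absolute paths are handled */

    int count = 0;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')                /* skip one or more separators */
            p++;
        if (*p == '\0')
            break;                       /* trailing "/" or the root itself */
        size_t len = strcspn(p, "/");    /* length of the next component */
        if (count == MAX_COMPONENTS || len >= MAX_NAME_LEN)
            return -1;
        memcpy(parts[count], p, len);
        parts[count][len] = '\0';
        count++;
        p += len;
    }
    return count;
}
```

For "/usr/include/stdio.h" this yields three components, of which the first two must name directories. The root path "/" yields zero components.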

There are two main difficulties in writing a file system:

  1. The API is large and complex.
  2. The file system lives in kernel space, which makes it hard to debug and easy to crash the machine.

FUSE (Filesystem in Userspace) addresses the first difficulty by providing a simpler, more uniform API. For example, in FUSE all operations take a full, absolute path (a path is absolute if it starts with "/"). There is no notion of relative paths. FUSE addresses the second difficulty by running your file system code in user space rather than in kernel space.

Making a call into a FUSE file system

1. A program, such as ls, mkdir, or perflab, makes a call to a file system routine, for example open("/test/fuse/file1.txt"). This call is sent to the kernel.

2. If this file is in a FUSE volume, the kernel passes the call to the FUSE kernel module, which in turn passes it to the user-space implementation of that file system (this is the portion we will be writing in this lab).

3. The implementation of open then refers to the actual data structures that represent the file system and returns a file handle. It is open's job to take a concrete view of data (bits stored on a hard drive) and present an abstract view (a hierarchically organized file system).

4. The kernel returns the result of the open function to the program that originally made the call.

FUSE API

          file             directory
create    mknod            mkdir
remove    unlink           rmdir
read      read             readdir
write     write
misc      open, truncate

Also, there is a "getattr" function, which applies to both files and directories. getattr returns various statistics, such as file size, or the number of files in a directory.
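As a rough sketch of what a getattr-style handler does, the function below fills in a standard POSIX struct stat for an absolute path. The name toy_getattr and the hard-coded contents (a root directory holding one file, "/hello.txt") are assumptions made for this example. It follows the common FUSE convention of returning 0 on success and a negated errno value on failure, but the exact libfuse signature differs between versions and is not reproduced here.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFDIR/S_IFREG under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical fixed contents for this sketch: one file in the root. */
static const char *toy_file_path = "/hello.txt";
static const char *toy_file_data = "hello";

/*
 * Fill in *st for an absolute path. Returns 0 on success or a negated
 * errno value (here -ENOENT) if the path does not exist.
 */
int toy_getattr(const char *path, struct stat *st)
{
    assert(path != NULL && st != NULL);
    memset(st, 0, sizeof(*st));

    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;    /* directory, rwxr-xr-x */
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, toy_file_path) == 0) {
        st->st_mode = S_IFREG | 0644;    /* regular file, rw-r--r-- */
        st->st_nlink = 1;
        st->st_size = (off_t)strlen(toy_file_data);
        return 0;
    }
    return -ENOENT;                      /* no such file or directory */
}
```

The same function serves both kinds of entry: the caller tells them apart by inspecting st_mode, for example with the S_ISDIR and S_ISREG macros.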

FUSE Lab Data Structure

For the FUSE lab, we will be implementing a file system that stores its contents in memory. We will use a dictionary data structure to represent the contents of the file system. Dictionaries map keys to entries and have the property that keys are unique. That is, there can never be two entries associated with the same key.
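The uniqueness property can be illustrated with a very small dictionary built on a linked list. This is only a sketch: the names struct dict, dict_get, and dict_put are hypothetical and need not match the dictionary provided for the lab, and a real implementation would likely use a hash table and also provide a way to remove and free entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One key/value pair in a singly linked list (hypothetical layout). */
struct dict_node {
    char *key;
    void *value;
    struct dict_node *next;
};

struct dict {
    struct dict_node *head;              /* NULL when the dictionary is empty */
};

/* Return the node holding key, or NULL if the key is absent. */
static struct dict_node *dict_find(const struct dict *d, const char *key)
{
    for (struct dict_node *n = d->head; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}

/* Return the value stored under key, or NULL if the key is absent. */
void *dict_get(const struct dict *d, const char *key)
{
    assert(d != NULL && key != NULL);
    struct dict_node *n = dict_find(d, key);
    return n != NULL ? n->value : NULL;
}

/*
 * Insert key -> value. Returns 0 on success, or -1 if the key is already
 * present (keys must stay unique) or if memory allocation fails.
 */
int dict_put(struct dict *d, const char *key, void *value)
{
    assert(d != NULL && key != NULL);
    if (dict_find(d, key) != NULL)
        return -1;                       /* refuse a duplicate key */

    struct dict_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->key = malloc(strlen(key) + 1);    /* keep a private copy of the key */
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->key, key);
    n->value = value;
    n->next = d->head;                   /* push onto the front of the list */
    d->head = n;
    return 0;
}
```

A second dict_put with a key that is already present fails and leaves the original entry in place, which is the behavior a directory needs: two files in the same directory cannot share a name.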

We will use dictionaries to store the contents of directories. Suppose we have a dictionary foo_dict storing the contents of directory "/foo". The keys will be strings, representing the names of the files and subdirectories of "/foo". The associated entries will depend on whether the entry represents a file or a directory. In the case of files, the dictionary entry will be the contents of the file. In the case of directories, the entry will be another dictionary storing the contents of that directory. For example, the file "/foo/login.txt" will be represented by having foo_dict map key "login.txt" to an entry representing the contents of the file login.txt. The directory "/foo/baz" would be represented by having foo_dict map key "baz" to dictionary baz_dict, which contains the contents of "/foo/baz".

Since entries can store two different types of data (one type for files and one for directories), we will use a tagged union as the entry type. Note how closely this use of dictionaries mirrors our view of a file system. Dictionaries can contain entries which are themselves dictionaries just as directories can contain subdirectories.
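The sketch below shows one way the tagged union and the nested dictionaries could fit together, along with a lookup that walks an absolute path one component at a time. All type and function names here (struct entry, ENTRY_FILE, ENTRY_DIR, dict_put, lookup) are assumptions made for illustration and may differ from the definitions used in the lab. The dictionary is a bare linked list to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dict;                             /* defined below */

enum entry_kind { ENTRY_FILE, ENTRY_DIR };

/* Tagged union: `kind` says which member of `u` is valid. */
struct entry {
    enum entry_kind kind;
    union {
        struct { const char *data; size_t size; } file;  /* file contents */
        struct dict *dir;                                /* directory contents */
    } u;
};

/* Minimal dictionary: a linked list of (name, entry) pairs. */
struct dict_node {
    const char *name;                    /* not copied; caller keeps it alive */
    struct entry *entry;
    struct dict_node *next;
};

struct dict {
    struct dict_node *head;
};

/* Find the entry whose name equals the first len bytes of name. */
static struct entry *dict_get_n(const struct dict *d, const char *name, size_t len)
{
    for (const struct dict_node *n = d->head; n != NULL; n = n->next)
        if (strlen(n->name) == len && memcmp(n->name, name, len) == 0)
            return n->entry;
    return NULL;
}

/* Add name -> entry. Returns -1 if name already exists or malloc fails. */
int dict_put(struct dict *d, const char *name, struct entry *e)
{
    assert(d != NULL && name != NULL && e != NULL);
    if (dict_get_n(d, name, strlen(name)) != NULL)
        return -1;                       /* names within a directory are unique */
    struct dict_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->name = name;
    n->entry = e;
    n->next = d->head;
    d->head = n;
    return 0;
}

/* Resolve an absolute path, starting from the root directory entry. */
struct entry *lookup(struct entry *root, const char *path)
{
    assert(root != NULL && path != NULL);
    if (path[0] != '/')
        return NULL;                     /* only absolute paths are accepted */

    struct entry *cur = root;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')                /* skip separators */
            p++;
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/");    /* length of the next component */
        if (cur->kind != ENTRY_DIR)
            return NULL;                 /* cannot descend into a file */
        cur = dict_get_n(cur->u.dir, p, len);
        if (cur == NULL)
            return NULL;                 /* component not found */
        p += len;
    }
    return cur;
}
```

With a root dictionary that maps "foo" to a directory entry, and foo_dict mapping "login.txt" to a file entry and "baz" to another directory entry, lookup(root, "/foo/login.txt") returns the file entry and lookup(root, "/foo/baz") returns the entry wrapping baz_dict. The check on kind before each descent is where the tag earns its keep: it prevents the code from reading a file's contents as if they were a dictionary.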

This represents the filesystem