Carnegie Mellon
Computer Science Department

15-410 Project 4


Overview

This semester, Project 4 explores the question of how much functionality a kernel needs to provide. When Unix was young, each operating system "naturally" had its own single, unique executable-file format, so it made sense for the program loader to reside in the kernel. In addition, having exec() in the kernel enabled the setuid invention, which allows the author of a program to arrange for it to run with the privileges of the author as well as those of the invoker.

Times have changed. For one thing, many executable files rely on dynamically-linked shared-object libraries; it would be difficult and unwise to embed this large, unstable, and rapidly-changing code base in the kernel. In addition, many Unix platforms have some ability to emulate the execution of programs built for other platforms (for example, FreeBSD can run many Linux programs--at times, faster than Linux does). This is plausible in good part because the shared-object loaders for the various platforms are user-space programs, so supporting multiple formats doesn't explode the size of exec() in the kernel.

Also, the utility of setuid programs is on the wane. Because they are difficult to write correctly, they are a common source of security exploits. In Unix their need has been reduced by sendmsg() and recvmsg(), which enable the transfer of file descriptors from one process to another. Also, the rise of the Web means that many program executions are carried out on behalf of users not part of the kernel's permission system. One Unix descendant, Plan 9, has done away with setuid executables entirely.

These trends call into question the idea of executable file formats being defined and interpreted by the kernel. What would it take to do without an exec() system call? Could a library function do the job?

In Project 4 you will implement a stripped-down version of the mmap() system call and a user-space library providing the functionality of exec(). The necessary address-space manipulations will be performed by a new system call you will write named stirfry().

Project 4 is due Wednesday, December 6th, at 23:59. When planning your work, keep in mind that the book report and the final homework assignment will be due on the last day of classes.

Note that P4 grades will probably not be returned before the final exam. Conversely, the exam will not test you on P4 material as such, both for that reason and because not all groups will work on P4.

Memory-mapped files

To begin with, we will ask you to implement memory-mapped files via a single system call.
  • int mmap(char *pathname, void *base)

    Causes the RAM disk file named by pathname to appear in the invoking task's address space starting at the address specified by base, which must be page-aligned. If the call is successful, the return value is the number of bytes contained in the file, and memory reads from the appropriate range of pages will return "file" data from the RAM disk.

    The system call returns an error code less than zero if there is no file by that name, if the base address is invalid, if the kernel is running low on some critical resource necessary for the call to succeed, if it is not possible to contiguously map the entire file contents into the task's address space starting at the base address, etc. The mmap() specification does not include specific values for the various error conditions, which may vary from one kernel implementation to another. Callers of mmap() should do "due diligence" but need not go to genuinely extreme lengths to figure out the cause of every error.

    The task may use the remove_pages() system call to remove the mmap()'d pages from its address space.
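As a rough illustration of the page arithmetic a caller might do around mmap(), the sketch below checks that a candidate base is page-aligned and computes how much address space a file of a given length occupies. It assumes 4096-byte pages (the usual 32-bit x86 size; this handout does not state it), and the helper names is_page_aligned() and mapped_span() are hypothetical, not part of any provided library.

```c
#include <stdint.h>

/* Assumption: 4 KB pages, as on 32-bit x86. Not stated in the handout. */
#define PAGE_SIZE 4096u

/* Nonzero if addr is a multiple of the page size, as mmap()'s base must be. */
int is_page_aligned(uintptr_t addr)
{
    return (addr % PAGE_SIZE) == 0;
}

/*
 * Bytes of address space covered by a file of file_len bytes, rounded up
 * to whole pages. file_len would be the non-negative value mmap() returned.
 */
uintptr_t mapped_span(uintptr_t file_len)
{
    return (file_len + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```

A caller would typically choose a base, confirm is_page_aligned(base), call mmap(), and treat any negative return value as a failure without trying to decode the specific cause.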

The stirfry() System Call

typedef struct stirfry_region {
    void *current;
    void *destination;
    unsigned int flags; /* STIRFRY_WRITE: writable */
} stirfry_region_t;

int stirfry(unsigned int rcount, stirfry_region_t rlist[], stirfry_region_t *stack, void *eip, void *esp);

The stirfry() system call rearranges the address space of the calling task as indicated by the parameters. Upon invocation, the current field of each region structure must specify the base address of a memory region obtained via new_pages() or mmap(). If the call is successful, all memory not mentioned by one of the region parameters of stirfry() will be deallocated, and each region will appear in the new address space at the address specified by the region's destination field.

The system call does not return to the invoking task in the usual way. Instead, %eip and %esp are set to the specified values. The values of other registers are undefined.

The rlist argument is an array of rcount stirfry_region structs. Implementations may limit the total size of the region list to a reasonable value.
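To make the shape of the region list concrete, here is a minimal sketch of how user code might fill in rlist for a read-only text region and a writable data region. The numeric value of STIRFRY_WRITE is an assumption (the handout names the flag but gives no value), and make_region() and build_rlist() are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Assumption: the handout names STIRFRY_WRITE but does not give its value. */
#define STIRFRY_WRITE 0x1u

typedef struct stirfry_region {
    void *current;
    void *destination;
    unsigned int flags; /* STIRFRY_WRITE: writable */
} stirfry_region_t;

/* Hypothetical helper: describe one region and where it should end up. */
stirfry_region_t make_region(void *current, void *destination, int writable)
{
    stirfry_region_t r;
    r.current = current;          /* base obtained from new_pages() or mmap() */
    r.destination = destination;  /* base in the new address space */
    r.flags = writable ? STIRFRY_WRITE : 0u;
    return r;
}

/*
 * Hypothetical helper: fill rlist with a read-only text region followed by
 * a writable data region. Returns the rcount to pass to stirfry().
 */
unsigned int build_rlist(stirfry_region_t rlist[2],
                         void *text_now, void *text_dest,
                         void *data_now, void *data_dest)
{
    rlist[0] = make_region(text_now, text_dest, 0);
    rlist[1] = make_region(data_now, data_dest, 1);
    return 2;
}
```

The region structure carries no length field, so the kernel is presumably expected to recover each region's size from its own record of the earlier new_pages() or mmap() call.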

The stack argument is a region much like the ones in rlist, except that after the call it will behave as the "automatic stack" region (it will grow downward, automatically allocating pages as necessary).

A call to stirfry() will fail, returning an error code less than zero, if the resulting arrangement of regions into a new address space would be invalid for any reason. Implementations may return an error code less than zero if more than one thread exists in the invoking thread's task.
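One plausible part of the "invalid for any reason" check is making sure the destination ranges are page-aligned and do not overlap. The sketch below shows that check in isolation. It assumes 4096-byte pages and a hypothetical placed_region_t record in which the kernel has already looked up each region's length; neither the type nor layout_is_valid() comes from the handout, and a real kernel would need further checks (for example, kernel-reserved addresses and the stack region).

```c
#include <stdint.h>

/* Assumption: 4 KB pages, as on 32-bit x86. */
#define PAGE_SIZE 4096u

/* Hypothetical kernel-side record: destination base plus looked-up length. */
typedef struct placed_region {
    uintptr_t dest; /* destination field of the region struct */
    uintptr_t len;  /* length in bytes, from the kernel's own bookkeeping */
} placed_region_t;

/* Returns 1 if every region is aligned, non-empty, and disjoint from the rest. */
int layout_is_valid(const placed_region_t *r, unsigned int n)
{
    unsigned int i, j;

    for (i = 0; i < n; i++) {
        if (r[i].dest % PAGE_SIZE != 0 || r[i].len == 0)
            return 0;
        /* Reject ranges that wrap past (or end exactly at) the top of memory. */
        if (r[i].dest + r[i].len < r[i].dest)
            return 0;
        for (j = 0; j < i; j++) {
            /* Half-open ranges overlap iff each starts before the other ends. */
            if (r[i].dest < r[j].dest + r[j].len &&
                r[j].dest < r[i].dest + r[i].len)
                return 0;
        }
    }
    return 1;
}
```

The pairwise loop is quadratic in the number of regions, which seems acceptable given that implementations may cap the region list at a reasonable size.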

exec()

Based on your kernel implementations of mmap() and stirfry(), you will re-implement exec() as a library routine in a new user-space library called libexec.a.
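An early step in a library exec() is likely to be inspecting the file image that mmap() placed in memory. The sketch below assumes the executables are 32-bit little-endian ELF files (the format mentioned under Zealotry) and shows two small checks: recognizing the ELF magic number and reading the entry point, which lives at byte offset 24 of the ELF32 header. The function names are hypothetical, and a real loader would validate much more of the header.

```c
#include <stdint.h>

/* Returns 1 if the image starts with the four-byte ELF magic number. */
int has_elf_magic(const unsigned char *image, int len)
{
    return len >= 4 &&
           image[0] == 0x7f && image[1] == 'E' &&
           image[2] == 'L'  && image[3] == 'F';
}

/*
 * Reads e_entry from an ELF32 header: a little-endian 32-bit value at
 * byte offset 24. The caller must ensure the image is at least 28 bytes.
 */
uint32_t elf32_entry(const unsigned char *image)
{
    return (uint32_t)image[24]
         | (uint32_t)image[25] << 8
         | (uint32_t)image[26] << 16
         | (uint32_t)image[27] << 24;
}
```

Under these assumptions, the value returned by elf32_entry() is a natural candidate for the eip argument of stirfry(), and a failed has_elf_magic() check lets exec() return an error before any address-space changes are made.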

Discussion

Zealotry

The ideal version of this project would include removing all knowledge of the ELF executable file format from your kernel. This would require the definition of a simpler, primeval binary format and the construction of an init program, containing your exec() library, in this primeval executable format.

This degree of rigor is not required for this project. However, we would like to see you remove the exec() system call entry point from your kernel. Look, Ma, no hands!

Acceptable Practices

If you feel yourself wanting to cut and paste some body of code from 410kern, you are authorized to do so.

You may observe that when exec() is a library routine rather than a system call, certain abusive invocations may result in thread death as opposed to error returns. This is acceptable for the purposes of this project. However, if you wish to address this issue by defining and implementing a tasteful system call, you may (you will find that some system call numbers have been reserved for your use).

Deliverables

  1. Your modified source tree, in p4, including your modified kernel and library routines.
  2. A config.mk which has been modified to include EXEC_OBJS so that your exec library builds correctly.
  3. A discussion of your design and implementation in README.dox. Be sure to discuss the key design decisions you made.

Note that your libexec implementation should work on any Pebbles kernel extended with mmap() and stirfry().

Clarifications

  1. You may assume that your libexec's exec() will be invoked only by single-threaded programs, as that will remove the need for you to solve a frustrating corner case or two.

[Last modified Monday November 27, 2006]