15-410 Project 4
Overview
This semester Project 4 will explore the issue of
how functional a kernel needs to be. When Unix was
young, each operating system "naturally" had its own,
single and unique, executable-file format, so it
made sense for the program loader to reside in the
kernel. In addition, having exec() in the kernel
enabled the setuid invention, which allows
the author of a program to arrange for it to run with
the privileges of the author as well as those of the
invoker.
Times have changed. For one thing, many executable
files rely on dynamically-linked shared-object libraries;
it would be difficult and unwise to embed this large,
unstable, and rapidly-changing code base in the kernel.
In addition, many Unix platforms have some ability to
emulate the execution of programs built for other
platforms (for example, FreeBSD can run many Linux
programs--at times, faster than Linux does). This is
plausible in good part because the shared-object loaders
for the various platforms are user-space programs, so
supporting multiple formats doesn't explode the size of
exec() in the kernel.
Also, the utility of setuid programs is on the wane.
Because they are difficult to write correctly, they are
a common source of security exploits. In Unix the
need for them has been reduced by sendmsg() and
recvmsg(), which enable the transfer of file
descriptors from one process to another.
Moreover, the rise of the Web means that many program
executions are carried out on behalf of users who are
not part of the kernel's permission system. One Unix
descendant, Plan 9, has done away with setuid
executables entirely.
These trends call into question the idea of executable
file formats being defined and interpreted by the kernel.
What would it take to do without an exec() system call?
Could a library function do the job?
In Project 4 you will implement a stripped-down version
of the mmap() system call and a user-space library
providing the functionality of exec(). The necessary
address-space manipulations will be performed by a new
system call, stirfry(), which you will write.
Project 4 is due Wednesday, December 6th, at 23:59.
When planning your work, keep in mind that the book
report and the final homework assignment will
be due on the last day of classes.
Note that P4 grades will probably not be returned before
the final exam. Conversely, the exam will not test you
on P4 material as such, both for that reason and because
not all groups will work on P4.
Memory-mapped files
To begin with we will ask you to implement memory-mapped
files via one system call.
- int mmap(char *pathname, void *base)
Causes the RAM disk file named by pathname to
appear in the invoking task's address space starting
at the address specified by base, which must
be page-aligned. If the call is successful, the return
code is the number of bytes contained in the file,
and memory reads from the appropriate range of pages
will return "file" data from the RAM disk.
The system call returns an error code less than zero
if there is no file by that name, if the base address
is invalid, if the kernel is running low on some
critical resource necessary for the call to succeed,
if it is not possible to contiguously map
the entire file contents into the task's address space
starting at the base address, etc.
The mmap() specification does not include
specific values for the various error conditions, which
may vary from one kernel implementation to another.
Callers of mmap() should do "due diligence"
but need not go to genuinely extreme lengths to figure
out the cause of every error.
The task may use the remove_pages() system
call to remove the mmap()'d pages from its
address space.
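As a rough illustration of the caller-side arithmetic implied above, the sketch below checks that a proposed base address is page-aligned and computes how many pages a mapping would occupy, given the byte count that a successful mmap() returns. The 4096-byte PAGE_SIZE and the helper names (is_page_aligned, pages_spanned, mapping_end) are assumptions made for this example; they are not part of the specification.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages, as on 32-bit x86. */
#define PAGE_SIZE 4096u

/* Returns 1 if base is page-aligned, 0 otherwise. */
int is_page_aligned(const void *base)
{
    return ((uintptr_t)base % PAGE_SIZE) == 0;
}

/* Number of whole pages needed to hold nbytes bytes of file data. */
unsigned int pages_spanned(int nbytes)
{
    assert(nbytes >= 0);  /* negative values are error codes, not sizes */
    return ((unsigned int)nbytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* First address past the last page of a mapping that starts at base. */
uintptr_t mapping_end(const void *base, int nbytes)
{
    return (uintptr_t)base + (uintptr_t)pages_spanned(nbytes) * PAGE_SIZE;
}
```

A caller might use mapping_end() to choose a page-aligned base for the next file it maps, so that two mappings do not collide.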
The stirfry() System Call
typedef struct stirfry_region {
    void *current;
    void *destination;
    unsigned int flags; /* STIRFRY_WRITE: writable */
} stirfry_region_t;
int stirfry(unsigned int rcount, stirfry_region_t rlist[], stirfry_region_t *stack, void *eip, void *esp);
The stirfry() system call rearranges the address space
of the calling task as indicated by the parameters. Upon invocation,
the current field of each region structure must specify
the base address of a memory region obtained via new_pages()
or mmap(). If the call is successful, all memory not
mentioned by one of the region parameters of stirfry()
will be deallocated, and each region will appear in the new address
space at the address specified by the region's destination
field.
The system call does not return to the invoking task in the usual
way. Instead, %eip and %esp are set to the
specified values. The values of other registers are undefined.
The rlist argument is an array of rcount
stirfry_region structs.
Implementations may limit the total size
of the region list to a reasonable value.
The stack argument is a region much like the ones in
rlist, except that after the call it will behave as
the "automatic stack" region (it will grow downward, automatically
allocating pages as necessary).
A call to stirfry() will fail, returning an error code
less than zero, if the resulting arrangement of regions into a new
address space would be invalid for any reason.
Implementations may return an error code less than zero if more than
one thread exists in the invoking thread's task.
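To make the calling convention more concrete, here is a minimal user-space sketch of filling in a region list and running a cheap pre-flight check before invoking stirfry(). The structure layout is taken from the declaration above. The numeric value of STIRFRY_WRITE, the 4096-byte page size, the helper names make_region and check_regions, and the specific validity rules checked (page-aligned addresses, distinct destinations) are all assumptions for illustration; a real kernel may accept or reject different arrangements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE     4096u  /* assumption: 4 KB pages */
#define STIRFRY_WRITE 0x1u   /* hypothetical value; use the one from your headers */

typedef struct stirfry_region {
    void *current;
    void *destination;
    unsigned int flags; /* STIRFRY_WRITE: writable */
} stirfry_region_t;

/* Build one region descriptor; writable selects STIRFRY_WRITE. */
stirfry_region_t make_region(void *current, void *destination, int writable)
{
    stirfry_region_t r;
    r.current = current;
    r.destination = destination;
    r.flags = writable ? STIRFRY_WRITE : 0;
    return r;
}

/*
 * Pre-flight check (assumed rules, not the kernel's): every address is
 * page-aligned and no two regions name the same destination.
 * Returns 0 if the list looks plausible, -1 otherwise.
 */
int check_regions(const stirfry_region_t rlist[], unsigned int rcount)
{
    assert(rlist != NULL || rcount == 0);
    for (unsigned int i = 0; i < rcount; i++) {
        if ((uintptr_t)rlist[i].current % PAGE_SIZE != 0 ||
            (uintptr_t)rlist[i].destination % PAGE_SIZE != 0)
            return -1;
        for (unsigned int j = 0; j < i; j++) {
            if (rlist[j].destination == rlist[i].destination)
                return -1;
        }
    }
    return 0;
}
```

Checking in user space first is only a convenience: since stirfry() does not return on success, catching obvious mistakes before the call can make failures easier to diagnose.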
exec()
Based on your kernel implementations of mmap()
and stirfry(), you will re-implement exec()
as a library routine in a new user-space library called
libexec.a.
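A library exec() would presumably map the executable with mmap() and sanity-check the image before building a region list for stirfry(). The sketch below shows one such check, assuming the file is a 32-bit ELF executable: it tests the four-byte ELF magic number and the EI_CLASS byte of the identification header. The function names and the decision to check only these five bytes are assumptions for illustration; a real loader would validate much more.

```c
#include <assert.h>
#include <stddef.h>

#define EI_CLASS   4  /* index of the class byte in the ELF identification */
#define ELFCLASS32 1  /* class value for 32-bit objects */

/* Returns 1 if image begins with the ELF magic number, 0 otherwise. */
int looks_like_elf(const unsigned char *image, int nbytes)
{
    assert(image != NULL);
    if (nbytes < 4)
        return 0;
    return image[0] == 0x7f && image[1] == 'E' &&
           image[2] == 'L'  && image[3] == 'F';
}

/* Returns 1 if image is an ELF file marked as a 32-bit object. */
int looks_like_elf32(const unsigned char *image, int nbytes)
{
    if (!looks_like_elf(image, nbytes) || nbytes <= EI_CLASS)
        return 0;
    return image[EI_CLASS] == ELFCLASS32;
}
```

Here nbytes would be the byte count returned by a successful mmap(), and image the base address at which the file was mapped.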
Discussion
Zealotry
The ideal version of this project would include removing all
knowledge of the ELF executable file format from your kernel.
This would require the definition of a simpler, primeval binary
format and the construction of an init program,
containing your exec() library, in this primeval
executable format.
This degree of rigor is not required for this project.
However, we would like to see you remove the
exec() system call entry point from your
kernel. Look, Ma, no hands!
Acceptable Practices
If you feel yourself wanting to cut and paste some body
of code from 410kern, you are authorized to do so.
You may observe that when exec() is a library
routine rather than a system call, certain abusive invocations
may result in thread death as opposed to error returns.
This is acceptable for the purposes of this project.
However, if you wish to address this issue through the
definition and implementation of a tasteful
system call, you may (you will find that some system call
numbers have been reserved for your use).
Deliverables
- Your modified source tree, in p4, including your
modified kernel and library routines.
- A config.mk which has been modified to include
EXEC_OBJS so that your exec library builds correctly.
- A discussion of your design and implementation
in README.dox. Be sure to discuss the key design
decisions you made.
Note that your libexec implementation should
work on any Pebbles kernel extended with mmap()
and stirfry().
Clarifications
- You may assume that your libexec's exec() will be
invoked only by single-threaded programs, as that will
remove the need for you to solve a frustrating corner
case or two.