Carnegie Mellon
SCS logo
Computer Science Department
home
syllabus
staff
schedule
lecture
projects
homeworks
 
 

15-410 Project 0: Traceback


Table of Contents

Project Overview

In this project you will be writing a "library" which contains a single function called traceback(). traceback() prints out a stack trace of the program it is called from. The stack trace will include all of the function calls made to reach the current location in the program. You will be provided with information about the functions available in the program and their arguments.

One example of a possible use for such a function would be to call it from a segmentation fault handler to help debug the program.

Traceback Details

The prototype for trackback, as defined in traceback.h, is

void traceback(FILE *);

The argument to traceback is the file stream to which the stack trace should be printed. For most programs, this will probably be stderr, but taking it as an argument allows for greater flexibility in the use of traceback.

Defined in traceback_globals.c is a table containing an entry for each function in the program (to the extent feasible: in some corner cases, some executable code may lack a function-table entry). Each entry in the function table has the type functsym_t (defined in traceback_internal.h), which contains the name of the function and the address at which the function begins along with a list of arguments. Each argument is defined as an argsym_t containing the argument type and name of the argument. The type is stored as an integer and can be matched with the definitions in traceback_internal.h. For the sake of simplicity, we are requiring you to recognize only char, int, float, double, char*, char**, and void*. All subsequent references to 'string' in this document refer to C-style character strings.

If the function list contains fewer than FUNCTS_MAX_NUM entries it will be terminated by a function with a zero-length name. Similarly, if the argument list for a function contains fewer than ARGS_MAX_NUM arguments it will be terminated by an argument with zero length name. The functions in the list are sorted by address.

For each function invocation you should print the name of the function and all of the arguments. When printing each argument you should output the name and the actual argument whenever the type is known. This means you must print the string in the case of a char* and some (see below) of the strings in the case of a char**. Be warned that traceback() must not cause a program calling it to terminate due to a segmentation fault. If the type of an argument is not known, we do not expect you to print the value (see below).

For those of you wondering how you can have a global table containing a program's function names and argument types, this is not normally possible within the the C language framework. Each test program linked against the traceback library will obtain the code for your traceback() function and a blank function table. After the program is built, a Python script will decode the object file and modify it so that the table slots are filled in with the correct information (see the lecture notes for a diagram). This is not really the correct way to obtain this information; one should obtain it at runtime by having a long and complicated conversation with a large confusing library which understands how to parse executable files. The correct approach, however, is significantly more work than intended for this project and does not really add to the learning experience as it is just an exercise in jumping through hoops (hoops which not only are on fire but also move around and/or shrink unpredictably).

Formatting

Note that this description is organized in terms of various concerns or goals, not as a code outline. If you find yourself needing to consider the parts of this description several times and to carefully consider how varous goals and concerns interact, that does not mean you are doing something wrong.

traceback() should output the function invocations in order from the last (most recent) function called to the first function called. The output should contain the names and values of all of the arguments (and "void" if there are no arguments). The output of traceback() should match the following sample partial output:

Function foo(int i=5, float f=35.000000), in
Function foobar(char c='k', char *str="test", char *unprintable=0xffff0000), in
Function bar(void), in

This indicates that some function (not shown) called bar() with no arguments. bar() then called foobar() with a character 'k', a string "test", and a string called unprintable, located at 0xffff0000 in memory, which traceback() was unable to print. foobar() in turn called foo() with the arguments 5 and 35, and then foo() invoked traceback().

If you determine that a function invocation does not conform to the calling conventions at all (for example, the value for its stack pointer could not possibly be a stack frame), traceback should terminate. If you wish, you may emit a single line, beginning with FATAL: if your implementation runs into a situation which qualifies as a fatal error. Note that this does not cover the case of wild pointers or other 'illegal' values in a function's arguments: perfectly legal programs can pass wild pointers around without violating calling conventions.

If a stack frame is generally well-formed but the return address (say 0x20002ab0) indicates a call site within a function which is not in the functions table, you should print a line of the form:

Function 0x20002ab0(...), in

In this case, you should keep tracing the stack frames after this function if possible.

All arguments are printed as "type name=value", but the following special rules should also be applied:

  • For this assignment, 'printable' characters are those for which the standard library isprint() function returns true (see ctype.h). A string that contains an unprintable character is considered an unprintable string. (Code or comments containing an unprintable word is considered immature but at times understandable, but that doesn't affect the output of your traceback() implementation.)

  • Chars should be printed between single quotes if printable. If not, the chars should be printed (still within single quotes) as escaped octal characters: for example, if an argument c contains the ASCII 'ACK' character, the argument should be printed as follows: char c='\6' . This applies to unprintable characters only - see below for what to do with unprintable strings.

  • Integers and floating-point numbers should be printed in base 10. The default behavior of printf() is acceptable for floats and doubles, both in terms of number of digits printed and in terms of what is printed for unusual floating point values (NaN, plus or minus infinity).

  • void*'s should be printed in hexadecimal, except that instead of "0x" the hexadecimal value should be prefixed with "0v".

  • Strings should be printed between double quotes.

  • String arrays are generally displayed in this format: {"string1", "string2", "string3"}. The quotation marks are to be added around each string by traceback(); they are not (generally) part of the string.

    Printing of char** arrays is heuristic only. Because there is no way to determine the number of elements in an array given a pointer to the base of the array (or, even worse, a pointer into the middle of the array), there is no way to in every situation print all the "valid strings" in a string array without printing "non-valid strings" too--unless some special private contract is in place, such as the int argc parameter of main(). Thus we will provide you with some rules to use for this assignment; of necessity they are somewhat arbitrary.

    If a string in the array is not printable, the address of that string should be printed in its place. However, because it matches many common usages, you should treat a null (zero) pointer in a string array not as an "unprintable string" but as an indication that the array has come to an end before the null pointer. In the other direction, if it appears that a string array contains four or more strings, only the first three should be printed, followed by "...". For example, {"string1", "string2", "string3", "string4"} should be printed as {"string1", "string2", "string3", ...}. Unprintable strings count toward the number of elements in the array (i.e., you need look at only the first four elements no matter what).

  • If a printable string has more than 25 characters, only the first 25 should be printed, followed by a "..." (e.g., "this string has more than 25 characters" should be printed as "this string has more than...").

  • Anything that cannot have its value printed for any reason should have its address printed in hex, except if it is a valid char value that happens to contain a single unprintable character. If part of a char* string is printable and any part is not, then the entire string is considered to be unprintable. A string array with one or more unprintable strings within it is still considered printable itself as long as the string array is itself a valid array.

  • Anything of an unknown type should be displayed as if it had some type "UNKNOWN" and as though it were an unprintable constant, that is, with the address in hex.

  • Please note that your program will be graded by an automatic test system, so these formatting requirements are not suggestions! If you mix up parentheses versus braces, leave commas out, etc., you will probably lose half of the points for the automated test section of the assignment. Please see the README file distributed with the assignment for further information on formatting. If the specification is genuinely ambiguous about a minor formatting detail, the automated tester shouldn't care about that detail either.

Goals

Despite the fact that this is the smallest project of the five that will be assigned in this class, it is important to pay attention to the key concepts in Project 0. The ideas taught here will provide the foundation for the next four projects. In particular, we would like you to be comfortable with:

  1. Writing clean code in C. Many people like the C programming language because it gives the programmer a lot of freedom (pointers, casting, etc). It is very easy to hang yourself with this rope. Developing (and sticking with) a consistent system of variable definitions, commenting, and separation of functionality is essential.

    People have asked about using C++ in this class. Writing your kernels in C++ is probably much harder than you think, since you would need to begin by implementing your own thread-safe (or, at least, interrupt-aware) versions of new and delete. In addition, you would probably find yourself implementing other pieces of C++ runtime code; this could turn into quite a hobby. As a result, you should do this program in C as a way of re-familiarizing yourself with the language you'll be using for the remainder of the course.

  2. Writing pseudocode (or otherwise designing before coding). For systems programming, it is very important to think out crucial data structures and algorithms ahead of time since they become important primitives for the rest of the system.

  3. Commenting. Though you will not be working with a partner for the first two projects, you will be on all subsequent projects. It is important to include comments so someone else looking at or maintaining your code can quickly understand what your code is doing without having to look at its internals. For this assignment, which is a refresher, it should not be hard to comment it appropriately and you may do so in the standard fashion. However, since the remainder of the assignments will use it, we will describe below the doxygen system, similar to javadoc, but targeted at C.

  4. Using common development tools (gcc, ld).

  5. Communicating with the course staff using various channels of communication (bulletin board, staff-410 at the CS domain, Q&A archive, course web page, office hours).

Since code quality (layout, modularity, defensive programming) and readability will be so important in this class (and after you leave CMU), they will have a substantial impact on your project grades. In the case of Project 0, expect that they will determine 10-20% of your project grade. The 410 doxygen documentation points to two acceptable coding style guides.

To receive the benefits offered by this assignment (including design practice and sanity-checking whether you might be prepared to take this class this semester) you need to complete it. Because issues (and solutions) interact in a non-linear way, doing "50% of the features" probably delivers 30% of the learning, and will be graded accordingly. Said another way, in keeping with the grading philosophy outlined in the syllabus, grades of C or better are reserved for solutions that address all parts of the problem. Passing only one or two of the tests we provide you with definitely doesn't qualify.

Getting Started

To get started with the project, download the support-code tarball and extract the files contained within. You should probably study all of the files, including the Makefile but excluding the update script, before beginning to ask questions. The answers to many popular questions are contained in the code.

You will probably find yourself wishing for some information which is not portably available within the C language framework, so you will need to write a scrap or two of x86 assembly language. We strongly suggest you do this by writing a C-callable function in a .S file (note that the 'S' is upper-case) rather than using the asm() in-line assembly language facility. Either one can be made to work, but in practice it is very easy to write code with asm() which works with one version of your program or a particular version of your compiler but which breaks mysteriously later. In addition, littering your C code with asm() calls makes it extremely painful to port the code from one hardware platform to another.

The support code includes a sample .S file (add_one.S), and you can find asm() covered in the "Assembler Instructions with C Expression Operands" section of the gcc documentation. If, despite our advice, you decide to use asm(), keep in mind that for correctness you must use the "complicated" version which correctly communicates your intent to the compiler. All uses of asm() will be regarded with suspicion by the course staff. You have been warned!

In terms of getting make to build .S files, note that .S files are isomorphic to .c files in the sense that make contains default rules for building each variety of source file to .o.

Important Dates

  • Wednesday, August 28th: Project 0 assigned.
  • Wednesday, September 4th: Project 0 is due at 11:59pm.

Testing

It is strongly suggested that you carefully read each test program we provide to you and understand what it does and what your traceback() implementation should print when it is run. In order to be considered for a passing grade on this project, a submission must at least successfully complete all tests we give you! Furthermore, if your submission doesn't pass make verify (see below) the grade will be capped at 25%.

It is important that your traceback() function be able to deal with any sort of program in which someone might wish to use it. You must ensure that it will work properly regardless of where it is called within any program, and ensure that traceback() does not damage the correct operation of the program after it returns. Note that traceback() is obviously intended as a debugging aid - therefore assuming that the project that is calling it has a perfectly formed stack is not a good plan. While traceback() may not always be able to print out a well-formed stack with 100% valid arguments, it should never crash nor loop forever.

Also, you should recall from previous classes, certain traditional C functions, such as sprintf(), are generally unsafe; related functions, such as fprintf() and printf(), can be unsafe if invoked in certain ways or contexts. Please take a moment to reacquaint yourself with the details of these issues and their implications, and consider what you could use instead.

Take some time to develop the harshest cases that you can because while grading we will submit your code to the most diabolical tests we can imagine. Of course, if your code is well written, it should have no problems passing these tests.

We will provide a simple output verification script which will ensure that your output format matches our script's expectations; see the 'verify' target in your Makefile.

Documenting

Commenting is an important part of writing code. If you wish, you may get a jump on future assignments by using doxygen; see our doxygen documentation to see how to include comments in your code that can be read by doxygen.

When we grade your projects, we will begin with your documentation. Lack of documentation will be reflected in your grade.

The provided traceback_internal.h file contains example doxygen comments with the sort of information we are expecting to see. Although we put the doxygen comments for our functions in the .h file, you should typically put yours in the .c file, with each function's comment block adjacent to the code. In addition, we have provided a rule in the Makefile to take care of generating the documents for you. This rule is make html_doc and if you have set this up to work we can run it as part of grading.

If you are used to using ASSERT(), REQUIRES(), and ENSURES() according to the practices of 15-122, you may "#include <contracts.h>" to continue doing so in 410.

Other Important Notes

  • Since we will be running and testing your code on Andrew Linux machines, your code will be compiled, linked, and run under gcc 4.4.5 (available as 410-gcc on Andrew Linux). If you are working on standard cluster machines or on LINUX.ANDREW.CMU.EDU, then you don't have to worry about anything. If you are working on a non-cluster personal machine, you can check the version of gcc you are using by running gcc --version on the command line. If your version is not the same as the one which will be used for grading, you must make sure that your code compiles, links, and runs fine in the grading environment.

  • Please do not change any of the provided files except for traceback.c and config.mk. Modifying config.mk should allow you to make any changes necessary for compiling the traceback library and any test programs. We will run your code using our versions of the other files, so any changes you make to them will be overwritten.

  • As compiling many different tests can take a noticable amount of time, we just wanted to mention that the Makefile allows you to build a subset of your tests. Typing make tests/foobar will compile the foobar test (after updating the traceback library if necessary).

  • You will need to ensure that the directory containing 15-410 executable programs, /afs/cs.cmu.edu/academic/class/15410-f13/bin, is on your $PATH. In addition, it will be convenient for you to make an easy-to-type symbolic link to the root of the course AFS volume, e.g.,
    % ln -s /afs/cs.cmu.edu/academic/class/15410-f13 $HOME/410
    Note that in order to access some 15-410 files located in the CS AFS cell you might need to acquire cross-realm tickets as specified on the 15-410 AFS page. Do not be surprised if this doesn't work for a day or two, see below.

  • Your AFS volumes have not been configured yet. Luckily Project 0 is small and simple enough that you can work on it in your personal AFS space for the present. Please do not ask course staff about AFS volumes until after a bulletin board post announcing their creation has been made. If you have trouble with your 410 AFS volume at any time, please read the 410 AFS-volume documentation before sending us mail.

  • For purposes of this assignment, you can assume that the largest function (in terms of number of bytes worth of instructions) is 1 megabyte. We have provided a #define in traceback_internal.h that encodes this constant.

  • While you may find it necessary to write assembly code to complete this assignment, your code does not need to understand x86 opcodes. It is very very hard to write code which correctly disassembles and understands function body code. Meanwhile, it is entirely possible, and much preferred, for your code not to do this. So if you believe your traceback() code needs to decode x86 opcodes, please step back and re-think your design.

  • You may wish to consider what would happen if you ran your traceback() in a multi-threaded program. It is very hard, if not impossible, to solve all the issues this raises, so don't worry too much about it. It may be easier to consider the restricted case where traceback() will only ever be called by one thread at a time (that is, where traceback() will be guarded by a mutual exclusion facility of the type you will write later in the semester.)

Hand-in Instructions

You will be required to hand in all your .c, .S, .h, and any other files necessary to run your code. Minimally this will include the traceback function and any support functions that it requires. When we run your code, it should display the behavior described in the Traceback Details section above.

See http://www.cs.cmu.edu/~410-f13/p0/handinP0.html for details. It is very much to your advantage to do a test hand-in hours or days before the final deadline. The hand-in page includes late-day directions.

evil_test Hints

You may be wondering how your program can determine whether a given address is valid (i.e., backed by memory) at run-time. Like many other questions which will arise as this course unfolds, there are multiple approaches, with different tradeoffs. In general you should strive to identify two to three approaches, choose among them based on weighing a variety of criteria, and briefly document the thinking behind your choice.

But since Project 0 is a warm-up, it seems appropriate to give a few hints.

  • A segmentation fault need not necessarily kill your program. Recall from 15-213 what causes a segmentation fault, how a typical Unix kernel reacts, and what control you have over that sequence of events. If you follow this path, we suggest you not try to code "from memory", but instead carefully study the relevant C library documentation accessible via the man command. Also, you will want to be very careful about which functions are interrupted and/versus which functions are invoked after a function has been interrupted. Some information about interrupted functions is available in the Open Group documentation of sigaction().
  • If you carefully study the documentation for various VM-related system calls, e.g., mmap(), mprotect(), and msync(), you may find a way to (ab)use one or more of them to your benefit. These system calls have some undocumented behaviors, so you should carefully test your usage to make things work the way you expect them to.
  • You may also find it possible to use write() and related system calls as well (again, not everything works the way you might expect, so do some testing).
  • The documentation for the proc pseudo-file-system may be of use to you.

Whichever way you choose, we recommend that you test the behavior of your solution thoroughly - think about strange cases and try them by hand if necessary. If your solution has any limitations, document them.

For this assignment it is more important that whichever way you address this issue is done well (completely and cleanly) than that you choose the alternative which is our favorite.


[Last modified Wednesday August 28, 2013]