Carnegie Mellon
Computer Science Department

15-410 Project 0: Traceback



Project Overview

In this project you will be writing a "library" which contains a single function called traceback(). traceback() prints out a stack trace of the program it is called from. The stack trace will include all of the function calls made to reach the current location in the program. You will be provided with information about all of the functions available in the program and their arguments.

One example of a possible use for such a function would be to call it from a segmentation fault handler to help debug the program.

Traceback Details

The prototype for traceback, as defined in traceback.h, is

void traceback(FILE *);

The argument to traceback is the file stream to which the stack trace should be printed. For most programs, this will probably be stderr, but taking it as an argument allows for greater flexibility in the use of traceback.

Also defined in traceback.h is a table of all the functions in the program. Each entry in the function table has the type functsym_t, which contains the name of the function and the address at which the function begins along with a list of arguments. Each argument is defined as an argsym_t containing the argument type and name of the argument. The type is stored as an integer and can be matched with the definitions in traceback.h. For the sake of simplicity, we are requiring you to recognize only char, int, float, double, char*, and char**. All subsequent references to 'string' in this document refer to C-style character strings.

If the function list contains fewer than MAX_NUM_FUNCTIONS entries, it will be terminated by a function with a zero-length name. Similarly, if the argument list for a function contains fewer than MAX_NUM_ARGS arguments, it will be terminated by an argument with a zero-length name. The functions in the list are sorted by address.
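As a rough illustration of how a sorted, terminator-ended table can be searched, here is a minimal sketch. The struct and constants below are hypothetical stand-ins (the real functsym_t and MAX_NUM_FUNCTIONS are defined in traceback.h, and their field names and sizes may differ); the 1 megabyte bound mirrors the maximum function size stated under Other Important Notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real definitions live in traceback.h. */
#define SKETCH_MAX_FUNCTIONS 8
#define SKETCH_NAME_LEN 60
#define SKETCH_MAX_FUNCTION_BYTES (1024u * 1024u)

typedef struct {
    char name[SKETCH_NAME_LEN];  /* a zero-length name ends the list */
    void *addr;                  /* address at which the function begins */
} sketch_functsym_t;

/*
 * Return the index of the table entry that could contain code_addr,
 * or -1 if no entry is plausible. Relies on the table being sorted
 * by address.
 */
static int sketch_find_function(const sketch_functsym_t *table,
                                const void *code_addr)
{
    uintptr_t target = (uintptr_t)code_addr;
    int best = -1;
    int i;

    assert(table != NULL);

    for (i = 0; i < SKETCH_MAX_FUNCTIONS; i++) {
        if (table[i].name[0] == '\0')
            break;               /* terminator: table is not full */
        if ((uintptr_t)table[i].addr > target)
            break;               /* sorted: no later entry can match */
        best = i;
    }

    /* Reject a match that lies farther away than any function can be long. */
    if (best >= 0 &&
        target - (uintptr_t)table[best].addr > SKETCH_MAX_FUNCTION_BYTES)
        return -1;

    return best;
}
```

The linear scan keeps the sketch short; since the table is sorted, a binary search would also work, at the cost of first locating the terminator.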

For each function you should print the name of the function and all of the arguments. When printing each argument you should output the name and the actual argument whenever the type is known. This means you must print the string in the case of a char* and some (see below) of the strings in the case of a char**. Be warned that traceback() must not cause a program calling it to terminate due to a segmentation fault. If the type of an argument is not known, you need not print the value.

Printing of char** arrays is heuristic only. There is no valid within-language way of telling how many char * values are in the array (that's why you need int argc in main(), for example). The array of char * values will not necessarily be null-terminated with a (char *)0 value. Therefore, if you're handed a char** array with only one or two elements, you may wind up trying to print some spurious strings. This is OK; however, do not allow this to crash your program.

For those of you wondering how you can have a global table containing a program's function names and argument types, this is not normally possible within the C language framework.

Each test program linked against the traceback library will obtain the code for your traceback() function and a blank function table. After the program is built, a perl script will decode the object file and modify it so that the table slots are filled in with the correct information (see the lecture notes for a diagram). This is not really the correct way to obtain this information; one should obtain it at runtime by having a long and complicated conversation with a large confusing library which understands how to parse executable files. The correct approach, however, is significantly more work than intended for this project and does not really add to the learning experience as it is just an exercise in jumping through hoops.

Formatting

traceback() should output the functions in order from the last (most recent) function called to the first function called. It should contain the names and values of all of the arguments (and void if there are no arguments). The output of traceback() should match the following sample partial output:

Function foo(int i=5, float f=35.000000), in
Function foobar(char c='k', char *str="test", char *unprintable=0xffff0000), in
Function bar(void), in

This indicates that some function (not shown) called bar() with no arguments. bar() then called foobar with a character 'k', a string "test", and a string called unprintable, located at 0xffff0000 in memory, which traceback() was unable to print. foobar() in turn called foo() with the arguments 5 and 35, and foo() invoked traceback().

If you determine that a function does not conform to the calling conventions at all (for example, the value for its stack pointer could not possibly be a stack frame), traceback should terminate. If you wish, you may emit a single line, beginning with FATAL: to describe the situation you have run into. Note that this does not cover the case of wild pointers or other 'illegal' values in the program's arguments: perfectly legal programs can pass wild pointers around without violating calling conventions.

If a function (say at address 0x20002ab0) has a well-formed stack frame but no entry in the functions table, you should print a line of the form:

Function 0x20002ab0(...), in

In this case, you should keep tracing the stack frames after this function if possible.

All arguments are printed as "type name=value", but the following special rules should also be applied:

  • For this assignment, 'printable' characters are those for which the standard library isprint() function returns true (see ctype.h). A string that contains an unprintable character is considered an unprintable string. Finally, a comment that contains an unprintable word is considered immature but at times understandable, but you don't need to know about that for this project.
  • Chars should be printed between single quotes if printable. If not, the chars should be printed (still within single quotes) as escaped octal characters: for example, if an argument c contains the ASCII 'ACK' character, the argument should be printed as follows: char c='\6' . This applies to unprintable characters only - see below for what to do with unprintable strings.
  • Integers and floating-point numbers should be printed in base 10. The default behavior of printf() is acceptable for floats and doubles, both in terms of number of digits printed and in terms of what is printed for unusual floating point values (NaN, plus or minus infinity).
  • Strings should be printed between double quotes.
  • String arrays are displayed in the format {"string1","string2","string3"}. The quotation marks are to be added around each string by traceback(); they are not part of the string. If a string in the array is not printable, the address of that string should be printed in its place. If a string array contains 4 or more strings, only the first 3 should be printed, followed by a "...". For example, {"string1", "string2", "string3", "string4"} should be printed as {"string1", "string2", "string3", ...}. Unprintable strings count towards these 3 too (i.e. you only have to look at the first three strings no matter what). As stated above, this is best-effort behavior: C arrays do not have size information in them nor are they by default null-terminated (except in the special case of character strings).
  • If a string has more than 25 characters, only the first 25 should be printed, followed by a "..." (e.g., "this string has more than 25 characters" should be printed as "this string has more than...").
  • Anything that cannot have its value printed for any reason should have its address printed in hex, except if it is a valid char value that happens to contain a single unprintable character. If part of a char * string is printable and any part is not, then the entire string is considered to be unprintable. A string array with one or more unprintable strings within it is still considered printable itself as long as the string array is itself a valid array of strings.
  • Anything of an unknown type should be displayed as if it had some type "UNKNOWN" and as though it were an unprintable constant, that is, with the value in hex.
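To make the char and string rules above concrete, here is a minimal sketch of two formatting helpers. The function names and the buffer-based interface are hypothetical, not part of the support code, and the string helper assumes the caller has already established that every byte of the string (including the terminating NUL) is readable, which a real traceback() cannot take for granted.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_STRING_CHARS 25

/* Format a char value: 'k' if printable, else an octal escape such as '\6'. */
static void sketch_format_char(char *out, size_t out_size, char c)
{
    unsigned char uc = (unsigned char)c;

    assert(out != NULL && out_size > 0);

    if (isprint(uc))
        snprintf(out, out_size, "'%c'", uc);
    else
        snprintf(out, out_size, "'\\%o'", (unsigned int)uc);
}

/*
 * Format a string value between double quotes, keeping only the first
 * 25 characters (followed by "...") when the string is longer.
 * Returns 1 on success, or 0 if the string holds an unprintable
 * character, in which case the caller would print its address in hex.
 * ASSUMPTION: the whole string has already been verified as readable.
 */
static int sketch_format_string(char *out, size_t out_size, const char *s)
{
    size_t len;
    size_t i;

    assert(out != NULL && out_size > 0 && s != NULL);

    len = strlen(s);
    for (i = 0; i < len; i++) {
        if (!isprint((unsigned char)s[i]))
            return 0;
    }

    if (len > SKETCH_MAX_STRING_CHARS)
        snprintf(out, out_size, "\"%.*s...\"", SKETCH_MAX_STRING_CHARS, s);
    else
        snprintf(out, out_size, "\"%s\"", s);
    return 1;
}
```

Formatting into a caller-supplied buffer with snprintf() keeps the helpers easy to test; writing directly to the FILE * handed to traceback() with fprintf() would work equally well.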

Goals

Despite the fact that this is the smallest project of the five that will be assigned in this class, it is important to pay attention to the key concepts in Project 0. The ideas taught here will provide the foundation for the next four projects. In particular, we would like you to be comfortable with:

  1. Writing clean code in C. Many people like the C programming language because it gives the programmer a lot of freedom (pointers, casting, etc). It is very easy to hang yourself with this rope. Developing (and sticking with) a consistent system of variable definitions, commenting, and separation of functionality is essential.

    People have asked about using C++ in this class. Writing your kernels in C++ is probably much harder than you think, since you would need to begin by implementing your own thread-safe (or, at least, interrupt-aware) versions of new and delete. In addition, you would probably find yourself implementing other pieces of C++ runtime code; this could turn into quite a hobby. As a result, you should do this program in C as a way of re-familiarizing yourself with the language you'll be using for the remainder of the course.

  2. Writing pseudocode. For systems programming, it is very important to think out crucial data structures and algorithms ahead of time since they become important primitives for the rest of the system.

  3. Commenting. Though you will not be working with a partner for the first two projects, you will be on all subsequent projects. It is important to include comments so someone else looking at or maintaining your code can quickly understand what your code is doing without having to look at its internals. For this assignment, which is a refresher, it should not be hard to comment it appropriately and you may do so in the standard fashion. However, since the remainder of the assignments will use it, we will describe the doxygen system, similar to javadoc for C.

  4. Using common development tools (gcc, ld).
  5. Communicating with the TAs using various channels of communication (zephyr, bulletin board, staff-410 at the CS domain, Q&A archive, course web page, office hours).

Since code quality (layout, modularity, defensive programming) and readability will be so important in this class (and after you leave CMU), they will have a substantial impact on your project grades. In the case of Project 0, expect that they will determine 10-20% of your project grade. The 410 doxygen documentation points to two acceptable coding style guides.

Getting Started

To get started with the project, download the support-code tarball and extract the files contained within. You should probably study all of the files, including the Makefile but excluding the update script, before beginning to ask questions. The answers to many popular questions are contained in the code.

You will probably find yourself wishing for some information which is not portably available within the C language framework, so you will need to write a scrap or two of x86 assembly language.

We strongly suggest you do this by writing a C-callable function in a .S file (note that the 'S' is upper-case) rather than using the asm() in-line assembly language facility. Either one will work, but in practice it is very easy to write code with asm() which works with one version of your program or a particular version of your compiler but which breaks mysteriously later. In addition, littering your C code with asm() calls makes it extremely painful to port the code from one hardware platform to another.

The support code includes a sample .S file (add_one.S), and you can find asm() covered in the "Assembler Instructions with C Expression Operands" section of the gcc documentation. If, despite our advice, you decide to use asm(), keep in mind that for correctness you must use the "complicated" version which correctly communicates your intent to the compiler.

In terms of getting make to build .S files, note that they are isomorphic to .c files in the sense that make contains default rules for building both to .o.

Important Dates

  • Wednesday, January 18th: Project 0 assigned.
  • Wednesday, January 25th: Project 0 is due at 11:59pm.

Testing

It is important that your traceback() function be able to deal with any sort of program in which someone might wish to use it. You must ensure that it will work properly regardless of where it is called within any program, and that traceback() does not damage the correct operation of the program after it returns. Note that traceback() is obviously intended as a debugging aid; it is therefore not safe to assume that the program calling it has a perfectly formed stack. While traceback() may not always be able to print out a well-formed stack with 100% valid arguments, it should never crash nor loop forever.
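One way to think about the "never loop forever" requirement is sketched below on synthetic data. The frame layout (saved frame pointer followed by return address) is an assumption based on the conventional x86 frame-pointer calling convention, and the type and function names are hypothetical. The sketch also dereferences each frame directly, whereas a real traceback() would first need to confirm that the memory is readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout of a conventional frame; names are hypothetical. */
typedef struct sketch_frame {
    struct sketch_frame *caller;  /* saved frame pointer of the caller */
    void *return_addr;            /* where execution resumes in the caller */
} sketch_frame_t;

/*
 * Follow the chain of frames starting at start, storing each return
 * address in out. Stops at a NULL link, at a link that does not move
 * toward higher addresses (the stack is assumed to grow downward, so
 * callers live at higher addresses), or when out is full. Returns the
 * number of addresses stored.
 */
static size_t sketch_collect_return_addrs(const sketch_frame_t *start,
                                          void **out, size_t out_cap)
{
    const sketch_frame_t *f = start;
    size_t depth = 0;

    assert(out != NULL);

    while (f != NULL && depth < out_cap) {
        const sketch_frame_t *next = f->caller;

        out[depth++] = f->return_addr;
        if (next != NULL && (uintptr_t)next <= (uintptr_t)f)
            break;  /* cycle or backwards link: stop rather than loop */
        f = next;
    }
    return depth;
}
```

Both the monotonicity check and the output capacity bound the walk, so a corrupted chain that points back into itself ends the trace instead of hanging the program.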

Also, as you should recall from previous classes, certain traditional C functions, such as sprintf(), are unsafe. Please take a moment to reacquaint yourself with the details of the issue, its implications, and what you could use instead.
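As one illustration of a bounded alternative, the sketch below uses snprintf(), which never writes more than the given buffer size and reports how many characters the full output would have needed. The helper name and the idea of formatting the line header separately are hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "Function <name>(" into out without overrunning it.
 * Returns 1 if the whole text fit, 0 if it was truncated or if
 * snprintf() reported an error. Either way, out is NUL-terminated
 * as long as out_size is greater than zero.
 */
static int sketch_format_header(char *out, size_t out_size, const char *name)
{
    int needed;

    assert(out != NULL && out_size > 0 && name != NULL);

    needed = snprintf(out, out_size, "Function %s(", name);
    return needed >= 0 && (size_t)needed < out_size;
}
```

With sprintf() the same call would write past the end of a too-small buffer; here the caller can detect truncation from the return value and decide what to do about it.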

Take some time to develop the harshest cases that you can because while grading we will submit your code to the most diabolical tests we can imagine. Of course, if your code is well written, it should have no problems passing these tests.

We will provide a simple output verification script which will ensure that your output format matches our script's expectations; see the 'verify' target in your Makefile.

Documenting

Commenting is an important part of writing code. If you wish, you may get a jump on future assignments by using doxygen; see our doxygen documentation to see how to include comments in your code that can be read by doxygen.

When we grade your projects, we will begin with your documentation. Lack of documentation will be reflected in your grade.

The provided traceback_internal.h file contains example doxygen comments with the sort of information we are expecting to see. Although we put the doxygen comments for our functions in the .h file, you should typically put yours in the .c file, with each function's comment block adjacent to the code. In addition, we have provided a rule in the Makefile to take care of generating the documents for you. This rule is make html_doc and if you have set this up to work we will run it as part of grading.

Other Important Notes

  • Since we will be running and testing your code on Andrew Linux machines, your code will be compiled, linked, and run under gcc 3.2.1. If you are working on standard cluster machines, then you don't have to worry about anything. If you are working on a non-cluster personal machine, you can check the version of gcc you are using by running gcc --version on the command line. If your version is not 3.2.1, you must make sure that your code compiles, links, and runs fine under 3.2.1.

  • Please do not change any of the provided files except for traceback.c and traceback.mk. Modifying traceback.mk should allow you to make any changes necessary for compiling the traceback library and any test programs. We will run your code using our versions of the files, so any changes you make to other files will be overwritten.

  • As compiling many different tests can take a noticeable amount of time, note that the Makefile allows you to build a subset of your tests. Typing make foobar will compile the foobar test (after updating the traceback library if necessary).

  • While you probably do not need to use any 410-built programs for this assignment, you will probably want to set things up so that /afs/cs.cmu.edu/academic/class/15410-s06/bin is on your $PATH. For your convenience, you may wish to make an easy-to-type symbolic link to the root of the course AFS volume, e.g.,

    % ln -s /afs/cs.cmu.edu/academic/class/15410-s06 $HOME/410
    Note that in order to access 15-410 files located in the CS AFS cell you will need to acquire cross-realm tickets as specified on the 15-410 AFS page.

  • Your AFS volumes have not been created yet. We know about this issue and the relevant parties are working on them...luckily, this should not impede your work as you begin this project.

  • For purposes of this assignment, you can assume that the largest function (in terms of number of bytes worth of instructions) is 1 megabyte. We have provided a #define in traceback_internal.h that encodes this constant.

  • While you may find it necessary to write asm code to complete this assignment, your code does not need to understand x86 opcodes. It is possible (and preferred) to write a traceback() that does not disassemble function bodies, and very, very hard to write one that does. Step back and rethink your design if you believe that you would need to process x86 opcodes directly.

  • You may wish to consider what would happen if you ran your traceback() in a multi-threaded program. It is very hard, if not impossible, to solve all the issues this raises, so don't worry too much about it. It may be easier to consider the restricted case where traceback() will only ever be called by one thread at a time (that is, where traceback() will be guarded by a mutual exclusion facility of the type you will write later in the semester).

Hand-in Instructions

You will be required to hand in all your .c, .S, .h, and any other files necessary to run your code. Minimally this will include the traceback function and any support functions that it requires. When we run your code, it should display the behavior described in the Traceback Details section above.

See http://www.cs.cmu.edu/~410/p0/handinP0.html for details.

evil_test Hints

You may be wondering how your program can determine whether a given address is valid (i.e., backed by memory) at run-time. Like many other questions which will arise as this course unfolds, there are multiple approaches, with different tradeoffs. In general you should strive to identify two to three approaches, choose among them based on weighing a variety of criteria, and briefly document the thinking behind your choice.

But since Project 0 is a warm-up, it seems appropriate to give a few hints.

  • A segmentation fault need not necessarily kill your program. Recall from 15-213 what causes a segmentation fault, how a typical Unix kernel reacts, and what control you have over that sequence of events.
  • If you carefully study the documentation for various system calls, such as msync() and write(), you may find a way to (ab)use one of them to your benefit. Both of these calls have some undocumented behaviors so you should carefully test these calls to be sure that things work the way you expect them to work.
  • The documentation for the proc pseudo-file-system may be of use to you.

Whichever way you choose, we recommend that you test the behavior of your solution thoroughly - think about strange cases and try them by hand if necessary. If your solution has any limitations, document them.

For this assignment it is more important that whichever way you address this issue is done well (completely and cleanly) than that you choose the alternative which is our favorite.


[Last modified Tuesday January 17, 2006]