Carnegie Mellon
Computer Science Department

15-410 Project 0: Traceback


Project Overview

In this project you will write a "library" containing a single function, traceback(). traceback() prints a stack trace of the program it is called from. The stack trace will include all of the function calls made to reach the current location in the program. You will be provided with information about all of the functions available in the program and their arguments.

One example of a possible use for such a function would be to call it from a segmentation fault handler to help debug the program.

Traceback Details

The prototype for traceback(), as defined in traceback.h, is

void traceback(FILE *);

The argument to traceback is the file stream to which the stack trace should be printed. For most programs, this will probably be stderr, but taking it as an argument allows for greater flexibility in the use of traceback.

Also defined in traceback.h is a table of all the functions in the program. Each entry in the function table has the type functsym_t, which contains the name of the function and the address at which the function begins along with a list of arguments. Each argument is defined as an argsym_t containing the argument type and name of the argument. The type is stored as an integer and can be matched with the definitions in traceback.h. For the sake of simplicity, we are requiring you to recognize only char, int, float, double, char*, and char**.

If the function list contains fewer than MAX_NUM_FUNCTIONS functions, it will be terminated by a function with a zero-length name. Similarly, if the argument list for a function contains fewer than MAX_NUM_ARGS arguments, it will be terminated by an argument with a zero-length name. The functions in the list are sorted by address.
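
Because the table is sorted by address and terminated by a zero-length name, finding the function that contains a given code address can be a simple scan. The following is only a sketch: demo_functsym_t, MAX_NAME_LEN, find_function(), and the small value of MAX_NUM_FUNCTIONS are hypothetical stand-ins, and the real definitions in traceback.h may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the real definitions live in traceback.h
 * and may differ in field names, order, and sizes. */
#define MAX_NUM_FUNCTIONS 4
#define MAX_NAME_LEN 32

typedef struct {
    const void *addr;         /* address at which the function begins */
    char name[MAX_NAME_LEN];  /* a zero-length name ends a short table */
} demo_functsym_t;

/* Return the entry with the greatest start address that is not above
 * `pc`, or NULL if `pc` lies below every function in the table. */
const demo_functsym_t *find_function(const demo_functsym_t *table,
                                     const void *pc)
{
    const demo_functsym_t *best = NULL;

    for (size_t i = 0; i < MAX_NUM_FUNCTIONS; i++) {
        if (table[i].name[0] == '\0')
            break;                      /* terminator entry reached */
        if ((uintptr_t)table[i].addr > (uintptr_t)pc)
            break;                      /* sorted: later entries are higher */
        best = &table[i];
    }
    return best;
}
```

Note that this sketch attributes any address above the last function's start to that last function; whether that is acceptable is a design decision for your implementation.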

For each function you should print the name of the function and all of the arguments. When printing each argument you should output the name and the actual argument whenever the type is known. This means you must print the string in the case of a char* and all of the strings in the case of a char**. Be warned that traceback() must not cause a program calling it to terminate due to a segmentation fault. If the type of an argument is not known, you need not print the value.

For those of you wondering how you can have a global table containing a program's function names and argument types, this is not normally possible within the C language framework.

Each test program linked against the traceback library will obtain the code for your traceback() function and a blank function table. After the program is built, a perl script will decode the object file and modify it so that the table slots are filled in with the correct information (see the lecture notes for a diagram). This is not really the correct way to obtain this information; one should obtain it at runtime by having a long and complicated conversation with a large confusing library which understands how to parse executable files. The correct approach, however, is significantly more work than intended for this project and does not really add to the learning experience as it is just an exercise in jumping through hoops.

Formatting

traceback() should output the functions in order from the last (most recent) function called to the first function called. It should contain the names and values of all of the arguments (and void if there are no arguments). The output of traceback() should match the following sample partial output:

Function foo(int i=5, float f=35.000000), in
Function foobar(char c='k', char *str="test", char *unprintable=0xffff0000), in
Function bar(void), in

This indicates that some function (not shown) called bar() with no arguments. bar() then called foobar() with a character 'k', a string "test", and a string called unprintable, located at 0xffff0000 in memory, which traceback() was unable to print. foobar() in turn called foo() with the arguments 5 and 35, and foo() invoked traceback().
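
As a rough illustration of the "type name=value" pieces in the sample, the sketch below formats a single scalar argument into a buffer. The DEMO_* type codes and format_scalar() are hypothetical; the real type constants are defined in traceback.h, and the sketch assumes the caller has already located the argument's value in memory.

```c
#include <stdio.h>

/* Hypothetical type codes; the real values are defined in traceback.h. */
enum { DEMO_CHAR, DEMO_INT, DEMO_FLOAT, DEMO_DOUBLE };

/* Format one scalar argument as "type name=value" into `out`.
 * `value` must point at readable storage of the matching type.
 * Returns what snprintf returns, or -1 for a type it does not handle. */
int format_scalar(char *out, size_t n, int type,
                  const char *name, const void *value)
{
    switch (type) {
    case DEMO_CHAR:
        return snprintf(out, n, "char %s='%c'", name, *(const char *)value);
    case DEMO_INT:
        return snprintf(out, n, "int %s=%d", name, *(const int *)value);
    case DEMO_FLOAT:
        return snprintf(out, n, "float %s=%f", name, *(const float *)value);
    case DEMO_DOUBLE:
        return snprintf(out, n, "double %s=%f", name, *(const double *)value);
    default:
        return -1;
    }
}
```

With an int holding 5 and the name "i", this produces the text int i=5, matching the first argument of foo() in the sample.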

All arguments are printed as "type name=value", but the following special rules should also be applied:

  • chars should be printed between single quotes
  • integers and floating-point numbers should be printed in base 10
  • strings should be printed between double quotes
  • string arrays are displayed in the format {"string1","string2","string3"}. The quotation marks are to be added around each string by the output function; they are not part of the string. If a string in the array is not printable, the address of that string should be printed in its place.
  • If there are more than 3 strings in an array, only the first 3 should be printed, followed by a "..." (e.g., {"string1", "string2", "string3", ...}).
  • If a string has more than 25 characters, only the first 25 should be printed, followed by a "..." (e.g., "this string has more than 25 characters" should be printed as "this string has more than"...).
  • anything that cannot have its value printed for any reason should have its address printed in hex. If part of a string is printable and part is not, then the entire string is considered to be unprintable.
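
The string rules above might be sketched as follows. This is one possible reading, not a reference implementation: format_string() and MAX_SHOWN_CHARS are hypothetical names, the sketch assumes the string has already been confirmed to be readable and NUL-terminated, and it interprets "part is not printable" as applying to the whole string rather than only the first 25 characters.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_SHOWN_CHARS 25  /* limit taken from the rules above */

/* Write `s` into `out` between double quotes, truncated to the first
 * 25 characters and followed by "..." when it is longer.
 * Returns 0 on success, or -1 (leaving `out` untouched) if any
 * character of `s` is not printable, so that the caller can print
 * the address instead. */
int format_string(char *out, size_t n, const char *s)
{
    size_t len = strlen(s);

    for (size_t i = 0; i < len; i++) {
        if (!isprint((unsigned char)s[i]))
            return -1;
    }

    if (len > MAX_SHOWN_CHARS)
        snprintf(out, n, "\"%.*s\"...", MAX_SHOWN_CHARS, s);
    else
        snprintf(out, n, "\"%s\"", s);
    return 0;
}
```

Note that strlen() itself will fault on an invalid pointer, so a real traceback() has to establish that the memory is readable before calling anything like this.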

Goals

Despite the fact that this is the smallest project of the five that will be assigned in this class, it is important to pay attention to the key concepts in Project 0. The ideas taught here will provide the foundation for the next four projects. In particular, we would like you to be comfortable with:

  1. Writing clean code in C. Many people like the C programming language because it gives the programmer a lot of freedom (pointers, casting, etc). It is very easy to hang yourself with this rope. Developing (and sticking with) a consistent system of variable definitions, commenting, and separation of functionality is essential.

    People have asked about using C++ in this class. This is probably much harder than you think, since you would need to begin by implementing your own thread-safe (or, at least, interrupt-aware) versions of new and delete. In addition, you would probably find yourself implementing other pieces of C++ runtime code; this could turn into quite a hobby. As a result, you should do this program in C as a way of re-familiarizing yourself with the language you'll be using for the remainder of the course.

  2. Writing pseudocode. For systems programming, it is very important to think out crucial data structures and algorithms ahead of time since they become important primitives for the rest of the system.

  3. Commenting. Though you will not be working with a partner for the first two projects, you will be on all subsequent projects. It is important to include comments so someone else looking at or maintaining your code can quickly understand what your code is doing without having to look at its internals. For this assignment, which is a refresher, it should not be hard to comment your code appropriately, and you may do so in the standard fashion. However, because the remaining assignments will use it, we will also describe doxygen, a documentation system for C similar to javadoc.

  4. Using common development tools (gcc, ld).
  5. Communicating with the TAs using various channels of communication (zephyr, bulletin board, staff-410 at the CS domain, Q&A archive, course web page, office hours).

Getting Started

You will probably find yourself wishing for some information which is not portably available within the C language framework, so you will need to write a scrap or two of x86 assembly language.

We suggest you do this by writing a C-callable function in a .S file (note that the 'S' is upper-case) rather than using the asm() in-line assembly language facility. Either one will work, but in practice it is very easy to write code with asm() which works with one version of your program or a particular version of your compiler but which breaks mysteriously later. In addition, littering your C code with asm() calls makes it extremely painful to port the code from one hardware platform to another.

The support code includes a sample .S file, and you can find asm() covered in the "Assembler Instructions with C Expression Operands" section of the gcc documentation. If, despite our advice, you decide to use asm(), keep in mind that for correctness you must use the "complicated" version which correctly communicates your intent to the compiler.
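
For comparison only, GCC also offers compiler builtins that expose some of the same information without hand-written assembly. The sketch below is not the .S approach recommended above and is not part of the support code; own_frame_address() and own_return_address() are hypothetical names. When a function is compiled with a frame pointer on x86, the value returned by __builtin_frame_address(0) generally corresponds to that function's frame pointer register.

```c
#include <stddef.h>

/* noinline keeps each helper a real call with its own stack frame. */

/* Return the frame address of this helper function itself. */
__attribute__((noinline))
void *own_frame_address(void)
{
    return __builtin_frame_address(0);
}

/* Return the address in the caller to which this helper will return. */
__attribute__((noinline))
void *own_return_address(void)
{
    return __builtin_return_address(0);
}
```

Whether builtins like these are appropriate for your solution is a separate question; the sketch only shows that the "not portably available" information is reachable from a C-callable function.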

Important Dates

  • Wednesday, September 1st: Project 0 assigned.
  • Wednesday, September 8th: Project 0 is due at 11:59pm.

Testing

It is important that your traceback() function be able to deal with any sort of program in which someone might wish to use it. You must ensure that it will work properly regardless of where it is called within any program, and that traceback() does not damage the correct operation of the program after it returns. Take some time to develop the evilest, harshest cases that you can, because while grading we will submit your code to the most diabolical tests we can imagine. Of course, if your code is well written, it should have no problems passing these tests.

Also, for your convenience, we will provide an output verification script which will ensure that your output format matches our script's expectations. We will make a post to the bboard when it is available.

Documenting

Commenting is an important part of writing code. If you wish, you may get a jump on future assignments by using doxygen; see our doxygen documentation to see how to include comments in your code that can be read by doxygen.

When we grade your projects, we will begin with your documentation. Lack of documentation will be reflected in your grade.

The provided traceback_internal.h file contains example doxygen comments with the sort of information we are expecting to see. Although we put the doxygen comments for our functions in the .h file, you should typically put yours in the .c file, with each function's comment block adjacent to the code. In addition, we have provided a rule in the Makefile to take care of generating the documents for you. This rule is make html_doc, and if you have set it up to work, we will run it as part of grading.

Other Important Notes

  • Since we will be running and testing your code on Andrew Linux machines, your code will be compiled, linked, and run under gcc 3.2.1. If you are working on standard cluster machines, then you don't have to worry about anything. If you are working on a non-cluster personal machine, you can check the version of gcc you are using by running gcc --version on the command line. If your version is not 3.2.1, you must make sure that your code compiles, links, and runs fine under 3.2.1.

  • Please do not change any of the provided files except for traceback.c and traceback.mk. Modifying traceback.mk should allow you to make any changes necessary for compiling the traceback library and any test programs. We will run your code using our versions of the files, so any changes you make to other files will be overwritten.

  • As compiling many different tests can take a noticeable amount of time, we just wanted to mention that the Makefile allows you to build a subset of your tests. Typing make foobar will compile the foobar test (after updating the traceback library if necessary).

  • While you probably do not need to use any 410-built programs for this assignment, you will probably want to set things up so that /afs/cs.cmu.edu/academic/class/15410-f04/bin is on your $PATH. For your convenience, you may wish to make an easy-to-type symbolic link to the root of the course AFS volume, e.g.,

    % ln -s /afs/cs.cmu.edu/academic/class/15410-f04 $HOME/410

  • Your AFS volumes have not been created yet, and the class bulletin boards still contain posts from the previous semester. We know about these issues and the relevant parties are working on them...luckily, they should not impede your work as you begin this project.

Hand-in Instructions

You will be required to hand in all your .c, .S, .h, and any other files necessary to run your code. Minimally this will include the traceback function and any support functions that it requires. When we run your code, it should display the behavior described in the Traceback Details section above.

See http://www.cs.cmu.edu/~410-f04/p0/handinP0.html for details.

evil_test Hints

You may be wondering how your program can determine whether a given address is valid (i.e., backed by memory) at run-time. Like many other questions which will arise as this course unfolds, there are multiple approaches, with different tradeoffs. In general you should strive to identify two to three approaches, choose among them based on weighing a variety of criteria, and briefly document the thinking behind your choice.

But since Project 0 is a warm-up, it seems appropriate to give a few hints.

  • A segmentation fault need not necessarily kill your program. Recall from 15-213 what causes a segmentation fault, how a typical Unix kernel reacts, and what control you have over that sequence of events.
  • If you carefully study the documentation for various system calls, such as msync() and write(), you may find a way to (ab)use one of them to your benefit.
  • The documentation for the proc pseudo-file-system may be of use to you.

For this assignment it is more important that whichever way you address this issue is done well (completely and cleanly) than that you choose the alternative which is our favorite.
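
To make the msync() hint concrete, here is a minimal sketch of one such (ab)use, assuming Linux behavior in which msync() fails with ENOMEM when the indicated range is not mapped. page_is_mapped() is a hypothetical helper name, and this is only one of the alternatives hinted at above, not necessarily the preferred one.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if msync() accepts the page containing `addr`, which
 * suggests the page is mapped; return 0 if the call fails for any
 * reason, which this sketch conservatively treats as "not usable". */
int page_is_mapped(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    /* msync() requires a page-aligned start address. */
    uintptr_t base = (uintptr_t)addr & ~((uintptr_t)page - 1);

    if (msync((void *)base, (size_t)page, MS_ASYNC) == 0)
        return 1;
    return 0;   /* typically errno == ENOMEM for an unmapped page */
}
```

A mapped page is not necessarily a readable one, and a string may cross a page boundary, so a check like this would need to be applied with care as each new page is reached.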


[Last modified Wednesday September 01, 2004]