22 Jan 1996

Outline

Administrivia

I'd like to learn something about all of you. I got a roster of who was enrolled in this class at the beginning of the semester, but I trust you more than I trust the registrar. I thought about handing out 3x5 cards, but to save trees, and to make sure everybody knows how to email me, I'm going to ask you to email me the information instead. You won't get a grade on this, but it ensures that we, the course staff, know about your enrollment. Sometime today, please email me the following: your name, email address, major, year, and any additional comments you might have.

A common input mistake

Let's look again at the I/O example I used Wednesday:

cin >> i >> j;
cin.get(ch);
There's an obscure rule of C and C++ input that I violated here. Can you see it?

Well, the problem is that I mixed formatted input instructions (i.e., >> or scanf) with unformatted input instructions (such as get, fgetc, or gets). I want to emphasize that you shouldn't do this; in fact, I'm going to take Avrim's lead and even write a rule on the board:

Rule: Don't mix formatted input instructions with unformatted input instructions.

Why? Well, the two kinds of input treat whitespace differently. Formatted input skips leading whitespace, reads a value, and stops at the first character that can't be part of that value, leaving it (typically the newline at the end of the line) in the input stream. Unformatted input skips nothing, so the next unformatted instruction receives that leftover character. As a result, your input instructions may easily receive unexpected input.

For example, I tried this example on my machine on the following input:

32
48
c
It's fairly clear that the intended behavior here is for the value of ch to become 'c'. Unfortunately, ch actually becomes the newline character. The way to fix this, of course, is to change all the input instructions to be either formatted or unformatted.
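To see the difference without typing the input by hand, here is a small sketch that feeds the same three lines through a std::istringstream (an assumption for testing; with real input you would pass std::cin). The function names read_mixed and read_formatted are hypothetical.

```cpp
#include <istream>
#include <sstream>

// Mixed style (the bug): >> stops right after "48", leaving the '\n'
// in the stream, and get() then reads that newline.
char read_mixed(std::istream& in) {
    int i, j;
    char ch;
    in >> i >> j;
    in.get(ch);
    return ch;
}

// Formatted only: >> into a char skips leading whitespace,
// so ch receives 'c'.
char read_formatted(std::istream& in) {
    int i, j;
    char ch;
    in >> i >> j >> ch;
    return ch;
}
```

Called on an istringstream holding "32\n48\nc\n", read_mixed returns '\n' while read_formatted returns 'c'.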

Working around the problem isn't difficult, but if you don't know the rule you'll end up violating it some day and will have a horrible time trying to identify the source of the error.

Conditional compilation

For further information, see section 4.11.3 of Kernighan and Ritchie, The C Programming Language.

As you debug, you'll insert code to print out the program's current status. Of course, you don't want the grader (or, at work, your manager) to see these diagnostic messages. The way around this is to use conditional compilation.

Conditional compilation uses the preprocessor to include pieces of code based on the value of preprocessor symbols. For example, consider the following piece of code:

#define DEBUG 1
 // intermediate code here
#if DEBUG
  cerr << "diagnostic" << endl;
#endif
With this code, the diagnostic message is printed only if DEBUG is nonzero, so you can turn off all the diagnostic messages at once by changing the definition of DEBUG to:
#define DEBUG 0
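A slightly more flexible variant, sketched below, supplies a default only when DEBUG is not already defined, so you can switch diagnostics on from the command line (for example, g++ -DDEBUG=1 prog.cpp) without editing the source. The diagnostic function is a hypothetical helper, not a library routine.

```cpp
#include <iostream>
#include <string>

// Default to "off" unless DEBUG was already defined, e.g. by compiling
// with g++ -DDEBUG=1 prog.cpp (a plain -DDEBUG also defines it as 1).
#ifndef DEBUG
#define DEBUG 0
#endif

// Hypothetical helper: print msg to cerr only when DEBUG is nonzero.
// Returns true if the message was printed.
bool diagnostic(const std::string& msg) {
#if DEBUG
    std::cerr << "diagnostic: " << msg << std::endl;
    return true;
#else
    (void)msg;  // unused when diagnostics are compiled out
    return false;
#endif
}
```

When DEBUG is 0, the output statement is removed by the preprocessor, so the final program contains no trace of it.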

Conditional compilation has other uses, too. In particular, it is frequently used to deal with differences in platforms to make code more portable:

#if UNIX
  system("rm file");
#elif MS_DOS
  system("del file");
#endif
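In the example above, UNIX and MS_DOS are symbols you would have to define yourself. Compilers also predefine platform macros that you can test instead; the sketch below assumes the common ones (_WIN32 on Windows compilers, __unix__ on most Unix compilers, __APPLE__ on macOS, which does not define __unix__). The delete_command function is hypothetical.

```cpp
#include <string>

// Hypothetical helper: build the shell command that deletes `file`
// on the platform this code is being compiled for.
std::string delete_command(const std::string& file) {
#if defined(_WIN32)
    return "del " + file;
#elif defined(__unix__) || defined(__APPLE__)
    return "rm " + file;
#else
    return "";  // unknown platform: add a case for it
#endif
}
```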

Comments

See also section 2.1 of Pohl.

Comments in C are delimited by /* and */:

/* comment */
These are useful, but C++ adds a second kind of comment, which starts with // and runs to the end of the line:
// comment
The new style has the advantage that its extent is obvious: it always ends at the end of the line. That makes it much less likely that you will accidentally comment out a large block of code.

For example, in C you might accidentally do the following:

  /* Step through string
  for(i = 0; s[i]; i++)
    /* capitalize letter */
    s[i] = toupper(s[i]);
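Because the first comment is missing its closing */, it extends all the way to the */ that ends the second comment, so the for line is commented out. The code still compiles (gcc only warns about a /* inside a comment when -Wcomment is on, as it is under -Wall), but only the statement s[i] = toupper(s[i]); survives: it capitalizes a single character, or crashes your program if i happens to be outside the string's bounds.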

For this reason I recommend that you use the new commenting style and reserve the C comments for commenting out blocks of code during the debugging process.
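Here is the capitalizing loop from the example written with // comments, as a minimal sketch (capitalize is a hypothetical name). Because each comment stops at the end of its line, a missing terminator cannot swallow the loop.

```cpp
#include <cctype>

// Capitalize every letter of the NUL-terminated string s in place.
void capitalize(char *s) {
    // step through string until the terminating NUL
    for (int i = 0; s[i]; i++)
        // capitalize letter; the cast avoids passing a negative char to toupper
        s[i] = std::toupper(static_cast<unsigned char>(s[i]));
}
```

Since the body contains no /* */ comments, you can still disable the whole loop temporarily by wrapping it in /* ... */, as recommended above.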

Finishing the lecture

I'd like to finish off the recursive parenthesis-matching program that I started in the previous lecture and glossed over at the end. Let's write down the code:

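#include <iostream>
using namespace std;

// Helpers we assume have already been written elsewhere (see the comments
// in find_match). START marks "beginning of input": is_match(START, c) is
// true only when c is the end-of-input value that read_to_next returns.
extern const char START;
char read_to_next(void);
int is_left(char c);
int is_match(char left, char right);
int find_match(char left);  // declared here so that main can call it

int
main(void) {
  // see if START matches end-of-file and everything in-between matches too
  if (find_match(START)) cout << "match" << endl;
  else cout << "error" << endl;
  return 0; // by convention, main returns 0 if all went well
}

// determine if there is a match

int
find_match(char left) {
  char c;

  // let's assume we have already written read_to_next, is_left, and is_match
  //
  // read_to_next skips over non-parentheses, returns next left or right paren
  // is_left returns 1 (true) if c is a left paren, e.g., '(' '[' '{' '<'
  // is_match returns 1 (true) if its arguments are a pair of matching
  //    parentheses, e.g., '[' and ']'

  // the for loop is here so that we can find multiple matches
  // at the same level of nesting, e.g., ([]{()}[])

  for(c = read_to_next(); is_left(c); c = read_to_next())
    if(find_match(c) == 0) return 0;

  // c is a right paren (or the end-of-input marker)
  return is_match(left, c);
}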

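The for loop is a bit confusing, so it's worth pointing out the following equivalence. For most purposes (the main exception is a continue statement, which in a for loop still evaluates expr3 before retesting expr2),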

    for(expr1; expr2; expr3) statement
is equivalent to:
    expr1;
    while(expr2) {
        statement
        expr3;
    }
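As a concrete instance of this rewrite, here is a small sketch with the same loop written both ways; sum_for and sum_while are hypothetical names, and each line is labeled with the part of the pattern it plays.

```cpp
// Sum 0 + 1 + ... + (n-1) with a for loop.
int sum_for(int n) {
    int total = 0;
    int i;
    for (i = 0; i < n; i++)   // expr1; expr2; expr3
        total += i;           // statement
    return total;
}

// The same loop after applying the for-to-while rewrite.
int sum_while(int n) {
    int total = 0;
    int i;
    i = 0;                    // expr1
    while (i < n) {           // expr2
        total += i;           // statement
        i++;                  // expr3
    }
    return total;
}
```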

Try working out how this code would work on a couple of examples:

  (an easy example)
  {{[a (somewhat [more] difficult) example]}}
  {a {failed] example}
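To check your traces, here is one self-contained way to fill in the helpers, sketched under assumptions of my own: the helpers use int rather than char so that EOF can serve as the end-of-input marker, START is an arbitrary value (-2) distinct from EOF and every bracket, and input comes from a std::istream parameter rather than from cin. The balanced wrapper is a hypothetical addition; the recursion in find_match is the same as above.

```cpp
#include <cstdio>    // EOF
#include <istream>
#include <sstream>
#include <string>

// Hypothetical marker: neither EOF nor a bracket; it pairs only with EOF.
const int START = -2;

// Skip non-bracket characters; return the next bracket, or EOF at end of input.
int read_to_next(std::istream& in) {
    int c;
    while ((c = in.get()) != EOF) {
        switch (c) {
        case '(': case ')': case '[': case ']':
        case '{': case '}': case '<': case '>':
            return c;
        }
    }
    return EOF;
}

// 1 (true) if c is a left bracket.
int is_left(int c) {
    return c == '(' || c == '[' || c == '{' || c == '<';
}

// 1 (true) if left and right are a matching pair; START pairs with EOF.
int is_match(int left, int right) {
    return (left == START && right == EOF) ||
           (left == '(' && right == ')') ||
           (left == '[' && right == ']') ||
           (left == '{' && right == '}') ||
           (left == '<' && right == '>');
}

// Same recursion as the lecture's find_match, but reading from `in`.
int find_match(std::istream& in, int left) {
    int c;
    for (c = read_to_next(in); is_left(c); c = read_to_next(in))
        if (find_match(in, c) == 0) return 0;
    // c is a right bracket or EOF
    return is_match(left, c);
}

// Hypothetical wrapper: true if every bracket in text is properly matched.
bool balanced(const std::string& text) {
    std::istringstream in(text);
    return find_match(in, START) != 0;
}
```

For the third example, the innermost call find_match('{') reads ']', and is_match('{', ']') fails, so every level returns 0 and balanced reports false.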

Pointers and Strings

Here's a nice example justifying the usefulness of pointers: write a routine to swap two integers. You might try the following straightforward code:

    void
    swap(int i, int j) {
        int temp = i;
        i = j;
        j = temp;
    }
This doesn't work, though. C uses call-by-value: each parameter is a local copy of the argument passed in, so changes to a parameter within a function don't affect the caller's argument. This swap function does exchange the values of its local copies i and j, but the caller's variables are left unchanged.

We can fix this by creating a function with the prototype

    void swap(int *p, int *q);
How would you do this?

One valid solution would be:

    void
    swap(int *p, int *q) {
        int temp;

        temp = *p;
        *p = *q;
        *q = temp;
    }
Then we could call it by writing:
    swap(&i, &j);
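The contrast between the two versions is easy to check directly. In this sketch the names swap_by_value and swap_by_pointer are hypothetical, chosen so they don't collide with the standard library's std::swap.

```cpp
// Call-by-value: i and j are local copies, so the caller's variables are untouched.
void swap_by_value(int i, int j) {
    int temp = i;
    i = j;
    j = temp;
}

// Call-by-pointer: p and q hold addresses, so *p and *q are the caller's variables.
void swap_by_pointer(int *p, int *q) {
    int temp = *p;
    *p = *q;
    *q = temp;
}
```

Starting from i == 1 and j == 2, swap_by_value(i, j) leaves them as they were, while swap_by_pointer(&i, &j) leaves i == 2 and j == 1.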

A string is a NUL-terminated array of characters. Note that I'm saying NUL, not NULL: NUL is the character with value zero, also written '\0', while NULL is the null pointer.

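There's a standard routine called strcpy, declared in <string.h> (<cstring> in C++). Its prototype is essentially:

    char *strcpy(char *to, const char *from);
The returned pointer is simply to, and most callers ignore it.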
We can use pointers to represent a string, because in C a string is a particular type of array, and arrays and pointers have a close correspondence as we noted last time.

This function copies the string from into the string to. Let's look at an implementation of this:

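    // named my_strcpy so that it doesn't clash with the library's strcpy
    void
    my_strcpy(char *to, const char *from) {
      while ((*to = *from) != '\0') {  // copy, then test the copied character
        ++from;
        ++to;
      }
    }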
Does this copy the final NUL character? Why? Try looking at an example: "cab".

It does. The reason is that the copy occurs before the comparison takes place: each character, including the final '\0', is assigned first, and only then does the loop test whether the character just copied was the NUL. This is an esoteric example, but some people persist in writing code like this because they believe it's efficient. An even more cryptic way of writing the function is the following:
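You can also check the answer experimentally. The sketch below wraps the same loop in a function named copy_while (a hypothetical name) and copies "cab" into a buffer first filled with 'x' characters; if the NUL were not copied, buf[3] would still be 'x'.

```cpp
#include <cstring>

// The lecture's copy loop under a hypothetical name.
void copy_while(char *to, const char *from) {
    while ((*to = *from) != '\0') {  // copy first, then test the copied char
        ++from;
        ++to;
    }
}

// Copy "cab" over a buffer of 'x's and report whether the terminator arrived.
bool terminator_copied() {
    char buf[8];
    std::memset(buf, 'x', sizeof buf);
    copy_while(buf, "cab");
    return buf[3] == '\0' && std::strcmp(buf, "cab") == 0;
}
```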

    while(*to++ = *from++);
This is ugly, but it works. It depends on the facts that the postfix increment operator binds more tightly than the dereference operator (so *to++ means *(to++)), that post-increment yields the pointer's old value before advancing it, and that the value of an assignment is the value assigned (so copying the NUL character, which is ASCII zero, makes the condition zero and terminates the while loop). Depending on all these obscure C rules makes life miserable for anyone who later has to debug your code, but you should be aware of them so that you can survive the day your job depends on debugging code that some maniacal, keystroke-conserving programmer has recklessly written this way.
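If you do meet this idiom, here it is wrapped in a function (copy_terse is a hypothetical name). The sketch adds an extra pair of parentheses around the assignment, which is the conventional way to tell compilers such as gcc with -Wall that an assignment used as a truth value is intentional.

```cpp
// The terse copy loop: all the work happens in the condition.
void copy_terse(char *to, const char *from) {
    while ((*to++ = *from++))
        ;  // empty body; the loop stops after copying the NUL
}
```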

A much more readable way to write the loop, in my opinion, uses a for loop:

    for(; *from; to++, from++) *to = *from;
    *to = *from;
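As a complete function, the for-loop version might look like the sketch below (copy_for is a hypothetical name). The separate statement after the loop is what copies the terminating NUL, since the loop stops as soon as *from is zero.

```cpp
// For-loop version of the string copy.
void copy_for(char *to, const char *from) {
    for (; *from; to++, from++)   // copy every character before the NUL
        *to = *from;
    *to = *from;                  // *from is now '\0': copy the terminator too
}
```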