Date: 18 Jul 91 21:41:23 EDT
From: Olin.Shivers@BRONTO.SOAR.CS.CMU.EDU
Subject: Typesetting code examples in LaTeX

While I was writing my dissertation, I learned a lot of things about
typesetting code in LaTeX. I also wrote some packages to help out. Here's
what I learned; maybe it will help you.

For the rest of this message, ~shivers means /afs/cs/user/shivers.

1. It's a funny thing, but if you enter the following

    {\tt foo?`bar' bar!`foo'}

you will get a surprise. The ` chars will disappear in the output, and the
? and ! will invert. Apparently ?` and !` are ligatures (they produce the
Spanish inverted question and exclamation marks). This is *actually
documented*, off in the corner of page 40 in the LaTeX manual.

To turn this off, you must change the catcode of ` and then use the
\@noligs command:

    \catcode``=13\@noligs

But it's just a little more complicated than that. You aren't normally
allowed to use a command with @ in its name -- these commands are supposed
to be low-level commands not exposed to the user. To enforce this, LaTeX
reads your source in a mode that defines @ to be "other," instead of
"letter," so it cannot appear in command names. This mode is toggled with
the \makeatletter and \makeatother commands. So, you must do something like:

    \makeatletter
    \newcommand{\noligs}{\catcode``=13\@noligs}
    \makeatother

Then you can use your \noligs wherever you like:

    {\tt\noligs foo?`bar' bar!`foo'}

Be warned that it will work only in certain circumstances: where LaTeX
executes your \noligs command before it scans the following text. The
\noligs command changes the way LaTeX reads in text, so LaTeX has to
process it before looking at the text to which it is supposed to apply.
However, if you use all of this in an argument to a command, say

    \mbox{\tt\noligs foo?`bar' bar!`foo'}

then LaTeX scans it all in as the argument to \mbox before executing the
\noligs, and you lose. I don't know how to fix this problem. If you could
just turn off the ?` and !`
ligatures at top-level, you'd be fine (for English text), but I don't know
how to do this (you can't say \catcode``=13\@noligs at top-level because
then ``quoted passages like this'' will break).

2. It's a funny thing, but if you enter the following

    {\tt Foo? Bar! What. Is. Going. On. (set! x (+ x 1))}

you will not get what you wanted. LaTeX adds extra space after
end-of-sentence punctuation, so your fixed-width code is going to get
skewed right a little bit after each such punctuation mark. Scheme code
that uses ! to end destructive operators (e.g., set-car!) and ? to end
predicates (e.g., integer?) will lose. This is quite noticeable and quite
irritating. You turn this extra space off with \frenchspacing.

These two points lead us to the following useful macro, which is what you
usually really want instead of \tt:

    \makeatletter
    \newcommand{\ttt}{\tt\catcode``=13\@noligs\frenchspacing}
    \makeatother

Now, you can say

    {\ttt ...whatever you like...}

and win. Note that \ttt is much more flexible than \verb: you can still use
LaTeX commands, and the text can appear as an argument to a command
(\verb's lose unless they're at top-level). Inside an argument, though, the
ligature fix from point 1 won't take effect; only the \frenchspacing part
will.

3. There are style options (like [times]) you can use with LaTeX that give
you PostScript fonts. This is mostly great. However, when you use the times
option, Courier font is used for \tt. I (and others) find this font kind of
spidery. It is also not very dense horizontally -- you can't get too many
characters on a line. If your code listings drift off to the right with
major indentation, or you just have long lines of code, you will lose.

It turns out that the Computer Modern tt font, cmtt, is quite nice, and
significantly (7/6) denser. I have a style file ~shivers/lib/tex/ct.sty
that makes \tt be cmtt font. Just make sure that you list this style option
*after* the times option,

    \documentstyle[12pt,twoside,times,ct]{cmu-art}

put ~shivers/lib/tex on your TEXINPUTS environment variable (or copy ct.sty
to your macro directory), and you're set.

4.
I have a style file ~shivers/lib/tex/code.sty that gives you a kind of
"weak verbatim" mode. That is, code is an environment like verbatim, except
that \, {, and } are still special. You must escape these three chars to
get them in your code listing: \\, \{, \}.

(Having to escape { and } is admittedly inconvenient for C programmers.
What can you do? It would be possible to have a version where some other
pair of chars served as LaTeX group delimiters inside the env instead of {}
-- say, []. But [] are also used frequently in C programs. There really
isn't a good pair to reserve. Interested C programmers are invited to
consider the problem.)

The advantage of the code environment is that you can escape to LaTeX --
italics, math, whatever. This is particularly nice when

  - You are typesetting some Pidgin Algol code, and want to use math
    operators like \union or \join in the algorithm.
  - You are typesetting what is really a code template, and want to
    indicate the elided parts with italics, e.g.:

        \begin{code}
        for(i=n; i--;)
            x += {\em expression being summed\/};
        \end{code}

Before you use code.sty, you should read through its voluminous comments
for more information on how to use it. Besides a code environment, there is
also a \cd{...} command for inline text, and a codebox environment that
makes a box only as wide as the longest line. Codebox is useful for
centering code.

5. I prefer to use the $\lambda$ character instead of the keyword LAMBDA in
my Scheme source listings. It is easier to read and saves columns. A macro
to facilitate this is:

    \renewcommand{\l}[1]{\ \llap{$\lambda$\hskip-.05em}\ (#1)}

You can typeset

    (lambda (x y) (+ x (sin y)))

as

    (\l{x y} (+ x (sin y)))

In standard LaTeX, \l is bound to the Polish barred l, so unless you need
that letter you should be able to blow it away with no problem.

The \l macro has four advantages over just saying $\lambda$:

  - It is shorter and easier to type.
  - It occupies exactly as many columns in your source as it does in the
    output.
    This makes it easy to line up the columns in your source.
  - Because \llap overlaps the $\lambda$ onto the space in front of it, the
    $\lambda$ takes up exactly one character width, like every other char
    in the fixed-width \tt font. This is important -- otherwise, lines of
    code with $\lambda$'s in them will be skewed noticeably.
  - The \hskip-.05em shifts the $\lambda$ right just a bit. In the cmtt
    font, it's uncomfortably close to the ( on the left. This is a minor
    point; the \hskip can be freely deleted.

Of course, converting a long code listing back and forth between lambda and
\l would be tedious if you had to do it manually. Here are the regexp pairs
to give gnumacs' [query-]replace-regexp to do the conversions in both
directions (the first pair converts lambda to \l, the second converts back):

    Regexp                   Replacement
    (lambda (\([^)]*\))      (\\l{\1}
    (\\l{\([^}]*\)}          (lambda (\1)

Here are two gnumacs functions for doing the same things:

    (defun detexify-lambdas ()
      (interactive)
      (query-replace-regexp "(\\\\l{\\([^}]*\\)}" "(lambda (\\1)" nil))

    (defun texify-lambdas ()
      (interactive)
      (query-replace-regexp "(lambda (\\([^)]*\\))" "(\\\\l{\\1}" nil))

Just add water; makes its own sauce.

6. Mysteriously vanished paragraph indentation

You may define some macro only to find that when you begin paragraphs with
it, LaTeX doesn't seem to figure out you've begun a paragraph -- the
indentation will vanish, for instance. This is because your macro produced
an hbox. LaTeX, when it's in vertical mode, just gloms hboxes onto the
vertical list, without ever going into the mode that it reads paragraphs in
(horizontal mode). For example, try this:

    This is the last line of paragraph 1.

    \hbox{This} is the first line of paragraph 2.

The solution is to tell LaTeX that it should leave vertical mode when it
encounters your macro. You do this by putting a \leavevmode command in
front of the produced box. That way, whether your hbox is produced in the
middle of a paragraph or at the beginning, the results are the same:

    This is the last line of paragraph 1.

    \leavevmode\hbox{This} is the first line of paragraph 2.
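Here is a sketch of the same fix applied inside a macro definition. The
macro name \kwd is made up for this illustration; it isn't from ct.sty,
code.sty, or hax.tex:

```latex
% \kwd is a hypothetical macro, used only for illustration: it sets its
% argument in \tt inside an unbreakable box. The leading \leavevmode
% starts a new paragraph (indentation included) if LaTeX is still in
% vertical mode when \kwd is used; in the middle of a paragraph it does
% nothing.
\newcommand{\kwd}[1]{\leavevmode\hbox{\tt #1}}

This is the last line of paragraph 1.

\kwd{This} is the first line of paragraph 2, and it is indented.
```

Drop the \leavevmode and the box is stacked onto the page as a line by
itself, flush left, and the rest of the sentence starts a separate,
indented paragraph below it.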
You won't run into this problem if you use \mbox's: \mbox is defined to be
\leavevmode\hbox{#1}. It only pops up in stranger circumstances. This has,
actually, nothing to do with typesetting code examples.

7. LaTeX won't break words in \tt font, not even at explicit hyphens. If
you have a word like {\tt call-with-current-continuation} in your text,
you've got a problem: TeX can't break it anywhere, so it may stick out into
the margin. You can't simply replace the hyphens with \- to allow breaks
there, because \- inserts a - only when there is a break. Here are three
fixes:

  a. \linebreak[0] doesn't encourage a line break where it occurs (as do
     \linebreak and \linebreak[1] through \linebreak[4]), but it does allow
     line breaks, even if it occurs in the middle of a word. So you simply
     put a \linebreak[0] after every hyphen in your variable name. Better
     still, define some macro with a shorter name:

         \renewcommand{\=}{\linebreak[0]} % optional break w/no added hyphen.
         % \= is bound to the macron accent, not very useful for English.
         % Another possibility is \ob.
         ...
         {\tt call-\=with-\=current-\=continuation}
         ...

  b. You can drop down to the TeX level and use the basic \discretionary
     form. \discretionary{<pre>}{<post>}{<nobreak>} says: if you break the
     word here, put <pre> at the end of the top line and <post> at the
     beginning of the bottom line. If you don't break the word here, the
     \discretionary form turns into the <nobreak> text. So you can define
     your \= macro to be

         \renewcommand{\=}{\discretionary{}{}{}}

     Or, alternately,

         \renewcommand{\=}{\discretionary{-}{}{-}}

     if you want to say just \= instead of -\= (more convenient; less
     flexible).

  c. For every font, there's a magic char called the \hyphenchar that TeX
     knows it can break at, with no added hyphens (so it can break words
     like "x-ray" without adding a second hyphen). You simply tell TeX that
     - is the \hyphenchar for the \tt fonts, which can be done with:

         \hyphenchar\nintt=`\-
         \hyphenchar\tentt=`\-
         \hyphenchar\elvtt=`\-
         \hyphenchar\twltt=`\-

     Having done this, you can blithely use long, hyphenated variable
     names, and TeX will obligingly break them for you at the hyphens.
     However, this will also allow TeX to break long, hyphenated variable
     names inside the code environment, when you don't want it. It also
     lets TeX hyphenate ordinary \tt words the way it hyphenates roman
     text, inserting hyphens that aren't part of the identifier. So this
     technique is not always a win.

You can see lots of this stuff in the file ~shivers/lib/tex/hax.tex.

That's what I know about typesetting code in LaTeX. If you are a TeX wizard
and spot an omission, an error, or a non-optimisation in any of this, I'd
sure appreciate hearing it.

-Olin