Standard ML at CMU / MLWorks

Using the MLWorks System

  1. Interacting with MLWorks

  2. Using Files

  3. Editing SML Programs Using Emacs

  4. Making Sense of Error Messages

  5. Further Issues


This is a guide to editing and executing Standard ML (SML) programs at Carnegie Mellon University, using Harlequin Incorporated's MLWorks system. This document was written by Peter Lee (petel@cs.cmu.edu), with extensive contributions by Robert Harper (rwh@cs.cmu.edu), Iliano Cervesato (iliano@cs.cmu.edu), Carsten Schürmann (carsten@cs.cmu.edu), Frank Pfenning (fp@cs.cmu.edu), and Herb Derby (derby@cs.cmu.edu).

This is not a reference manual for the Standard ML language. If you need a reference manual or a tutorial, the Introduction lists several sources of information, both on-line and in hard copy.


Interacting with MLWorks

When you start the MLWorks system, an initial window called the console window will appear (assuming you are running the X Window System). It should display the following message:

MLWorks 1.0
Copyright (C) 1996 The Harlequin Group Limited.  All rights reserved.
MLWorks is a trademark of The Harlequin Group Limited.

The window also offers a number of pull-down menus which are described in the User's Guide, available on-line at

http://www.cs.cmu.edu/afs/andrew/scs/cs/mlworks/doc/guide/html/index.htm

The SML top-level loop is launched from the menu Tools>Listener. A listener window will appear, prompting you for SML input. Each input you type is compiled and executed, and its result is displayed.

When prompted, you can type in a top-level declaration. There are several kinds of top-level declarations in SML. For example, the following is a declaration of a function called inc that increments its integer argument. (In these examples, "MLWorks>" is the MLWorks prompt, the text after the prompt is the user input, and the indented lines that follow are the output from the MLWorks system.)

MLWorks> fun inc x = x + 1;
    val inc = fn : int -> int

The text "fun inc x = x + 1" is the declaration for the inc function. The semicolon (";") tells the MLWorks system to perform the following actions: elaborate (that is, perform typechecking and other static analyses), compile (to obtain executable machine code), execute, and finally print the result of this declaration. After all of this, the system prompts for new input and the whole process starts again. This is the so-called "top-level loop". To exit from the MLWorks listener, select File>Close or type an end-of-file character (Control-d) at the prompt.

In the example above, the printed result shows that inc is a function that takes an integer argument and yields an integer result. It is important to know that, in SML, functions are "first-class" values, fundamentally no different from other values such as integers. So, to be more precise, the identifier inc has been bound to a value (which happens to be a function, as denoted by the fn keyword above) of type int -> int.
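To make the first-class nature of functions concrete, here is a minimal sketch that binds a function with val, passes inc as an argument to another function, and hands a function to the standard map. The names inc', twice, seven, and bumped are hypothetical, chosen only for this illustration.

```sml
(* inc, as declared above. *)
fun inc x = x + 1

(* The same function written as an anonymous fn expression
   and bound with val, like any other value. *)
val inc' = fn x => x + 1

(* twice takes a function g as its argument and returns
   a new function that applies g two times. *)
fun twice g = fn x => g (g x)

val seven = twice inc 5            (* inc (inc 5) = 7 *)
val bumped = map inc' [1, 2, 3]    (* [2, 3, 4] *)
```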

If we had left out the semicolon, then the elaboration, compilation, execution, and printing would have been deferred and a prompt (this time, an equal sign, "=") would be given, for either a continuation of the declaration of inc or else another top-level declaration. When a semicolon is finally entered (perhaps after several more top-level declarations), all of the declarations since the last semicolon are processed in sequence. For example:

MLWorks> fun inc x = x + 1
fun f n = (inc n) * 5;
    val inc = fn : int -> int
    val f = fn : int -> int

In this example, we have defined the inc function as well as a function f that uses inc. Notice that no prompt was given for the second function.

In the interactive top-level loop, the simplest form of input is an expression. For example, after typing in the declarations for inc and f above, we can now call f by typing in:

MLWorks> f (2+4);
    val it = 35 : int

Notice that since no identifier is given to bind to the value, the interactive system has chosen the identifier it and bound it to the result of compiling and executing the expression f (2+4).

You might have experience with other languages whose implementations support a similar kind of interactive top-level loop. For example, most implementations of the Lisp, Scheme, and BASIC languages support top-level loops. If you have experience with any of these languages, then you might expect that re-defining a function will change the binding of the function name, as well as the behavior of any other functions that call that function. However, in the MLWorks system, this is not the case. For example, suppose we wish to change the definition of the inc function, so that it increments by two instead of one:

MLWorks> fun inc x = x + 2;
    val inc = fn : int -> int

In typical Lisp and Scheme systems, such a re-definition would cause the function f to change as well, since f calls inc. But in the MLWorks system, f's binding does not change, so in fact referring to f now still yields the original function:

MLWorks> f (2+4);
    val it = 35 : int

To understand why the MLWorks system behaves in this way, consider what would happen if we re-defined inc so that it had a type different from int -> int, for example:

MLWorks> fun inc x = (x mod 2 = 0);
    val inc = fn : int -> bool

Here, inc has been changed to a function that returns true if and only if its integer argument is even. Now, if f should also be changed to reflect this re-definition (as it would be in Lisp and Scheme systems), it would fail to typecheck. This is not necessarily a bad thing, but at any rate the MLWorks system does not bother to go back to earlier top-level declarations and re-elaborate them; hence, f's binding is left unchanged.

If you are already familiar with the SML language, then you can think of the sequence of top-level declarations typed into an MLWorks interactive top-level loop as being in nested let-bindings:

let fun inc x = x + 1 in
  let fun f n = (inc n) * 5 in
    let fun inc x = x + 2 in
      ...
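The outline above leaves the let-expressions open. A minimal sketch that closes them and evaluates f together with the second inc shows the shadowing directly; the pair binding (fResult, incResult) is a hypothetical addition for illustration.

```sml
(* The three top-level declarations from the session above, written as
   nested let-expressions.  The second inc shadows the first one only
   inside the innermost let. *)
val (fResult, incResult) =
  let fun inc x = x + 1 in
    let fun f n = (inc n) * 5 in
      let fun inc x = x + 2 in
        (* f still refers to the first inc, which was in scope
           when f was declared. *)
        (f (2 + 4), inc (2 + 4))
      end
    end
  end
(* fResult = 35, incResult = 8 *)
```

Because f was declared while the first inc was in scope, f (2 + 4) still yields 35, just as in the top-level session, while the later inc yields 8.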



Using Files

Instead of typing your program into the interactive top level, it is more productive to put your program into one or more files and then load them into the MLWorks system. The simplest way to do this is to use the built-in function use. For example:

MLWorks> use "myprog.sml";
    val it = () : unit
    Use:  myprog.sml
    ...

The use function takes the name of the file (of type string) to load. If the file exists, it is opened and read, with each top-level declaration in the file processed in turn (and the results printed on the standard output). The "result" of the use function is the unit value ("()").
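As a sketch, a file loaded with use might look like the following. The contents (and the value answer) are hypothetical; as with any file loaded this way, no semicolons are needed between the declarations, because the end of the file ends the input.

```sml
(* Hypothetical contents of myprog.sml.  Each top-level declaration
   is processed in turn when the file is loaded with
   use "myprog.sml"; *)
fun inc x = x + 1
fun f n = (inc n) * 5
val answer = f (2 + 4)   (* 35 *)
```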

For those who prefer clicking to typing, the use function can also be invoked from the menu File>Use file...; a file dialog will appear and allow you to choose the file to load.



Editing SML Programs Using Emacs

I recommend using Emacs to edit your SML programs and also to manage interaction with the MLWorks system. To do this, select Emacs Server from the menu Preferences>Editor..., and add the following lines to your .emacs file:

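;; Make the MLWorks Emacs Lisp files visible to Emacs.
(setq load-path
      (cons "/afs/andrew/scs/cs/mlworks/ultra/lib/emacs/lisp/"
            load-path))

;; Load the MLWorks server and SML mode on demand.
(autoload 'mlworks-server "mlworks-server" "The MLWorks server" t)
(autoload 'sml-mode "sml-mode" "Major mode for editing Standard ML programs." t)

;; Edit files whose names end in .sml in SML mode.
(setq auto-mode-alist (cons '("\\.sml$" . sml-mode) auto-mode-alist))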

Then, start emacs and type Meta-x mlworks-server. The error manager will then communicate with your emacs session and locate errors directly in the source file. The above emacs lisp code is available on the Andrew file system at

/afs/andrew/scs/cs/mlworks/ultra/lib/emacs/sample.emacs.el

The above commands also arrange for "sml mode", a special editing mode, to be invoked any time you edit a file with an appropriate extension (such as ".sml"; other extensions can be added to auto-mode-alist in your .emacs file). As in other special editing modes, the Tab key or Control-j will cause Emacs to attempt to indent your code in a pleasing way, and Control-c followed by Tab will indent the current region. Since SML's syntax is rather complex, the sml mode indentation can be rather haphazard at times; still, many people find it quite useful. A particularly useful key combination is Meta together with a vertical bar ("|"), which creates a template for an arm of a case expression or a clause of a function. There are several other useful Emacs commands for interacting with the inferior sml shell; you can find documentation for them by typing Control-h m. Some of the most basic commands are:

C-c C-l   save the current buffer and then "use" the file
C-c C-r   send the current region to the sml shell
C-c `     find the next error message and position the cursor on the corresponding line in the source file
C-c C-s   split the screen and show the sml shell

Other editors can be used in conjunction with MLWorks. Consult the MLWorks User Guide for details.



Making Sense of Error Messages

As with most compilers, the MLWorks system often produces error messages that can be hard to decipher. The problem is compounded by the fact that SML supports polymorphic type inference, which can make it very difficult for the compiler to pinpoint the real source of a type error. On the other hand, once all of the compile-time type errors are removed, the bulk of the bugs have often already been stamped out. In practice, SML programs often work the first time, once all of the type errors reported by the compiler have been removed!

MLWorks displays error messages in a dedicated window (the error browser), where the messages are often intelligible. If the error was in a file and the interaction with the editor has been set up properly, selecting the Action>edit menu item will highlight the (approximate) location of the error in the source file. More about errors and error handling can be found in the User's Guide.

Type mismatches

The most common kind of error is the simple type mismatch. For example, suppose we have the following code in a file called myprog.sml:

fun inc x = x + 1
fun f n = inc true

Notice that a semicolon is not needed here, since the end-of-file marker serves the same purpose. Now, if we load this file, we get the following error message:

use "myprog.sml";
    myprog.sml:2,11-2,18 error: function applied to argument of wrong type
    Near: inc true
      Required argument type: int
      Actual argument type: bool
        Type clash between
          int
        and
          bool

The error message indicates that the expression inc true, on line 2, between columns 11 and 18, is guilty of a type mismatch. The function inc is being applied to an argument of type bool in this expression, but its domain (argument type) is int. Selecting Action>edit, or double-clicking on the error in the upper part of the error browser window will locate the cursor at the right position in your file and highlight the faulty term.
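Assuming the intent was for f to increment its integer argument (the original code does not say what f should do), a corrected myprog.sml might look like this sketch:

```sml
fun inc x = x + 1

(* Corrected f: inc is now applied to the integer parameter n
   rather than to the boolean value true. *)
fun f n = inc n
```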

Errors in derived forms

To see a simple example of how error messages aren't always so illuminating, consider the following code:

fun fact 0 = 1
  | fact n = n * fact true

Here, we have attempted to define the factorial function, but in the recursive call we have (stupidly) applied the fact function to the boolean value true instead of to the integer argument n-1. The error message given by the MLWorks system is as follows:

myprog.sml:1,5 to 2,26: error: Type mismatch in recursive value binding for fact
Near: fn 0 => ...
  Pattern type: bool -> int
  Expression type: int -> int
    Type clash between int and bool

Despite the fact that the error is "clearly" in the recursive call to fact, the message indicates that the error is somewhere between line one and line two, which is the entire program! Another confusing aspect of this error message is that the function declaration is printed in a form that does not closely resemble our original program. This is because many of SML's constructs are "derived forms", in other words, essentially macros that expand into a more basic "core" syntax. The MLWorks system always prints out code in terms of the core language, never the derived forms.
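For comparison, here is the corrected definition (with the recursive call on n - 1) next to a hand-written approximation of the core-language form that the clausal definition abbreviates. The name fact' is hypothetical, and the second declaration only sketches the idea behind the expansion; it is not the exact text MLWorks prints.

```sml
(* Corrected clausal definition: the recursive call uses n - 1. *)
fun fact 0 = 1
  | fact n = n * fact (n - 1)

(* Roughly the core form behind the clausal syntax: a recursive
   value binding of a fn expression, one match rule per clause. *)
val rec fact' = fn 0 => 1
                 | n => n * fact' (n - 1)
```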

Unresolved overloading

Some of the arithmetic and comparison operators, such as +, *, -, <, and so on, are "overloaded", in the sense that they can be used with either integer arguments or real arguments. This overloading is a possible source of confusion for the novice SML programmer. Consider, for example, the following declaration of a function for squaring numbers:

fun square x = x * x

The response from MLWorks is:

val square : int -> int = fn

MLWorks assumes that the * is for integers. In other SML compilers, such as Standard ML of New Jersey, the resulting error message would be:

myprog.sml:1.18 Error: overloaded variable not defined at type
symbol: *
type: 'Z

In that compiler, because there is not enough information in this program to determine whether the * is for integers or for reals, an error message is generated to complain about the inability to "resolve" the overloading.

The fix for this kind of error is simply to declare the type of one of the arguments to (or the result of) the arithmetic operation. For example, here are three versions that work:

fun square' x = x * x : int
fun square'' (x : int) = x * x
fun square''' x : int = x * x

In the first version, the constraint ": int" applies to the whole expression x * x (a type constraint binds more loosely than an infix operator), so it declares the type of the result of the * operation. The second version declares the type of the argument x. Finally, the third version declares the type of the result of the square''' function. In each case, the single annotation is enough for the SML type inference mechanism to infer the types of all the other identifiers in the declaration.
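Because MLWorks resolves an unannotated * to int, the same annotation technique is also how you select real arithmetic. A minimal sketch, with the hypothetical names squareReal, squareReal', and r:

```sml
(* Annotating the parameter as real selects real multiplication. *)
fun squareReal (x : real) = x * x

(* Equivalently, annotate the result type of the function. *)
fun squareReal' x : real = x * x

val r = squareReal 1.5   (* 2.25 *)
```

Note that real is not an equality type in SML, so comparing reals uses Real.== rather than =.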

It is not uncommon to spend quite a long time tracking down the source of a type error. (Actually, the time spent doing this is almost always much less than the time it takes to track down the same error without the benefit of static typechecking!) A common way to narrow down the possibilities, and also to improve the precision of the error messages produced by the compiler, is to annotate the program with explicit types, in the way that we have done above. It is particularly helpful to annotate the types of function parameters, as we have done in square'' above. This is similar to the declaration of parameter types in languages such as C and Pascal. Of course, in those languages the declarations are required; in SML they are optional.

The value restriction

One of the most fundamental changes in the 1997 revision of the SML language is that it now enforces something called the value restriction. Essentially, this restricts polymorphism to expressions that are syntactically values, such as identifiers, constants, and fn expressions (functions). When this restriction prevents a top-level declaration from being given a polymorphic type, the compiler reports an error (in some compilers the message is "nongeneric type variable"). For example, the following program results in this error:

fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

val f = map id

The message given is

myprog.sml:6,5:  error: Free type variable 'a in 'a list -> 'a list at top level

which indicates that the expression map id is polymorphic, but not syntactically a value (that is, not an identifier or lambda expression), and hence the attempt to use it as a polymorphic value (by binding f to it) violates the value restriction. The reasons for this restriction are beyond the scope of this document, but are explained in several papers as well as the textbook by Paulson.
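Two common ways to avoid the error are sketched below, assuming the declarations of id and map above; the names f1 and f2 are hypothetical. Eta-expanding the binding turns it into a syntactic function (a value), so it may remain polymorphic; alternatively, a type annotation fixes the type variable, so none is left free at top level.

```sml
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* Fix 1: eta-expand.  A fun declaration always binds a syntactic
   function, so f1 is polymorphic: 'a list -> 'a list. *)
fun f1 l = map id l

(* Fix 2: give the binding a monomorphic type, so no type
   variable is left free. *)
val f2 : int list -> int list = map id
```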

In some cases the compiler can determine from context that an expression that appears polymorphic, such as map id or ref [], can be given a non-polymorphic type. In this case the compiler does not report an error. For example,

let val x = ref [] in
  x := [3]
end

is accepted as a correct program.

Syntax errors

Because the syntax of SML is rather complex, there are several common errors that novices tend to make. One of the most common has to do with the syntax of patterns in clausal-form function declarations and case expressions. Consider the following code:

datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree
fun preorder Leaf(v) = [v]
  | preorder Node(l,r) = preorder l @ preorder r

The MLWorks system complains vigorously over this:

myprog.sml:4,14 to 4,17: error: Value constructor Leaf used without argument in pattern
myprog.sml:5,14 to 5,17: error: Value constructor Node used without argument in pattern
myprog.sml:4,5 to 5,48: error: Type mismatch in recursive value binding for preorder
  Near: fn _id216 => ...
    Pattern type:    'a -> ('a * 'a) list
    Expression type: 'a -> 'a * 'a -> ('a * 'a) list
      Type clash between
        ('a * 'a) list
      and
        'a * 'a -> ('a * 'a) list

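The problem here is that, in a clausal-form function declaration, preorder Leaf(v) is read as the function preorder applied to two separate argument patterns, Leaf and (v); likewise, Node and (l,r) are separate patterns in the second clause. The (admittedly strange) syntax of SML requires extra parentheses to group each constructor with its argument: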

fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r

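These parentheses are needed wherever a constructor pattern with an argument appears next to other patterns, most often as an argument in a clausal-form function declaration or nested inside another constructor pattern (as in SOME (Leaf v)). In a case arm or an exception handler, such as Leaf v => ..., the pattern before the arrow is a complete pattern, so the parentheses are optional there, although they do no harm.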
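For reference, here is the corrected declaration as a small self-contained program applied to a sample tree; the tree t and the value leaves are hypothetical additions.

```sml
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

(* Each constructor is parenthesized together with its argument. *)
fun preorder (Leaf v) = [v]
  | preorder (Node (l, r)) = preorder l @ preorder r

val t = Node (Leaf 1, Node (Leaf 2, Leaf 3))
val leaves = preorder t   (* [1, 2, 3] *)
```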

Another rather confusing part of the syntax has to do with the interaction between case expressions, exception handlers, and clausal-form function declarations. Consider the following function, taken in slightly modified form from the MLWorks library (which is described in the reference manual mentioned under Further Issues):

datatype 'a option = NONE | SOME of 'a
fun filter pred l =
      let fun filterP (x::r, l) =
                case (pred x) of
                   SOME y => filterP(r, y::l)
                 | NONE => filterP(r, l)
            | filterP ([], l) = rev l
      in
        filterP (l, [])
      end

In this example, the local function filterP is defined in two clauses, the first handling the case of a non-empty list argument, and the second handling the empty list. In the first clause, a case expression is used. The syntactic ambiguity arises because the parser cannot tell, without too much "lookahead", that the line beginning with | filterP is the second clause of filterP rather than a third arm of the case expression; it treats it as a case arm. This leads to the following rather cryptic error message:

myprog.sml:8,11 to 8,25: error: Non-constructor filterP used in pattern
myprog.sml:8,27: error: Unexpected `=', inserting `=>'
myprog.sml:8,11 to 8,25: error: Non-constructor filterP used in pattern
myprog.sml:8,27: error: Reserved word `op' required before infix identifier `='

As before, parenthesization fixes the problem:

fun filter pred l =
      let fun filterP (x::r, l) =
                (case (pred x) of
                    SOME y => filterP(r, y::l)
                  | NONE => filterP(r, l))
            | filterP ([], l) = rev l
      in
        filterP (l, [])
      end

Alternatively, in this example we can also exchange the two clauses of filterP:

fun filter pred l =
      let fun filterP ([], l) = rev l
            | filterP (x::r, l) =
                case (pred x) of
                   SOME y => filterP(r, y::l)
                 | NONE => filterP(r, l)
      in
        filterP (l, [])
      end
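To try either corrected version, here is a self-contained sketch of the parenthesized one. It uses the option type that the Standard ML Basis Library already provides instead of redeclaring it; the helper doublePos and the values result and same are hypothetical. The Basis function List.mapPartial behaves the same way, which the last line uses as a cross-check.

```sml
(* filter, as in the parenthesized version above, using the
   Basis option type (NONE | SOME of 'a). *)
fun filter pred l =
  let fun filterP (x::r, l) =
            (case pred x of
                 SOME y => filterP (r, y::l)
               | NONE => filterP (r, l))
        | filterP ([], l) = rev l
  in
    filterP (l, [])
  end

(* Keep the positive numbers, doubled. *)
fun doublePos x = if x > 0 then SOME (2 * x) else NONE

val result = filter doublePos [1, ~2, 3]                    (* [2, 6] *)
val same = (result = List.mapPartial doublePos [1, ~2, 3])  (* true *)
```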

As with many programming languages, the basic advice to follow is: When in doubt, parenthesize.



Further Issues

MLWorks contains numerous commands and options beyond the scope of this document. We suggest a careful reading of the User's Guide for getting acquainted with issues such as debugging, tracing, etc.

There is a reference manual for MLWorks available at URL:

http://www.cs.cmu.edu/afs/andrew/scs/cs/mlworks/doc/reference/html/index.htm

The reference manual includes a detailed discussion of the MLWorks libraries. A PostScript version of the reference manual is available as

/afs/andrew/scs/cs/mlworks/doc/reference/ps/reference-1-0.ps



petel@cs.cmu.edu