Sample Code for this Chapter

A characteristic feature of ML is the ease with which we may handle aggregate data structures such as tuples, arrays, lists, and trees.  The simplest form of aggregate is the tuple, value of product type.  Product types have the form

typ1*...*typn,

where n is at least 2. Values of this type are n-tuples of the form

(val1,...,valn),

where vali is a value of type typi (for each 1<=i<=n).

Thus the following are well-formed bindings:

val pair : int * int = (2, 3)
val triple : int * real * string = (2, 2.0, "2")
val pair_of_pairs : (int * int) * (real * real) = ((2,3),(2.0,3.0))
val quadruple : int * int * real * real = (2,3,2.0,3.0)

The nesting of parentheses matters!  A pair of pairs is not the same as a quadruple, so the last two bindings are of distinct values with distinct types.

More generally, a tuple expression has the form

(exp1,...,expn),

where each expi is an expression (not necessarily a value).   Evaluation of tuple expressions proceeds from left to right, yielding the tuple value (val1,...,valn), where each expi evaluates to vali (for each 1<=i<=n).   Thus the binding

val pair : int * int = (1+1, 5-2)

binds the value (2, 3) to the variable pair.

Tuples may be decomposed into their constituent parts using pattern matching.   This is expressed using a generalized form of value binding in which the left-hand side is not merely a variable, but a pattern involving zero or more variables.   The general form of a value binding is

val pat = exp,

where pat is a pattern and exp is any expression.

What sorts of patterns are there?  We've already seen the basic form of pattern, namely a variable pattern, written var:typ.  Another form of pattern is the tuple pattern, which has the form (pat1,...,patn), where each pati is a pattern.  We will introduce other forms of pattern later in these notes.

Just as every expression must have a type, so must every pattern.  The type of a pattern is determined by a rule governing each form of pattern.  The variable pattern var:typ is of type typ, and the tuple pattern (pat1,...,patn) is of type typ1*...*typn, where pati is a pattern of type typi for each i.   Thus the pattern (n:int,r:real,s:string) is of type int*real*string, as might be expected.

A value binding of the form val pat = exp is well-typed iff pat and exp have the same type; otherwise the binding is ill-typed and is rejected by the compiler.  Thus the following bindings are well-typed (given the bindings above):

val (m:int, n:int) = pair
val (m:int, r:real, s:string) = triple
val ((m:int,n:int), (r:real, s:real)) = pair_of_pairs
val (m:int, n:int, r:real, s:real) = quadruple

In contrast, the following are ill-typed:

val (m:int,n:int,r:real,s:real) = pair_of_pairs
val (m:int, r:real) = pair
val (m:int, r:real) = triple

Value bindings are evaluated using the bind-by-value principle discussed earlier, except that the binding process is now more complex than before.  First, we evaluate the right-hand side of the binding to a value (if indeed it has one).  Then, we proceed according to the rules of pattern matching to determine the bindings for the individual variables in the pattern.  This process is quite intuitive.   For example, the binding

val (m:int,r:real,s:string) = triple

binds m to 2, r to 2.0, and s to "2.0".

Formally, we go through a process of reduction to atomic value bindings, where an atomic binding is one whose pattern is a variable pattern.  The binding

val (pat1,...,patn) = (val1,...,valn)

reduces to the sequence of bindings

val pat1 = val1
...
val
patn = valn

This decomposition is repeated until all bindings are atomic, at which point the process terminates having arrived at the value environment determined by the original binding.  Notice that we rely on the fact that values of n-tuple type are n-tuples!

For example, the evaluation of the binding

val ((m:int,n:int), (r:real, s:real)) = pair_of_pairs

proceeds by first evaluating the expression pair_of_pairs to ((2,3),(2.0,3.0)), then decomposing the pattern ((m:int,n:int), (r:real, s:real)) in two major stages, as follows:

  1. Reduce the binding

    val ((m:int,n:int), (r:real, s:real)) = ((2,3),(2.0,3.0))

    to the sequence of bindings

    val (m:int, n:int) = (2,3)
    val (r:real, s:real) = (2.0,3.0)
    .

  2. Reduce the latter two bindings to the sequence of atomic bindings

    val m:int = 2
    val n:int = 3
    val r:real = 2.0
    val s:real = 3.0

At this point we have determined the bindings for the individual variables in the pattern.

The null tuple is a tuple with zero elements.  It is written (), which is consistent with the n-tuple notation.  Its type, however, is written unit, indicating that it is has but a single element.  The null-tuple pattern is, of course, also written ().  Aside from regularity, the main reason for having a null tuple in the language is to provide a "default" value for expressions that have no interesting value (but, presumably, an interesting effect).  We'll have more to say about this later in these notes.

When tuples get large, it gets hard to remember which position is which.  Records are tuples whose components are labeled with an identifier.  A record type has the form

{lab1:typ1,...,labn:typn},

where n is at least 2.  A record value has the form

{lab1=val1,...,labn=valn},

where vali has type typi.  A record pattern has the form

{lab1=pat1,...,labn=patn}.

This pattern has type {lab1:typ1,...,labn:typn} provided that pati has type typi for each i.   The important thing to note about record expressions is that the order of the fields determines the order of evaluation, but that for record values, the order of the fields is irrelevant.  Once the fields have been evaluated, you can write them in any order you like, but the compiler will adhere to the order you choose to write unevaluated fields.

Some examples will help clarify the use of record types.

type hyperlink = { protocol : string, address : string, display : string }

val mailto_rwh : hyperlink =
    { protocol="mailto", address="rwh@cs.cmu.edu", display="Robert Harper" }
val plcore_home : hyperlink =
    { protocol="http", address="//cs.cmu.edu/~rwh/plcore", display="Programming Languages Core Course" }

val { protocol=port, address=addr, display=disp } = mailto_rwh

(The over-use of strings here is quite obvious; in due course we'll have sufficient mechanism to do a better job.)

In practice one often wishes to select only one or two fields from a tuple or record value, the others being irrelevant to the computation at hand.  It would be tedious in the extreme to be forced to bind a variable to each of possibly dozens of irrelevant fields, just so that you could access one of them.  Wild card patterns are used to handle these situations.  The basic form of wild card is written as an underscore, _.  It is an atomic pattern that does not generate any bindings; wild card bindings are simply eliminated (after evaluation of the right-hand side).

val (m:int, _, r:real, _) = quadruple
val (_, (x:real, y:real)) = pair_of_pairs
val { protocol=port, address=_, display=_ } = mailto_rwh

In each case we have elided certain fields using the wild card pattern.  The matching process proceeds as before, including evaluation of the right-hand side of the binding, but bindings whose pattern is the wild card are dropped.  For example, the first binding above generates in one step the bindings

val m:int = 2
val _ = 3
val r:real = 2.0
val _ = 3.0

At the next step the bindings for the wild card are dropped, yielding bindings for m and r alone.

It is important to remember that the right-hand side of a binding is always evaluated, regardless of the use of wild card patterns!  Thus a binding of the form val _ = exp always leads to the evaluation of exp, but then its value is thrown away.  (This could be useful when exp has an effect, as we'll see much later in these notes.)

You will by now have asked yourself "what is the type of a wild card pattern?".  Good question.  The answer is: whatever type is necessary to ensure that the overall binding is well-typed.  This is undoubtedly not a fully satisfying answer, because it doesn't tell you how this information is determined.   We will have more to say on this when we discuss type inference below.

It is quite common to encounter record types with tens of fields.  In such cases even the wild card notation doesn't help much when it comes to selecting one or two fields from such a record.  For this we often use ellipsis patterns in records, as illustrated by the following example.

val { protocol = port, ... } = plcore_home

The pattern { protocol = port, ... } stands for the pattern { protocol=port, address=_, display=_ } used earlier.  In effect the compiler replaces the ellipsis with however many wild card entries are required in order to complete the record pattern.  In order for this to occur the compiler must be able to determine unambiguously the type of the record pattern.  Here the right-hand side of the value binding determines the type of the pattern, which then determines which additional fields to fill in.  In some situations the context does not disambiguate, in which case you must supply additional type information or eschew the use of ellipsis.

Sample Code for this Chapter