Sample Code for this Chapter

A function may bind more than one argument by using a pattern, rather than a variable, in the argument position.  Function expressions may have the form

fn pat => exp

where pat is a pattern and exp is an expression.  Application of such a function proceeds much as before, except that the argument value is matched against the parameter pattern to determine the bindings of zero or more variables, which are then used during the evaluation of the body of the function.

For example, we may make the following definition of the Euclidean distance function:

val dist : real * real -> real = fn (x:real, y:real) => sqrt (x*x + y*y)

This function may then be applied to a pair (two-tuple!) of arguments to yield the distance between them.  For example, dist (2.0,3.0) evaluates to (approximately) 4.0.

Using fun notation, the distance function may be defined more concisely as follows:

fun dist (x:real, y:real):real = sqrt (x*x + y*y)

The meaning is the same as the more verbose val binding given earlier.

Keyword parameter passing is supported through the use of record patterns.  For example, we may define the distance function using keyword parameters as follows:

fun dist' {x=x:real, y=y:real} = sqrt (x*x + y*y)

The expression dist' {x=2.0,y=3.0} invokes this function with the indicated x and y values.

Functions with multiple results may be thought of as functions yielding tuples (or records).  For example, we might compute two different notions of distance between two points at once as follows:

fun dist2 (x:real, y:real):real*real = (sqrt (x*x+y*y), abs(x-y))

Notice that the result type is a pair, which may be thought of as two results.

These examples illustrate a pleasing regularity in the design of ML.  Rather than introduce ad hoc notions such as multiple arguments, multiple results, or keyword parameters, we make use of the general mechanisms of tuples, records, and pattern matching.

It is sometimes useful to have a function to select a particular component from a tuple or record (e.g., the third component or the component labeled url).   Such functions may be easily defined using pattern matching.  But since they arise so frequently, they are pre-defined in ML using sharp notation.  For any record type typ1*...*typn, and each i between 1 and n, there is a function #i of type typ1*...*typn->typi defined as follows:

fun #i (_, ..., _, x, _, ..., _) = x

where x occurs in the ith position of the tuple (and there are underscores in the other n-1 positions).  Thus we may refer to the second field of a three-tuple val by writing #2 val.  It is bad style, however, to over-use the sharp notation; code is generally clearer and easier to maintain if you use patterns wherever possible.  Compare, for example, the following definition of the Euclidean distance function written using sharp notation with the original.

fun dist (p:real*real):real = sqrt((#1 p)*(#1 p)+(#2 p)*(#2 p))

You can easily see that this gets out of hand very quickly, leading to unreadable code.  Use of the sharp notation is strongly discouraged!

A similar notation is provided for record field selection.  The following function #lab selects the component of a record with label lab.

fun #lab {lab=x,...} = x

Notice the use of ellipsis!  Bear in mind the disambiguation requirement: any use of #lab must be in a context sufficient to determine the full record type of its argument.

Tuple types have the property that all values of that type have the same form (n-tuples, for some n determined by the type); they are said to be homogeneous.  For example, all values of type int*real are pairs whose first component is an integer and whose second component is a real.  Any type-correct pattern will match any value of that type; there is no possibility of failure of pattern matching.  The pattern (x:int,y:real) is of type int*real and hence will match any value of that type.  On the other hand the pattern (x:int,y:real,z:string) is of type int*real*string and cannot be used to match against values of type int*real; attempting to do so fails at compile time.

Other types have values of more than one form; they are said to be heterogeneous types.  For example, a value of type int might be 0, 1, ~1, ... or a value of type char might be #"a" or #"z".  (Other examples of heterogeneous types will arise later on.)  Corresponding to each of the values of these types is a pattern that matches only that value.  Attempting to match any other value against that pattern fails at execution time.  For the time being we will think of match failure as a fatal run-time error, but later on we will see that the extent of the failure can be controlled.

Here are some simple examples of pattern-matching against values of a heterogeneous type:

val 0 = 1-1
val (0,x) = (1-1, 34)
val (0, #"0") = (2-1, #"0")

The first two bindings succeed, the third fails.  In the case of the second, the variable x is bound to 34 after the match.  No variables are bound in the first or third examples.

The importance of constant patterns becomes clearer once we consider how to define functions over heterogeneous types.  This is achieved in ML using a clausal function definition.   The general form of a function is

fn pat1 => exp1 | ... | patn => expn

where each pati is a pattern and each expi is an expression involving the variables of the pattern patiEach component pat => exp is called a clause or rule; the entire assembly of rules is called a match.

The typing rules for matches ensure consistency of the clauses.  Specifically,

  1. Each pattern in the match must have the same type typ.
  2. Each expression in the match must have the same type typ', given the types of the variables in the patterns.

The type of a function whose body is a match satisfying these requirements is typ->typ'.   Note that there is no requirement that typ and typ' coincide!

Application of functions with multiple clauses to a value val proceeds by considering each clause in the order written.  At stage i the argument value val is matched against the pattern pati; if the pattern match succeeds, evaluation continues with the evaluation of expression expi, with the variables replaced by the values determined by the pattern matching process.   Otherwise we proceed to stage i+1.  If no pattern matches (i.e., we reach stage n+1), then the application fails with an execution error.   Here's an example.

val recip : int -> int = fn 0 => 0 | n:int => 1 div n

This defines an integer-valued reciprocal function on the integers, where the reciprocal of 0 is arbitrarily defined to be 0.  The function has two clauses, one for the argument 0, the other for non-zero arguments n.  (Note that n is guaranteed to be non-zero because the patterns are considered in order: we reach the pattern n:int only if the argument fails to match the pattern 0.)

Using fun notation we may define recip as follows:

fun recip 0 = 0
  | recip (n:int) = 1 div n

One annoying thing to watch out for is that the "fun" form uses an equal sign to separate the pattern from the expression in a clause, whereas the "fn" form uses an arrow.

Heterogeneous types abound.  Perhaps the most fundamental one is the type bool of booleans.  Its values are true and false.   Functions may be defined on booleans using clausal definitions that dispatch on true and false.  For example, the negation function is defined clausally as follows:

fun not true = false
  | not false = true

In fact, this function is pre-defined in ML.

Case analysis on the values of a heterogeneous type is performed by application of a clausally-defined function.  The notation

case exp of pat1 => exp1 | ... | patn => expn

is short for the application

(fn pat1 => exp1 | ... | patn => expn) exp.

Evaluation proceeds by first evaluating exp, then matching its value successively against the patterns in the match until one succeeds, and continuing with evaluation of the corresponding expression.  The case expression fails if no pattern succeeds to match the value.

The conditional expression

if exp then exp1 else exp2

is short-hand for the case analysis

case exp of true => exp1 | false => exp2

which is itself short-hand for the application

(fn true => exp1 | false => exp2) exp.

The "short-circuit" conjunction and disjunction operations are defined as follows.  The expression exp1 andalso exp2 is short for if exp1 then exp2 else false and the expression exp1 orelse exp2 is short for if exp1 then true else exp2.  You should expand these into case expressions and check that they behave as expected.  Pay particular attention to the evaluation order, and observe that the call-by-value principle is not violated by these expressions.

Conceptually, equality and comparison operations on the types int, char, and string are defined by infinite (or, at any rate, enormously large) matches, but in practice they are built into the language as primitives.  (The ordering on char and string are the lexicographic orderings.)  Thus we may write

fun is_alpha c:char =
    (#"a" <= c andalso c <= #"z") orelse (#"A" <= c andalso c <= #"Z")

to test for alphabetic characters.

All this talk of success and failure of pattern matching brings up the issue of exhaustiveness and redundancy in a match.  A clause in a match is redundant if any value matching its pattern must have matched the pattern of a preceding clause in the match.  A redundant rule can never be reached during execution.  It is always an error to have a redundant clause in a match.  Redundant clauses often arise accidentally.  For example, the second clause of the following function definition is redundant for annoyingly subtle reasons:

fun not True = false
  | not false = true

The mistake is to have capitalized True so that it is no longer the boolean-typed constant pattern, but is rather a variable that matches any value of Boolean type.  Hence the second clause is redundant.  Reversing the order of clauses can also lead to redundancy, as in the following mistaken definition of recip:

fun recip (n:int) = 1 div n
  | recip 0 = 0

The second clause is redundant because the first clause will always match any integer value, including 0.

A match (as a whole) is exhaustive if every possible value of the domain type of the match must match some clause of that match.  In other words, pattern matching against an exhaustive pattern cannot fail at run-time.  The clauses in the (original) definition of recip are exhaustive because they cover every possible integer value.  The match comprising the body of the following function is not exhaustive:

fun is_numeric #"0" = true
  | is_numeric #"1" = true
  | is_numeric #"2" = true
  | is_numeric #"3" = true
  | is_numeric #"4" = true
  | is_numeric #"5" = true
  | is_numeric #"6" = true
  | is_numeric #"7" = true
  | is_numeric #"8" = true
  | is_numeric #"9" = true

When applied to, say, #"a", this function fails.

It is often, but not always, an error to have an inexhaustive match.  The reason is that the type system cannot record many invariants (such as the property that is_numeric is only called with a character representing a decimal digit), and hence the compiler will issue a warning about inexhaustive matches.  It is a good idea to check each such warning to ensure that you have not accidentally omitted a clause from the match.

Any match can be made exhaustive by the inclusion of a catch-all clause of the form

_ => exp

where exp is an expression of the appropriate type.  It is a bad idea to simply stick such a clause at the end of every match in order to eliminate "inexhaustive pattern" warnings.  By doing so you give up the possibility that the compiler may warn you of a legitimate error (due to a forgotten case) in your program.  The compiler is your friend!  Use it to your advantage!

Sample Code for this Chapter