A characteristic feature of ML is the ease with which we may handle *aggregate data
structures* such as tuples, arrays, lists, and trees. The simplest form of
aggregate is the *tuple*, value of *product *type. Product types have
the form

typ_{1}`*`

...`*`

typ_{n},

where *n* is at least 2. Values of this type are *n-tuples* of the form

`(`

val_{1}`,`

...`,`

val_{n}`)`

,

where *val _{i}* is a value of type

Thus the following are well-formed bindings:

`val pair : int * int = (2, 3)`

val triple : int * real * string = (2, 2.0, "2")

val pair_of_pairs : (int * int) * (real * real) = ((2,3),(2.0,3.0))

val quadruple : int * int * real * real = (2,3,2.0,3.0)

The nesting of parentheses matters! A pair of pairs is not the same as a quadruple, so the last two bindings are of distinct values with distinct types.

More generally, a *tuple expression* has the form

`(`

exp_{1}`,`

...`,`

exp_{n}`)`

,

where each *exp _{i} *is an expression (not necessarily a value).
Evaluation of tuple expressions proceeds

`(`

`,`

...`,`

`)`

,
where each

`val pair : int * int = (1+1, 5-2)`

binds the value `(2, 3)`

to the variable `pair`

.

Tuples may be decomposed into their constituent parts using *pattern matching*.
This is expressed using a generalized form of value binding in which the left-hand
side is not merely a variable, but a *pattern* involving zero or more variables.
The general form of a value binding is

`val`

pat`=`

exp,

where *pat *is a *pattern* and *exp *is any expression.

What sorts of patterns are there? We've already seen the basic form of pattern,
namely a *variable pattern*, written *var*`:`

*typ.*
Another form of pattern is the *tuple pattern*, which has the form `(`

*pat _{1}*

`,`

...`,`

`)`

,
where each Just as every expression must have a type, so must every pattern. The type of a
pattern is determined by a rule governing each form of pattern. The variable pattern
*var*`:`

*typ *is of type *typ*, and the tuple pattern `(`

*pat _{1}*

`,`

...`,`

`)`

`*`

...`*`

typ`(n:int,r:real,s:string)`

is of type `int*real*string`

,
as might be expected.A value binding of the form `val `

*pat** *`=`

* **exp
*is well-typed iff *pat* and *exp *have the same type; otherwise the
binding is ill-typed and is rejected by the compiler. Thus the following bindings
are well-typed (given the bindings above):

`val (m:int, n:int) = pair`

val (m:int, r:real, s:string) = triple

val ((m:int,n:int), (r:real, s:real)) = pair_of_pairs

val (m:int, n:int, r:real, s:real) = quadruple

In contrast, the following are ill-typed:

`val (m:int,n:int,r:real,s:real) = pair_of_pairs`

val (m:int, r:real) = pair

val (m:int, r:real) = triple

Value bindings are evaluated using the *bind-by-value* principle discussed
earlier, except that the binding process is now more complex than before. First, we
evaluate the right-hand side of the binding to a value (if indeed it has one). Then,
we proceed according to the rules of *pattern matching* to determine the bindings
for the individual variables in the pattern. This process is quite intuitive.
For example, the binding

`val (m:int,r:real,s:string) = triple`

binds `m`

to `2`

, `r`

to `2.0`

, and `s`

to `"2.0"`

.

Formally, we go through a process of reduction to atomic value bindings, where an atomic binding is one whose pattern is a variable pattern. The binding

`val (`

pat_{1}`,`

...`,`

pat_{n}`) = (`

val_{1}`,`

...`,`

val_{n}`)`

reduces to the sequence of bindings

`val`

pat_{1}`=`

val_{1}`...`

valpat_{n}`=`

val_{n}

This decomposition is repeated until all bindings are atomic, at which point the
process terminates having arrived at the value environment determined by the original
binding. Notice that we rely on the fact that values of *n-*tuple type are *n*-tuples!

For example, the evaluation of the binding

`val ((m:int,n:int), (r:real, s:real)) = pair_of_pairs`

proceeds by first evaluating the expression `pair_of_pairs`

to `((2,3),(2.0,3.0))`

,
then decomposing the pattern `((m:int,n:int), (r:real, s:real))`

in two major
stages, as follows:

- Reduce the binding
`val ((m:int,n:int), (r:real, s:real)) = ((2,3),(2.0,3.0))`

to the sequence of bindings

`val (m:int, n:int) = (2,3)`

.

val (r:real, s:real) = (2.0,3.0) - Reduce the latter two bindings to the sequence of atomic bindings
`val m:int = 2`

val n:int = 3

val r:real = 2.0

val s:real = 3.0

At this point we have determined the bindings for the individual variables in the pattern.

The *null tuple* is a tuple with zero elements. It is written `()`

,
which is consistent with the *n*-tuple notation. Its type, however, is
written `unit`

, indicating that it is has but a single element. The
null-tuple pattern is, of course, also written `()`

. Aside from
regularity, the main reason for having a null tuple in the language is to provide a
"default" value for expressions that have no interesting value (but, presumably,
an interesting effect). We'll have more to say about this later in these notes.

When tuples get large, it gets hard to remember which position is which. *Records*
are tuples whose components are *labeled* with an identifier. A *record
type* has the form

`{`

lab_{1}`:`

typ_{1}`,`

...`,`

lab_{n}`:`

typ_{n}`}`

,

where *n* is at least 2. A *record value* has the form

`{`

lab_{1}`=`

val_{1}`,`

...`,`

lab_{n}`=`

val_{n}`}`

,

where *val _{i} *has type

`{`

lab_{1}`=`

pat_{1}`,`

...`,`

lab_{n}`=`

pat_{n}`}`

.

This pattern has type `{`

*lab _{1}*

`:`

`,`

...`,`

`:`

`}`

provided that Some examples will help clarify the use of record types.

`type hyperlink = { protocol : string, address : string, display : string }`

val mailto_rwh : hyperlink =

{ protocol="mailto", address="rwh@cs.cmu.edu", display="Robert Harper" }

val plcore_home : hyperlink =

{ protocol="http", address="//cs.cmu.edu/~rwh/plcore", display="Programming Languages Core Course" }

val { protocol=port, address=addr, display=disp } = mailto_rwh

(The over-use of strings here is quite obvious; in due course we'll have sufficient mechanism to do a better job.)

In practice one often wishes to select only one or two fields from a tuple or record
value, the others being irrelevant to the computation at hand. It would be tedious
in the extreme to be forced to bind a variable to each of possibly dozens of irrelevant
fields, just so that you could access one of them. *Wild card patterns* are
used to handle these situations. The basic form of wild card is written as an
underscore, `_`

. It is an atomic pattern that does not generate any
bindings; wild card bindings are simply eliminated (after evaluation of the right-hand
side).

`val (m:int, _, r:real, _) = quadruple`

val (_, (x:real, y:real)) = pair_of_pairs

val { protocol=port, address=_, display=_ } = mailto_rwh

In each case we have elided certain fields using the wild card pattern. The
matching process proceeds as before, including* evaluation of the right-hand side of
the binding*, but bindings whose pattern is the wild card are dropped. For
example, the first binding above generates in one step the bindings

`val m:int = 2`

val _ = 3

val r:real = 2.0

val _ = 3.0

At the next step the bindings for the wild card are dropped, yielding bindings for `m`

and `r`

alone.

It is important to remember that the right-hand side of a binding is *always*
evaluated, regardless of the use of wild card patterns! Thus a binding of the form `val`

`_`

`=`

*exp* always leads to the evaluation of *exp*,
but then its value is thrown away. (This could be useful when *exp* has an
effect, as we'll see much later in these notes.)

You will by now have asked yourself "what is the type of a wild card
pattern?". Good question. The answer is: *whatever type is necessary
to ensure that the overall binding is well-typed.* This is undoubtedly not a
fully satisfying answer, because it doesn't tell you how this information is determined.
We will have more to say on this when we discuss type inference below.

It is quite common to encounter record types with tens of fields. In such cases
even the wild card notation doesn't help much when it comes to selecting one or two fields
from such a record. For this we often use *ellipsis patterns* in records, as
illustrated by the following example.

`val { protocol = port, ... } = plcore_home`

The pattern `{ protocol = port, ... } `

stands for the pattern ```
{
protocol=port, address=_, display=_ }
```

used earlier. In effect the compiler
replaces the ellipsis with however many wild card entries are required in order to
complete the record pattern. In order for this to occur *the compiler must be
able to determine unambiguously the type of the record pattern*. Here the
right-hand side of the value binding determines the type of the pattern, which then
determines which additional fields to fill in. In some situations the context does
not disambiguate, in which case you must supply additional type information or eschew the
use of ellipsis.