The fundamental constructs of the ML module system are *signatures* and *structures*.
A signature may be thought of as an interface or specification of a structure, and
a structure may correspondingly be thought of as an implementation of a signature.
Many languages (such as Modula-2, Modula-3, Ada, or Java) have similar constructs:
signatures are analogous to interfaces or package specifications or class types, and
structures are analogous to implementations or packages or classes. One thing to
point out right away, though, is that the relationship between signatures and structures
in ML is *many-to-many*, whereas in some languages (such as Modula-2) the
relationship is *one-to-one* or *many-to-one*. This means that in ML a
signature may serve as the interface for many different structures, and that a structure
may implement many different signatures. This provides a considerable degree of
flexibility in the use (and re-use) of components in a system. The price we pay for
this flexibility is that we must be quite careful about referring to *the*
signature of a structure, since it can have more than one. As we will see, every
structure has a *most specific*, or *principal*, signature, with the
property that all other signatures for that structure are (in a suitable sense) more
restrictive than the principal signature.

The fundamental unit of modularity in ML is the *structure*. A structure
consists of a sequence of declarations comprising the *components* of the
structure. A structure may be bound to a *structure variable* using a *structure
binding*. The components of a structure are accessed using *long identifiers*,
or *paths*. A structure may also be *opened* to incorporate all of its
components into the environment.

Here's a simple example of a structure:

structure IntLT = struct

type t = int

val lt = (op <)

val eq = (op =)

end

This structure has three components, one type and two values, each of which are
functions. The type component is named `t`

and is bound to the type `int`

.
The value components are named `lt`

and `eq`

, and are bound
to the corresponding comparison operations on integers. This structure packages up
the type `int`

with the integer comparison operations `<`

and `=`

to form a module that is then bound to the structure variable `IntLT`

.

We may similarly package up the type `int`

with comparison operations being
divisibility and equality using the following binding:

structure IntDiv = struct

type t = int

fun lt (m, n) = (n mod m = 0)

val eq = (op =)

end

The structures and may be thought of as two different *interpretations*
of the type `int`

as an ordered type (*i.e.*, a type supporting a
"less than" and an equality operation). In one case we interpret
"less than" as the standard ordering on integers, in the other we interpret
"less than" as divisibility. The point is *the type does not determine
the interpretation*. We use the module system to package up types with
operations to provide an interpretation of that type. Many different interpretations
may co-exist, provided only that we bind them to distinct structure variables.

The components of a structure are accessed using *paths* (also known as *long
identifiers* or *qualified names*). We may only access the components of
a *named* structure (one that has been bound to a structure variable). A
component named *id* of a structure named *strid* is accessed by the long
name *strid .id*, the structure name followed by the component name,
separated by a "dot". For example,

`IntLT.lt`

designates the `lt`

operation of the structure `IntLT`

, and `IntDiv.lt`

designates the `lt`

operation of the structure `IntDiv`

. The type of `IntLT.lt`

is

`IntLT.t * IntLT.t -> bool`

,

and the type of `IntDiv.lt`

is

`IntDiv.t * IntDiv.t -> bool`

.

The types of these operations have been "externalized" using long identifiers
to refer to the appropriate type `t`

for each operation. Since `IntLT.t`

and `IntDiv.t`

are both bound to the type `int`

, it makes sense to
write expressions such as `IntLt.lt(3,4)`

and` IntDiv.lt(3,4)`

.

Since `IntLT.t`

and `IntDiv.lt`

are both bound to the type `int`

,
it is technically correct to consider `IntLt.t`

to be of type

IntDiv.t * IntDiv.t -> bool

and also of type

int * int -> bool.

Were we also to have a structure `StringLT`

whose `t`

component
is bound to the type `string`

, then `StringLT.lt`

would have type

StringLT.t * StringLT.t -> bool

and type

string * string -> bool

but not type

IntLT.t * IntLT.t -> bool

Packaging a declaration to form a structure does not affect the usual rules of type equivalence --- transparent type definitions remain transparent.

The use of a long identifier to access a component of a structure serves to remind us
of the interpretation of the underlying type of the structure. For example, the long
identifer `IntLT.lt`

reminds us that the comparison is the standard "less
than" relation on integers, whereas the long identifier `IntDiv.lt`

reminds us that the comparison is divisiblity. Sometimes the use of long identifiers
can get out of hand, cluttering the program text, rather than clarifying it. This
can be alleviated by *opening* the structure for use in a particular context.
For example, rather than writing

IntDiv.lt (

exp1,exp2) andalso IntDiv.eq (exp3,exp4)

we may instead write

let

open IntDiv

in

lt (exp1,exp2) andalso eq (exp3,exp4)

end

This has the effect of incorporating the components of the structure `IntLT`

into the environment for the duration of the evaluation of the body of the `let`

expression. It is as if we replace "`open IntLT`

" by the
declarations comprising the structure bound to `IntLT`

.

Using `open`

has some disadvantages. One is that we cannot
simultaneously open two structures that have one or more components with the same names
--- the one we open later we will shadow the bindings of the one we open earlier.
For example, if we write

let

open IntLT IntDiv (* open both structures in the order given *)

in

...

end

then only the bindings of the second structure, `IntDiv`

, are available in
the scope of the `let`

because they completely shadow the bindings of the
first structure, `IntLT`

.

Another disadvantage is that it is difficult to determine exactly which bindings are
introduced by an `open`

declaration. We must refer to the implementation
of the opened structure (typically defined somewhere remote from the client code) to
understand the effect of the open. A typical bug is to unwittingly shadow an
identifier by opening a structure that happens to provide a binding for that identifier,
even though we did not intend that it do so. In many cases this will result in a
typechecking error, but in more insidious cases it can lead to subtle run-time bugs.
For example, suppose the implementation of the structure makes use of an auxiliary
function as follows:

structure StringLT = struct

type t = string

fun compare (c, d) = Char.< (c, d)

fun lt (s, t) = ... compare ...

fun eq (s, t) = ... compare ...

end

Opening this structure introduces not only the expected components `t`

, `lt`

,
and `eq`

, but also the unexpected auxiliary function `compare`

!

To avoid such problems it is usually advisable to avoid `open`

entirely.
The typical compromise is to introduce a short (typically one letter) name for the
structures in question to minimize the clutter of a long path. Thus we might write

let

structure I = IntLT

in

I.lt (exp1,exp2) andalso I.eq (exp3,exp4)

end

rather than opening the structure `IntLT`

as suggested above.

The structures `IntLT`

and `IntDiv`

are rather simple
examples of the use of the module system. A more substantial example is provided by
packaging the implementation of (ephemeral) queues into a structure.

structure PersQueue = struct

type 'a queue = 'a list * 'a list

val empty = (nil, nil)

fun insert (x, (bs, fs)) = (x::bs, fs)

exception Empty

fun remove (nil, nil) = raise Empty

| remove (bs, f::fs) = (f, (bs, fs))

| remove (bs, nil) = remove (nil, rev bs)

end

The components of this structure may be accessed by using long identifiers,

val q = PersQueue.empty

val q' = PersQueue.insert (1, q)

val q'' = PersQueue.insert (2, q)

val (x'', _) = PersQueue.remove q'' (* 2 *)

val (x', _) = PersQueue.remove q' (* 1 *)

by opening the structure,

let

open PersQueue

in

insert (1, empty)

end

or by introducing a short name for it

let

structure PQ = PersQueue

in

PQ.insert (1, PQ.empty)

end

The structure `PersQueue`

may be thought of as an implementation of the
abstract data type of persistent queues. We may build and manipulate queues using
the operations `PersQueue.empty`

, `PersQueue.insert`

, and `PersQueue.remove`

.
Structures are loosely analogous to classes in languages such as C++ and Java; in
particular, abstract types are usually implemented by structures.

A *signature* is the type of a structure. It describes a structure by *specifying*
each of its components by giving its name and a description of it. Different sorts
of components have different specifications. A type component is specified by
giving its arity (number of arguments) and (optionally) its definition. A datatype
component is specified by its declaration, which defines its value constructors and their
types. An exception component is specified by giving the type of the values it
carries (if any). A value component is specified by giving its type scheme.

Here is the signature of an ordered type, one that comes equipped with a comparison operations on it.

signature ORDERED = sig

type t

val lt : t * t -> bool

val eq : t * t -> bool

end

This signature describes a structure that provides a type component named `t`

(with no specified definition) and two operations, `lt`

and `eq`

, of
type `t * t -> bool`

. Ordinarily we expect that `lt`

is
reflexive and transitive, and that `eq`

is an equivalence relation, but these
requirements are not formally expressible in ML.

If we wish we can specify the definition of a type component in a signature. For example, we may define the signature

signature INT_ORDERED = sig

type t = int

val lt : t * t -> bool

val eq : t * t -> bool

end

which is similar to the signature `ORDERED`

, except that the type component `t`

is specified to be equivalent to `int`

. It therefore describes only those
structures that provide an interpretation of `int`

as an ordered type.
(As we mentioned earler, there can be many such interpretations.)

An important consequence of having type definitions in signatures is that many
superficially different signatures are equivalent. For example, the signature `INT_ORDERED`

is equivalent to the following signature:

signature INT_ORDERED_VARIANT = sig

type t = int

val lt : int * int -> bool

val eq : int * int -> bool

end

The reason is that since the type component `t`

is defined to be `int`

,
we may replace it by `int`

anywhere that it is used to obtain an equivalent
signature. For all practical purposes the signatures `INT_ORDERED`

and `INT_ORDERED_VARIANT`

are indistinguishable from one another.

Here is a signature describing implementations of persistent queues:

signature QUEUE = sig

type 'a queue

val empty : 'a queue

val insert : 'a * 'a queue -> 'a queue

exception Empty

val remove : 'a queue -> 'a * 'a queue

end

This signature specifies that an implementation of persistent queues provide a
one-argument type constructor `'a queue`

, the type of queues containing values
of type `'a`

, an exception `Empty`

carrying no value, and the values
`empty`

, `insert`

, and `remove`

with types `'a queue`

,
`'a * 'a queue ->`

`'a queue`

, and `'a queue ->`

```
'a
* 'a queue
```

, respectively.

The *signature matching *relation is of central importance to the ML module
system. Signature matching governs the formation of complex structure expressions
in the same way that type matching governs the formation of core language expressions.
For example, to determine whether a structure binding `structure`

*strid*
`:`

*sigexp* `=`

*strexp *is well-formed, we must
check that the *principal *signature of *strexp* matches the *ascribed *signature
*sigexp*. The principal signature of a structure expression is the signature
that most accurately describes the structure *strexp*; it contains the definitions
of all of the types defined in *strexp*, and the types of all of its value
components. We then compare the principal signature of *strexp* against the* *signature
*sigexp *to determine whether or not *strexp* satisfies the requirements
specified by *sigexp*.

Signature matching consists of a comparison between a *candidate* and a *target*
signature. The target expresses a set of requirements that the candidate must
fulfill. In the case of a structure binding the candidate is the principal signature
of the structure expression, and the target is the ascribed signature of the binding.
Roughly speaking, to check that a candidate siganture matches a target signature it is
necessary to ensure that the following conditions hold:

- Every type specification in the target must have a matching type specification in the candidate. If the target specifies a definition for a type, so must the candidate specify an equivalent definition.
- Every exception specification in the target must have an equivalent exception specification in the candidate.
- Every value specification in the target must be matched by a value specification in the candidate with at least as general a type.

Note that the candidate signature may have *more components* than are required
by the target, may have *more definitions* of types than are required, and may have
value components with *more general* types. The target signature specifies a
set of necessary conditions that must be met by the candidate, but the candidate may well
be much richer than is required by the target.

To make these ideas precise, we decompose the signature matching relation into two
sub-relations, *enrichment* and *realization*, that are defined as follows:

- A signature
*sigexp**enriches*a signature*sigexp'*if*sigexp*has at least the components specified in*sigexp'*, with the types of value components being at least as general in*sigexp*as they are in*sigexp'*. - A signature
*sigexp**realizes*a signature*sigexp'*if*sigexp*fulfills at least the type definitions specified in*sigexp'*, but is otherwise identical to*sigexp'*.

In other words *sigexp* enriches *sigexp'* if we can obtain *sigexp'*
from *sigexp* by dropping components and specializing types, and *sigexp*
realizes *sigexp'* if we can obtain *sigexp'* from *sigexp* by
"forgetting" the definitions of some of *sigexp*'s type components.
It is immediate that any signature both enriches and realizes itself, and it is not hard
to see that enrichment and realization are transitive.

We then say that *sigexp* *matches* *sigexp'* if there exists a
signature *sigexp''* such that *sigexp *enriches *sigexp''* and *sigexp''*
realizes *sigexp'*. Put in more operational terms, to determine whether *sigexp*
matches *sigexp'*, we first drop components and specialize types in *sigexp*
to obtain a *view* *sigexp''* of *sigexp* with the same components as
*sigexp'*, then check that the type definitions specified by *sigexp'* are
provided by the view. Signature matching can fail for several reasons:

- The target contains a component not present in the candidate.
- The target contains a value component whose type is not an instance of its type in the candidate.
- The target defines a type component, that is defined differently or not defined in the candidate.

The first two reasons are failures of enrichment; the third is a failure of realization.

Some examples will clarify these definitions. Let us consider realization first
since it is the simpler of the two relations. The signature `INT_ORDERED`

realizes the signature `ORDERED`

because we may obtain the latter from the
former by "forgetting" that the type component `t`

in the signature `INT_ORDERED`

is defined to be `int`

. The converse fails: `ORDERED`

does not
realize `INT_ORDERED`

because `ORDERED`

does not define the type
component `t`

to be `int`

. Here is another counterexample to
realization. The signature

signature LESS_THAN = sig

type t = int

val lt : t * t -> bool

end

does not realize the signature `ORDERED`

, even though it defines `t`

to be `int`

, simply because the `eq`

component is missing from the
signature `LESS_THAN`

.

That's all there is to say about realization. Enrichment is slightly more
complicated. The signature `ORDERED`

enriches the signature `LESS_THAN`

because it provides all of the components required by the latter, at precisely the
required types. For a more interesting example, consider the signature of monoids,

signature MONOID = sig

type t

val unit : t

val mult : t * t -> t

end

and the signature of groups,

signature GROUP = sig

type t

val unit : t

val mult : t * t -> t

val inv : t -> t

end

The signature `GROUP`

enriches the signature `MONOID`

, as
might be expected (since every group is a monoid).

The enrichment relation respects signature equivalence. For example, the
signature `INT_ORDERED`

enriches the following signature:

signature INT_LESS_THAN = sig

val lt : int * int -> bool

end

Here we have dropped both the `t`

and the `eq`

components of the
signature `INT_ORDERED`

, and specified `lt`

to have a superficially
different type than is specified in the signature `INT_ORDERED`

. As was
pointed out earlier, the signature `INT_ORDERED`

is equivalent to the signature
`INT_ORDERED_VARIANT`

, which clearly enriches the signature `INT_LESS_THAN`

.
Since enrichment respects signature equivalence, it follows that `INT_ORDERED`

is an enrichement of `INT_LESS_THAN`

.

The enrichment relation also allows the types of value components to be specialized by instantiating polymorphic types. For example, the signature

sig

type t

val f : 'a -> 'a

end

enriches the signature

sig

type t

val f : t -> t

end

simply because the polymorphic type `'a -> 'a`

may be specialized to the
required type `t -> t`

(by taking `'a`

to be `t`

).

There is one additional case of enrichment to consider. A datatype specification may be regarded as an enrichment of a signature that specifies a type with the same name and arity (but no definition), and zero or more value components corresponding to some (or all) of the value constructors of the datatype. The types of the value components must match exactly the types of the corresponding value constructors; no specialization is allowed in this case. For example, the signature

sig

datatype 'a rbt =

Empty | Red of 'a rbt * 'a * 'a rbt | Black of 'a rbt * 'a * 'a rbt

end

is considered to be an enrichment of the signature

sig

type 'a rbt

val Empty : 'a rbt

val Red : 'a rbt * 'a * 'a rbt

end

which specifies two of the three value constructors of the datatype as ordinary values.

Putting these ideas together, we see that the following signature matches the signature
`MONOID`

:

sig

type t = int list

val unit : 'a list

val mult : 'a list * 'a list -> 'a list

val aux : 'a list

end

Why? First, we drop the component `aux`

, and specialize the type of `mult`

to `int list * int list -> int list`

and the type of `unit`

to ```
int
list
```

by taking `'a`

to be `int`

, thereby obtaining the
intermediate signature

sig

type t = int list

val unit : int list

val mult : int list * int list -> int list

end

This intermediate signature is equivalent to the signature

sig

type t = int list

val unit : t

val mult : t * t -> t

end

By neglecting the definition of the type `t`

we obtain the signature `MONOID`

.
Therefore the signature match succeeds.

The point of having signatures in the language is to express the requirement that a
given structure have a given signature. This is achieved by *signature ascription*,
the attachment of a target signature to a structure binding. There are two forms of
signature ascription, *transparent* and *opaque*, differing only in the
extent to which type definitions are propagated into the scope of the binding.
Transparent ascription is written as

`structure`

strid`:`

sigexp`=`

strexp

Opaque ascription is written as

`structure`

strid`:>`

sigexp`=`

strexp

The two are distinguished by the use of a colon, "`:`

", or the
symbol "`:>`

" before the ascribed signature.

Here is an example of transparent ascription. We may use transparent ascription
on the binding of the structure variable `IntLT`

to express the requirement
that the structure implement an ordered type. This is achieved as follows:

structure IntLT : ORDERED = struct

type t = int

val lt = (op <)

val eq = (op =)

end

Transparent ascription is so-called because the definition of `IntLT.t`

is
not obscured by the ascription; the equation `IntLT.t`

= `int`

remains valid in the scope of this declaration. Transparent ascription is
appropriate here because the signature merely expresses the requirement that the
given structure provide a type and two comparison operations. We do not intend that
these be the *only* operations on that type. (Had we done so the structure
would be useless because there would be no way to create a value of type `IntLT.t`

,
rendering the structure `IntLT`

useless!) The structure `IntLT`

may be thought of as a *view* of the type `int`

as a type ordered by the
standard comparison operations. We may form another view of `int`

as an
ordered type, but with a different ordering, by making the following binding:

structure IntDiv : ORDERED = struct

type t = int

fun lt (m, n) = (n mod m = 0)

val eq = (op =)

end

Here's an example of opaque ascription. We may use opaque ascription to specify
that a structure implement queues, and, at the same time, specify that *only* the
operations in the signature be used to manipulate values of that type. This is
achieved as follows:

structure Queue :> QUEUE = struct

type 'a queue = 'a list * 'a list

val empty = (nil, nil)

fun insert (x, (bs, fs)) = (x::bs, fs)

exception Empty

fun remove (nil, nil) = raise Empty

| remove (bs, f::fs) = (f, (bs, fs))

| remove (bs, nil) = remove (nil, rev bs)

end

Opaque ascription is so-called because the definition of `'a Queue.queue`

is
hidden by the binding; the equivalence of the types `'a Queue.queue`

and ```
'a
list * 'a list
```

is not propagated into the scope of the binding. This is
appropriate because we wish to ensure that queues are created and manipulated only by the
"official" operations in the signature, and not by any other means. By
suppressing the identity of the implementation type we preclude use of any operations on
values of that type other than the ones specified in the signature.

Type checking a structure binding proceeds as follows. First we determine the *principal
signature*of the structure expression on the right-hand side of the binding. (It
is an important property of the language that the principal signature of a structure
always exists; there is always a "most accurate" description of any
structure.) We then proceed according to whether there is an ascribed signature,
and, in case there is, according to whether it is a transparent or opaque ascription.
If there is no ascribed signature, the principal signature of the right-hand side
is assigned as the signature of the structure variable. If there is an ascribed
signature, we match the principal signature against it to determine whether its
requirements are met. If not, the binding is rejected as ill-typed. If so,
then we assign a signature to the structure variable according to whether the ascription
is transparent or opaque. If it is transparent, the structure variable is assigned
the *view *of the candidate signature determined by the matching the candidate signature determined by the matching process; if it is
opaque, the structure variable is assigned the *ascribed* signature. This
means that for a transparent ascription *the definitions in the principal signature of
the types occurring in the ascribed signature are propagated into the scope of the binding*,
whereas for opaque ascription *only the information explicitly appearing in the
ascribed signature is propagated.* In particular if a type is specified in the
ascribed signature, but no definition is provided, then the definition of that type is
hidden from the clients of that binding, rendering it opaque.

It remains to define the principal signature of a structure expression. There are
two forms of structure expression to be considered (at this stage): a structure variable
and a `struct`

expression. A structure variable has as principal
signature the signature assigned to it by the ascription process just described. An `struct`

expression is assigned a principal signature* *by a component-by-component analysis
of its constituent declarations. The rules are essentially as follows:

- Corresponding to a declaration of the form
`type ('a`

, the principal signature contains the specification_{1},...,'a) t =_{n}*typ*`type ('a`

._{1},...,'a) t =_{n}*typ* - Corresponding to a declaration of the form
`datatype ('a`

,_{1},...,'a) t =_{n}*con*of_{1}*typ*| ... |_{1}*con*of_{k}*typ*_{k}the principal signature contains the specification

`datatype ('a`

._{1},...,'a) t =_{n}*con*of_{1}*typ*| ... |_{1}*con*of_{k}*typ*_{k} - Corresponding to a declaration of the form
`exception`

, the principal signature contains the specification*id*of*typ*`exception`

.*id*of*typ* - Corresponding to a declaration of the form
`val`

, the principal signature contains the specification*id*=*exp*`val`

, were*id*:*typ*is the principal type scheme of the expression`typ`

(relative to the preceding context).`exp`

The complete rules are slightly more complicated than this because they must take account of such features as pattern-matching in value bindings, mutually recursive declarations of functions, and the possibility of shadowing bindings by re-declaration. However, the rules given above are a rough-and-ready approximation that will serve for most purposes; the reader is referred to The Definition of Standard ML for a complete account.

With these rules in mind, it is a good exercise to review the two examples of signature ascription given above. Go through the steps of forming the principal signature, then check that the principal signature matches the ascribed signature, and determine the signature to assign to the structure variable in each case.