Sample Code for this Chapter

In the preceding chapter we considered the following signature of dictionaries with an arbitrary key type:

signature DICT = sig
  type key
  val lt : key * key -> bool
  val eq : key * key -> bool
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * key * 'a -> 'a dict
  val lookup : 'a dict * key -> 'a
end

The signatures of dictionaries with particular choices of key type were defined using the "where type" construct.  For example, the signature declarations

signature STRING_DICT = DICT where type key=string
signature INT_DICT = DICT where type key=int

define the signatures of dictionaries with string and integer keys, respectively.   The motivation for introducing these specialized instances of the DICT signature is that we typically wish to hold the implementation type, 'a dict, of dictionaries abstract, but leave the type of keys concrete, as described earlier.

The signature DICT is a bit unsatisfactory because it mixes two different notions in one interface, namely the type, key, of keys and its associated comparison operations, lt and eq, and the type 'a dict of dictionaries and its associated operations empty, insert, and lookup.  It would be cleaner to separate these two aspects of the interface, especially since we shall soon consider the key component to be "generic", with the rest being "specific", to the abstraction.   The way to do this in ML is with a substructure, as follows:

signature DICT = sig
  structure Key : ORDERED
  type 'a dict
  val empty : 'a dict
  val insert : 'a dict * Key.t * 'a -> 'a dict
  val lookup : 'a dict * Key.t -> 'a
end

The type of keys and the operation on it are segregated into a substructure of the dictionary structure, a component of a structure that is itself a structure.   Correspondingly, uses of the type key are replaced by references to the t component of the substructure Key.  This leads to a hierarchical organization in which we consider the key structure to be subservient to the dictionary operations.

Specialized versions of the signature DICT are build essentially as before, except that we use a long identifier to specify the type of keys:

signature STRING_DICT = DICT where type Key.t=string
signature INT_DICT = DICT where type Key.t=int

Specific implementations of these specialized instances may be defined as follows:

structure StringDict :> STRING_DICT = struct
  structure Key : ORDERED = StringLT
  type 'a dict = Key.t BinTree.tree
  val empty = BinTree.empty
  val insert = ...insert into a BST using Key.lt and Key.eq...
  val lookup = ...lookup in a BST using Key.lt and Key.eq...
end

structure IntDict :> INT_DICT = struct
  structure Key : ORDERED = IntLT
  type 'a dict = Key.t BinTree.tree
  val empty = BinTree.empty
  val insert = ...insert into a BST using Key.lt and Key.eq...
  val lookup = ...lookup in a BST using Key.lt and Key.eq...
end

The difficulty, of course, is that we are repeating the code for dictionaries in each implementation; the elided parts of both structures would be identical.  The only difference between the two dictionary structures lies in the implementation of keys; in one case we choose string operations and in the other we choose integer operations.   Since the bulk of the code is the same, it is a pity to have to repeat it for each choice of key type.

Fortunately, ML provides a convenient means of avoiding such redundancy, called a functor.   A functor is a parameterized module, or a generic structure, that is defined in terms of zero or more argument structures with a specified signature.  A functor may be applied, or instaniated, with any structures matching the argument signatures.  A functor is therefore a kind of function taking zero or more structures as arguments and yielding a structure as result.

In the case of dictionaries we may define a generic implementation that is parameterized by the type of keys and associated comparison operations.  This is achieved by introducing a functor.

functor Dict (structure K : ORDERED) :> DICT where type Key.t=K.t =
struct
   structure Key : ORDERED = K
   type 'a dict = Key.t BinTree.tree
   val empty = BinTree.empty
   val insert = ...insert into a BST using Key.lt and Key.eq...
   val lookup = ...lookup in a BST using Key.lt and Key.eq...
end

This declaration introduces a functor named Dict that takes as argument any structure implementing the signature ORDERED, and yields a structure implementing the instance of the signature DICT determined by taking the key type of the dictionary to be the type component of its argument, leaving the type of dictionaries abstract.  The type checker ensures that the body of the functor matches the specified result signature, under the assumption that the argument has the stated signature.  In the case of the Dict functor the type checker ensures that the principal signature of the body of the functor (the part between struct and end) matches the signature

DICT where type Key.t=K.t,

assuming that the structure K has signature ORDERED.

The Dict functor encapsulates the implementation of dictionaries as a generic structure that is independent of the specific choice of keys.  One advantage of this encapsulation is that should we wish to modify the implementation of dictionaries, say to fix an error or to improve performance, we need only modify the Dict functor, rather than change every occurrence of the dictionary code spread throughout a large system.  This is obviously advantageous for both the original author of the code, and anyone who must maintain it in the future.  In fact common data structures such as dictionaries are typically provided as part of a "shrink wrapped" library, and hence are shared among many different programs, thereby increasing code reuse and reducing redundancy.

The Dict functor provides a generic implementation of dictionaries.   Dictionaries with specific key types may be built by instantiating the Dict functor as follows:

structure IntDict = Dict (structure K = IntLT)
structure StringDict = Dict (structure K = StringLT)

Notice that functor application uses keyword parameter passing --- the parameter is explicitly bound to a structure using a structure binding.  In practice the right-hand sides of such bindings are always (long) identifiers; if not, the compiler implicitly inserts bindings to ensure that this is the case.  In our discussions we will tacitly assume that the right-hand side of all such bindings are (long) identifiers.

What are the signatures of the structure variables IntDict and StringDict?   Since no signature is ascribed to these bindings, the principal signature of the corresponding right-hand side of the binding is assigned to each variable, in keeping with our previous policies.  Since the right-hand side in these examples is a functor application, we must answer the question: what is the principal signature of a functor application?  If --- as here --- the result signature of the functor is opaque, the principal signature is precisely the asribed signature of the functor, but with the structure parameter replaced by its binding (which must be, by our assumption, another structure identifier).  Thus the signature assigned to IntDict is

DICT where type Key.t=IntLT.t

which is equivalent to the signature

DICT where type Key.t=int

since IntLt.t = int.  Similarly, the signature assigned to StringDict is

DICT where type Key.t=StringLT.t

which is equivalent to the signature

DICT where type Key.t=string

What if the functor has no result signature, or its result signature is transparently ascribed?  In that case we assign the intermediate signature of the match as the result signature of the functor, and use that signature as the implied result signature of the functor.

Dictionaries illustrate the use of the ML module system to build generic implementations of abstract types.  A generic implementation of priority queues (which support a remove_min operation that dequeues the "least" element of the queue relative to a specified ordering) may be built in an exactly analogous manner.  Here's a suitable signature of priority queues:

signature PRIO_QUEUE = sig
  structure Elt : ORDERED
  type prio_queue
  exception Empty
  val empty : prio_queue
  val insert : Elt.t * prio_queue -> prio_queue
  val remove : prio_queue -> Elt.t * prio_queue
end

Notice that prio_queue is a type, and not a type constructor, as it was in the case of "plain" queues.  This is a reflection of the fact that the operations on a priority queue are not independent of the type of elements (as they are with plain queues), but rely on the comparison operations that are provided with the Elt structure.

A generic implementation of priority queues is a functor taking as argument a structure containing the element type together with its associated operations:

functor PrioQueue
  (structure E : ORDERED) :> PRIO_QUEUE where type Elt.t=E.t =
struct
  structure Elt : ORDERED = E
  type prio_queue = ...a heap based on the ordering Elt.lt...
  exception Empty
  val empty = ...the empty heap...
  val insert = ...sift a new element into the heap...
  val remove = ...remove the least element and adjust the heap...
end

Specific instances of priority queues may be built as follows:

structure IntPQ = PrioQueue (structure E = IntLT)
structure StringPQ = PrioQueue (structure E = StringLT)

with signatures

PRIO_QUEUE where type Elt.t=int

and

PRIO_QUEUE where type Elt.t=string

respectively.

The situation becomes more interesting when we wish to combine two or more abstract types to form a third.  Suppose we are to implement a (hypothetical) abstract type that employs an ordered type of values that occur both as keys of a dictionary and elements of a priority queue.  The signature of this abstract type might look like this

signature ADT = sig
  structure Val : ORDERED
  type adt
  ...operations...
end

The implementation should be generic in the type of values, and also in the implementation of dictionaries and priority queues; we don't want to build the implementation of these auxiliary data structures into the implementation of ADT's.   There are two approaches to building an Adt functor, each with its advantages and disadvantages.  Here's the first approach:

functor Adt
  (structure V : ORDERED) :> ADT where type Val.t=V.t =
struct
  structure Val : ORDERED = V
  structure D = Dict (structure K = V)
  structure Q = PrioQueue (structure E = V)
  type adt = ...
  ...
end

The functor Adt instantiates the Dict and PrioQueue functors to the structure of values specified as argument to the Adt functor.   This ensures that the type equation

D.Key.t = Q.Elt.t = V.t

holds inside the body of the functor, so that expressions such as

D.insert (Q.remove_min ..., ...)

are well-typed.  (The structures D and Q are not visible outside of the functor since they do not appear in the result signature; they are local auxiliaries used within the functor.)

This approach works well, but if the Dict or PrioQueue functors are changed, the Adt functor must be recompiled to pick up the new versions.  An alternative, which avoids this dependency of the implementation of Adt on the implementations of the Dict and PrioQueue functors, is to treat the dictionary and priority queue structures as additional parameters to the Adt functor.  This leads to the following setup:

functor Adt'
  (structure V : ORDERED and D : DICT and Q : PRIO_QUEUE) :>
    ADT where type Val.t=V.t =
struct
  structure Val = V
  type adt = ...implementation type...
  ...implementation of operations...
end

To build an instance of the Adt' functor we must first built appropriate instances of the Dict and PrioQueue functors and pass these to Adt':

structure IntDict = Dict (structure K=IntLT)
structure IntPQ = Dict (structure K=IntLT)
structure A = Adt' (structure V=IntLt and D=IntDict and Q=IntPQ)

There is a problem, however, with this setup: the functor Adt' is ill-typed!  It is no longer true within the body of Adt that the type equation

D.Key.t = Q.Elt.t = V.t

holds in the body of Adt', even though the equation

IntDict.Key.t = IntPQ.Elt.t = IntLT.t = int

does hold of the arguments, for we might well choose arguments for which the required equation is invalid.  In short, the functor  is "too generic", and consequently the body is not type correct.

What to do?  The solution is to restrict the parameters to the Adt' functor so that the only possible instances are those that satisfy the required equation.   There are two methods for doing this, both equivalent.  The first is to explicitly require that the dictionary and priority queue arguments agree on the value type passed as parameter:

functor Adt'
  (structure V : ORDERED
   and D : DICT where type Key.t=V.t
   and Q : PRIO_QUEUE where type Elt.t=V.t) :>
    ADT where type Val.t=V.t =
struct
  ...
end

The body of Adt' is now type correct since the required type equations hold as a result of our additional assumptions on the arguments.

An alternative is to impose the equational requirement on types in a post hoc manner using a sharing specification:

functor Adt'
  (structure V : ORDERED and D : DICT and Q : PRIO_QUEUE
   sharing type D.Key.t = Q.Elt.t = V.t) :>
    ADT where type Val.t=V.t =
struct
  ...
end

The sharing specification stipulates that the given equation must hold of any instance of this functor.  Any attempt to instantiate Adt' with structures V, D, and Q not satisfying the sharing specification is rejected as ill-formed.

An advantage of sharing specifications is that they provide a direct, symmetric specification of the required type equation without forcing the programmer to explicitly "thread" the common type through the various signatures.  In fact sharing specifications encourage concision since they do not require that the common component be "factored out" as it is in the foregoing example.  Here is a more concise formulation of the Adt' functor in which we drop the first argument entirely, relying only on a sharing specification to constraint the dictionary and priority queue structures appropriately.

functor Adt'
  (structure D : DICT and Q : PRIO_QUEUE
   sharing type D.Key.t = Q.Elt.t) :>
    ADT where type Val.t=D.Key.t =
struct
  ...
end

Notice that the result signature changes slightly to extract the common type from one of the parameters, the choice of which being arbitrary in the presence of the sharing specification.

Sample Code for this Chapter