StreamIO functor
The optional StreamIO functor provides a way to build a stream I/O layer on top of an arbitrary primitive I/O implementation. For example, given an implementation of readers and writers for pairs of integers, one can define streams of pairs of integers.
functor StreamIO ( ... ) : STREAM_IO
structure PrimIO : PRIM_IO
structure Vector : MONO_VECTOR
structure Array : MONO_ARRAY
sharing type PrimIO.elem = Vector.elem = Array.elem
sharing type PrimIO.vector = Vector.vector = Array.vector
sharing type PrimIO.array = Array.array
val someElem : PrimIO.elem
The Vector and Array structures provide vector and array operations for manipulating the vectors and arrays used in PrimIO and StreamIO. The element someElem is used to initialize buffer arrays; any element will do.
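As an illustration, the functor might be applied to the standard text primitives as follows. This is only a sketch: it assumes the implementation actually provides the optional StreamIO functor with the parameter signature listed above, and the structure name MyTextStreamIO is made up for this example.

```sml
(* Assumption: the optional StreamIO functor is available at top level
   and takes exactly the parameters listed above. *)
structure MyTextStreamIO = StreamIO (
  structure PrimIO = TextPrimIO   (* elem = char, vector = string *)
  structure Vector = CharVector
  structure Array  = CharArray
  (* Only used to initialize buffer arrays; any character will do. *)
  val someElem = #"\000")
```

The sharing constraints are satisfied here because TextPrimIO, CharVector, and CharArray all agree on char, string, and CharArray.array.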
The types instream and outstream in the result of the StreamIO functor must be abstract.
If flushOut finds that it can do only a partial write (i.e., writeVec or a similar function returns a "number of elements written" less than its sz argument), then flushOut must adjust its buffer for the items written and then try again. If the first or any successive write attempt returns zero elements written (or raises an exception), then flushOut raises the IO.Io exception.
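The retry rule can be sketched as a small loop. This is a minimal illustration rather than the functor's actual code: flushAll is a hypothetical helper, the write argument stands in for writeVec with a {buf, i, sz} argument, and the buffer is modeled as a plain string.

```sml
(* Hypothetical helper: push all of buf through a writer that may accept
   fewer elements than requested. *)
fun flushAll (write : {buf : string, i : int, sz : int} -> int)
             (buf : string) : unit =
  let
    (* Every failure is reported as IO.Io, as required of flushOut. *)
    fun fail cause =
      raise IO.Io {name = "<sketch>", function = "flushOut", cause = cause}
    fun loop i =
      if i >= String.size buf then ()
      else
        let
          val sz = String.size buf - i
          val n = write {buf = buf, i = i, sz = sz} handle e => fail e
        in
          (* Zero elements written means no progress: give up. *)
          if n <= 0 then fail (Fail "no elements written")
          else loop (i + n)   (* skip what was written, then try again *)
        end
  in
    loop 0
  end
```

A writer that accepts only a few elements per call simply causes more iterations; a writer that returns zero or raises ends the loop with IO.Io.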
If an exception occurs during any stream I/O operation, then the module must, of course, leave itself in a consistent state, without losing or duplicating data.
In some ML systems, a user interrupt aborts execution and returns control to a top-level prompt, without raising any exception that the current execution can handle. In that situation it may be unavoidable that some information is either lost or duplicated. Data (input or output) must never be duplicated, but may be lost. This can be accomplished without stream I/O doing any explicit masking of interrupts or locking. On output, the internal state (saying how much has been written) should be updated before doing the write operation; on input, the read should be done before updating the count of valid characters in the buffer.
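The ordering described here can be sketched as follows. All names (outBuf, pending, put, flush, inBuf, valid, fill) are hypothetical, the buffers are fixed at 16 characters for brevity, and put does no overflow check; the point is only the order of the state update relative to the I/O call.

```sml
(* Output side: a fixed buffer plus a count of pending elements. *)
val outBuf = CharArray.array (16, #"\000")
val pending = ref 0

fun put (c : char) =
  (CharArray.update (outBuf, !pending, c); pending := !pending + 1)

(* Mark the buffer as empty BEFORE writing. If the write is aborted, the
   pending data is lost, but a later flush cannot write it a second time. *)
fun flush (write : string -> unit) =
  let
    val data =
      CharArraySlice.vector (CharArraySlice.slice (outBuf, 0, SOME (!pending)))
  in
    pending := 0;
    write data
  end

(* Input side: do the read first, publish the new count afterwards. If the
   run is aborted in between, the elements just read are lost, but the
   buffer never claims elements that were already delivered. *)
val inBuf = CharArray.array (16, #"\000")
val valid = ref 0

fun fill (readInto : CharArray.array -> int) =
  let val n = readInto inBuf
  in valid := n end
```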
Implementation note:
Here are some suggestions for efficient performance:
- Operations on the underlying readers and writers (readVec, etc.) are expected to be expensive (involving a system call, with context switch).
- Small input operations can be done from a buffer; the readVec or readVecNB operation of the underlying reader can replenish the buffer when necessary.
- Each reader may provide only a subset of readVec, readVecNB, block, canInput, etc. An augmented reader that provides more operations can be constructed using PrimIO.augmentIn, but it may be more efficient to use the functions directly provided by the reader, instead of relying on the constructed ones. The same applies to augmented writers.
- Keep the position of the beginning of the buffer on a multiple-of-chunkSize boundary, and do read or write operations with a multiple-of-chunkSize number of elements.
- For very large inputAll or inputN operations, it is (somewhat) inefficient to read one chunkSize at a time and then concatenate all the results together. Instead, it is good to try to do the read all in one large system call; that is, readBlock(n). However, in a typical implementation of readVec, this requires pre-allocating a vector of size n. However, in inputAll(), the size of the vector is not known a priori and if the argument to inputN is large, the allocation of a much-too-large buffer is wasteful. Therefore, for large input operations, query the size of the reader using endPos, subtract the current position, and try to read that much. But one should also keep things rounded to the nearest chunkSize.
- The use of endPos to try to do (large) read operations of just the right size will be inaccurate on translated readers. But this inaccuracy can be tolerated: if the translation is anything close to 1-1, endPos will still provide a very good hint about the order-of-magnitude size of the file.
- Similar suggestions apply to very large output operations. Small outputs go through a buffer; the buffer is written with writeArr. Very large outputs can be written directly from the argument string using writeVec.
- A lazy functional instream can (should) be implemented as a sequence of immutable (vector) buffers, each with a mutable ref to the next "thing," which is either another buffer, the underlying reader, or an indication that the stream has been truncated.
- The input function should return the largest sequence that is most convenient. Usually this means "the remaining contents of the current buffer."
- To support non-blocking input, use readVecNB if it exists, otherwise do canInput followed (if appropriate) by readVec.
- To support blocking input, use readVec if it exists, otherwise do readVecNB followed (if it would block) by block, and then another readVecNB.
- To support lazy functional streams, readArr and readArrNB are not useful. If necessary, readVec should be synthesized from readArr and readVecNB from readArrNB. writeArr should, if necessary, be synthesized from writeVec and vice versa. Similarly for writeArrNB and writeVecNB.
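The sizing suggestion for large reads might be computed along these lines. This is a sketch under stated assumptions: roundUp and readSizeHint are hypothetical helpers, positions are modeled as plain integers, and rounding is done upward to a chunkSize multiple so that the request is never smaller than one chunk.

```sml
(* Round n up to the next multiple of chunk (chunk must be positive). *)
fun roundUp (chunk : int, n : int) : int =
  ((n + chunk - 1) div chunk) * chunk

(* Guess how many elements to request for a large inputAll or inputN.
   endPos is NONE when the reader cannot report its size; in that case
   fall back to reading one chunk at a time. *)
fun readSizeHint {endPos : int option, curPos : int, chunkSize : int} : int =
  case endPos of
    SOME e => Int.max (chunkSize, roundUp (chunkSize, e - curPos))
  | NONE => chunkSize
```

On a translated reader the result is only a hint, so the caller still has to cope with reading fewer or more elements than predicted.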
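The suggestions about lazy functional instreams and blocking input can be combined in one small sketch. Everything here is hypothetical and heavily simplified: the reader type is a cut-down stand-in for PrimIO.reader with only the fields used, elements are characters, and closing, truncation, and error handling are left out.

```sml
(* Cut-down stand-in for PrimIO.reader (hypothetical). *)
type reader = {
  readVec   : (int -> string) option,
  readVecNB : (int -> string option) option,
  block     : (unit -> unit) option,
  chunkSize : int
}

(* Blocking read: use readVec if it exists; otherwise try readVecNB and,
   if it would block, call block and then readVecNB again. *)
fun blockingRead ({readVec = SOME f, chunkSize, ...} : reader) : string =
      f chunkSize
  | blockingRead {readVecNB = SOME f, block = SOME b, chunkSize, ...} =
      let
        fun try () =
          case f chunkSize of SOME v => v | NONE => (b (); try ())
      in
        try ()
      end
  | blockingRead _ =
      raise IO.Io {name = "<sketch>", function = "input",
                   cause = IO.BlockingNotSupported}

(* Immutable buffers, each with a mutable ref to the next "thing". *)
datatype more = Next of buffer | Reader of reader | Truncated
withtype buffer = {data : string, more : more ref}

(* A functional instream is a position within one buffer of the chain. *)
type instream = buffer * int

fun deliver (b : buffer) : string * instream =
  (#data b, (b, String.size (#data b)))

(* input returns the remaining contents of the current buffer, reading
   and linking a new buffer only when the current one is exhausted. *)
fun input ((buf as {data, more}, pos) : instream) : string * instream =
  if pos < String.size data then
    (String.extract (data, pos, NONE), (buf, String.size data))
  else
    case !more of
      Truncated => ("", (buf, pos))
    | Next b => deliver b
    | Reader r =>
        let
          val b = {data = blockingRead r, more = ref (Reader r)}
        in
          more := Next b;
          deliver b
        end
```

Because each new buffer is linked into the chain before it is returned, calling input again on an earlier stream value finds the same buffer and yields the same data without touching the reader.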
STREAM_IO
Last Modified May 10, 1996
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies