Introducing Structured Data Types into Internet-scale Information Systems

Introduction and Overview

Managing the increasingly large volume of information on computer networks is rapidly becoming an important problem in computing. The Internet, the largest wide-area computer network, is growing exponentially in terms of hosts, users, and traffic. The NSFNet backbone carried 14 Terabytes of data in March 1994; about half of that was due to information services, such as FTP, Gopher, and WWW. It is clear that a large supply and demand of information exists.

The form in which information is disseminated on the Internet, however, leaves much to be desired. Most information has some sort of semantic structure to it. It could be a text broken up into chapters and paragraphs, a bus schedule showing routes and times, a city map displaying streets and elevations, or a complex medical database. But while Internet information systems may be able to transmit the data involved with these pieces of information, they give little assistance in telling how the data is structured.

The semantic structure of information makes a large body of information much more manageable. Knowing the meaning of a type of information helps one extract, derive, compile, and condense useful information from a larger set of raw data. It helps in searching for relevant information, and in intelligently filtering out information irrelevant to a query. In these tasks, it is not enough to simply know that a piece of information is composed of several components; ideally, one wants to know the meaning of the components, and what one can do with them. A search of card catalog entries, for instance, may need to know how to extract the author of an entry, and compare the author's name against a search term.

In the Internet, there is little support for semantically structured information. A particular application, such as a library catalog, may define a certain format for its book entries, which may be semantically rich, but only meaningful to programs specifically written to understand that format. A client program written to read University A's card catalog may be able to make no sense of University B's card catalog, even though both are available on the Internet.

In contrast, applications that want to share their information widely are generally forced to use a lowest common denominator approach. The most common such denominator is plain unstructured text. Frequently used applications may, over a long period of time, settle on higher-level common denominators, such as RFC-822 mail messages, or GIF image formats, or documents formatted with TeX or HTML. But these higher-level formats still lack much of the semantic structure many applications need; and the process of finding a usable common standard even for these formats can take years. (Then, in a few years more, these formats are often replaced by other, incompatible formats.) The rate at which new data types can be introduced and used in an Internet context is far too slow, and cannot be made much faster with current standards procedures.

How can information be provided on the Internet at a higher semantic level, while remaining usable by a large number of information clients? Two observations are relevant here:

I propose to make an explicit object-based level of abstract data usable in Internet information systems. Widespread use of such abstract data requires that new types be definable anywhere on the network, and not simply by some central standards authority. Furthermore, in order for these types to be used, information about these new types, and operations on those types, must be available to other agents which request it. This requires not only support specifically for abstract types, but also a well-defined interface for agents to talk to each other about types and operations; and some standard method to provide information about types, their operations, and their relations.

I claim that these requirements can be satisfied with a two-level software architecture. The upper level focuses on the data being shared, and the abstract operations being carried out on it. At this level, methods are invoked; object references are resolved; new data types and operations are defined. (See figure 1b.) The lower level focuses on the agents supporting these operations. Here, agents ask other agents for data objects or references, carry out abstract data operations on behalf of other agents, and encode and decode concrete representations of abstract objects so that they can be passed through the network. (See figure 1a.) This level abstractly describes what is already carried out (in a domain-specific manner) by the protocols of many existing Internet agents, such as HTTP [BL93] servers, Domain Name Service [Moc87] resolvers, or WAIS [Kah91] indexes. (Information from these existing systems can also be incorporated into the higher-level information system through the use of "wrapper" or "gateway" agents, which provide explicit abstract types for the implicit data abstractions these systems support.)



Figure 1. The two levels of abstraction in an information system.
To bridge these two levels of abstraction, the agents need to know about the types of objects they are manipulating. For this purpose, I propose a special mediator agent that can give information about types of information in the network. A client or a server can contact this agent (which I call a type oracle) to find an agent to carry out a defined operation on a data type, or to find out how information of one type or encoding can be converted into another type or encoding. Someone who wishes to define a new data type or encoding can register it (and its operations) with a type oracle, which can then share this information with other agents, including other type oracles. Oracles can also use their knowledge of the lattice of types and encodings to derive new transformations not provided by any single agent (such as a conversion from type A to type C that uses a converter from A to B followed by a converter from B to C).
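The derivation of a composite conversion can be pictured as a path search over registered converters. The following is a minimal sketch in Python, assuming a hypothetical in-memory registry; the names `TypeOracle`, `register_converter`, `find_conversion`, and the agent names are illustrative and are not part of the proposed protocol.

```python
from collections import deque


class TypeOracle:
    """Toy registry of converters between named types or encodings (illustrative only)."""

    def __init__(self):
        # Maps a source name to a list of (target name, agent name) pairs.
        self._converters = {}

    def register_converter(self, source, target, agent):
        """Record that `agent` (a hypothetical agent name) converts `source` to `target`."""
        self._converters.setdefault(source, []).append((target, agent))

    def find_conversion(self, source, target):
        """Return a shortest chain of (source, target, agent) steps, or None if none exists."""
        if source == target:
            return []
        queue = deque([source])
        came_from = {source: None}  # name -> (previous name, agent) used to reach it
        while queue:
            current = queue.popleft()
            for nxt, agent in self._converters.get(current, []):
                if nxt in came_from:
                    continue
                came_from[nxt] = (current, agent)
                if nxt == target:
                    return self._rebuild(came_from, target)
                queue.append(nxt)
        return None

    @staticmethod
    def _rebuild(came_from, target):
        """Walk the back-pointers from `target` to the start and return the steps in order."""
        steps = []
        node = target
        while came_from[node] is not None:
            prev, agent = came_from[node]
            steps.append((prev, node, agent))
            node = prev
        steps.reverse()
        return steps


oracle = TypeOracle()
oracle.register_converter("A", "B", "agent-1")
oracle.register_converter("B", "C", "agent-2")
chain = oracle.find_conversion("A", "C")
```

Here no single agent converts A to C, but the registry still yields a two-step chain through B. A breadth-first search is only one possible strategy; a real oracle would presumably also weigh cost and information loss.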

A few questions arise at this point: Can a coherent information system be built to this design? Will the design really give widely-distributed information systems more semantic power? Will it be useful for real applications, or will it introduce too much overhead (either in response time to queries, or in the amount of work a client or provider is expected to do) to be feasible? Will it be able to interoperate with existing information systems? I propose the following course of action to answer these questions:

Recap: Key Concepts

The key concepts of the thesis, then, are these: Of these, the type oracle should be the primary contribution of the thesis.

In the remainder of this proposal, I describe in more detail the rationale for my research. I will explain my work's relation to the current state of the practice in Internet information gathering, and to distributed computing concepts. I will outline the basic abstractions of my architecture, and explain the problems they address and some of the problems involved in using them. I will describe some relevant related work by others. I will describe my plan of research, tell how long I expect my activities to take, and describe what contributions I expect these activities to make.

Internet Information Systems: Uses and Problems

As noted in the introduction, the Internet is rapidly becoming a widely-used medium for exchanging information. Many applications proposed for networked information systems imply a rich structure to this information. For example, a medical researcher may want to examine blood pressure readings from a clinical sampling and correlate them to heart attack occurrences, using the structure of patient medical histories. A scientist may want to find books in several libraries about plate tectonics, using catalog entries and search indexes. A software engineer may wish to find and examine C++ modules for processing SQL queries, using the structure of program archives and descriptions.

In an ideal world, such tasks would be simple to carry out effectively. But they remain difficult or infeasible in today's Internet, due in part to limitations of the net's current model of information space. Among these limitations:

Computation models: The need for abstract types: A number of the problems above can be solved in part by better computation models for internet information systems, in particular, abstract types. Some benefits of abstract type systems: In the next section, I will look briefly at two communities working towards usable wide-area data types: the distributed computing community and the community of developers of existing Internet information systems. By examining the strengths and weaknesses of their approaches, I will lay the groundwork for an architecture combining features from both communities.

Distributed computing perspectives

The distributed computing community has already proposed or implemented a number of systems for distributed objects. If abstract data types are so useful for distributed information systems, then, why hasn't one of these object systems taken over cyberspace? While immaturity of these systems may be one possible reason, another important reason is that the applications these systems are designed for are different in important ways from information dissemination applications.

Why existing distributed computing models aren't sufficient: Distributed computing researchers have long been aware that computing over multiple machines introduces many new problems not present in a single address space:

How existing Internet infosystems are different

Fortunately, because their application domain is limited, Internet information applications do not have to solve all the problems inherent in distributed computing. In particular, the information delivery task can be simplified by the following domain assumptions: Relaxing currency and consistency requirements gives third-party agents a useful role in an Internet information system. An agent can provide information originally supplied by another agent without necessarily having to verify that the original agent's information has not changed. It can synthesize information based on data from several agents. It can derive or transform the information for a client in ways the original server might not be able or willing to do.

How existing Internet infosystems handle datatypes. Many Internet information systems have found it useful to define their own semantic types. Gopher, for instance, uses menus and bookmarks to let users navigate. The World Wide Web (WWW) [BL+92] uses simple structured hypertext documents to navigate through the system, and defines a data type (HTML) for these documents. While these types are more useful than the simple ASCII text used to encode them, users of these systems soon want more structured types. For example, a number of WWW sites have "What's New" pages in HTML, which invariably consist of a list of dates, resource descriptions, and links to the resources, in reverse date order (and sometimes spread out over several documents). This format convention reflects a new 'abstract type' to the human client. But this type cannot be easily used by programs (though it might be convenient for some of them) because the information system provides no way to describe the new type in a well-defined way. A standards body might incorporate it into a later revision of the information system, but if this occurs at all, it will take a very long time.

An example. Even with a relatively small, simple set of types, agents may have difficulty exchanging information, as shown in the following example. Suppose that a client program on a Macintosh has a reference to an image it wishes to display. Retrieval of the image is simple enough in many Internet infosystems: The client examines the reference to see what server it should contact, talks to the server with the appropriate protocol, and gets the image shipped to it for display. The World Wide Web, Gopher, and even anonymous FTP are all capable of doing this.

But can the client do anything with the information it retrieves? Suppose that the image is stored on a Unix-based server at a remote university. The image is saved there in X bitmap format (XBM), and has been compressed with GNU-zip to make it easier to store, and quicker to ship. This format and encoding makes sense for the Unix environment where the picture is stored, but may not be useful to the client. The Mac client, for instance, may know how to display GIF images, but not know anything about XBM images (a similar type, but with a different color model and encoding). And the GNU uncompressor may not be available on the Macintosh.

The conflict in data types must be resolved if the two agents are going to interact meaningfully. First of all, at least one agent must realize the nature of the conflict. (A naive client program might blithely assume everything is going well, and display the unknown-format image as gibberish-- or worse, crash when it tries to display the image.) If the client can tell what kind of information the server is sending it, it can detect a problem, and possibly convert the data to a form it can operate on. Or, the client may tell the server up front what data formats it can deal with, and the server can convert the data appropriately.

Existing systems have these capabilities, but only to a limited extent. When Gopher and WWW servers ship data, they also send meta-data identifying the type of the data they ship. The World Wide Web's HTTP servers also allow a client to send a list of types it will accept. The vocabulary of types one can talk about is limited; in Gopher's case, to a set of single-character codes set by the Gopher developers; and in the Web's case, to the MIME type set. MIME's type repertoire (described in [BF92]) is richer than Gopher's, and allows people to use their own 'experimental' type names outside the standard type repertoire, but all parties in a transaction must have a common understanding of the experimental types used. Also, MIME's encoding repertoire is small and fixed, so that 'GIF' and 'compressed GIF' need to be expressed as two different types in the MIME system. (Web developers stretched the MIME convention to add new encoding types, so as to avoid the combinatorial type-expansion problem arising with different data types having different compressions. But the problem resurfaces with two or more levels of encoding, which is not uncommon.)
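The combinatorial type-expansion problem can be made concrete with a small counting sketch. It assumes, purely for illustration, that encoding layers can be stacked in any order and with repetition; the `Description` class and both counting functions are hypothetical and do not correspond to any MIME or HTTP facility.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Description:
    """A data description that keeps the type separate from its encoding layers."""
    base_type: str
    encodings: Tuple[str, ...] = ()  # applied in order, innermost first


def flat_names_needed(num_types, num_encodings, max_layers):
    """Count the names needed if every (type, encoding stack) pair gets its own flat name."""
    stacks = sum(num_encodings ** depth for depth in range(max_layers + 1))
    return num_types * stacks


def separate_names_needed(num_types, num_encodings):
    """Count the names needed if types and encodings are named independently."""
    return num_types + num_encodings


compressed_gif = Description("GIF", ("compress",))
```

With 10 types, 3 encodings, and up to two encoding layers, the flat scheme needs 130 names while the separated scheme needs 13, which is why a single fixed list of type names scales poorly once encodings are layered.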

Why third party agents are useful. But there is a more fundamental problem to these systems than limited vocabularies for types and encodings. Even if the client knows the kind of data it gets, and the server knows what kind of data the client wants, one of the agents has to know how to adapt to the other. In the image-fetching example, one of the parties has to know how to convert the data from the server's format to the client's format. If they don't have enough knowledge among themselves to do the conversion, the agents are stuck-- even though an agent somewhere else in the network may be able to supply this missing knowledge, or do the required conversion.

Third parties can be useful not only for type conversion, but also for abstract operations on types. For instance, if for some reason the client wanted the image not for display, but simply to get some information from it (such as its dimensions, or a string corresponding to text characters embedded in the image), a mediator could be enlisted to carry out the operation and return the results to the client. Conversion might not be necessary at all.

Basis of a more powerful architecture

To sum up, then, there are two basic requirements for an Internet-wide system to handle structured types that are not adequately addressed in existing information systems. These are:
1. Ways to define and describe inter-agent operation at a higher level than simple client-server interaction with a fixed protocol. One should be able to publicize the services an agent provides for data and operations, and the agents should be able to negotiate with clients to carry out these services. There needs to be a way of discovering particular agents for a needed task.

2. Ways of talking about the data types, encodings, and associated operations that these agents handle. In a large, distributed internet, new types, encodings and operations will be introduced all the time. But this is not an unfamiliar problem. The universe of information objects on the Internet is large enough that the futility of central administration of the objects is obvious. Instead, infosystems designers have come up with decentralized ways to distribute and refer to the objects. The solutions (such as the Web's URL scheme, which specifies the access method of a particular Net object) are not perfect, but do currently provide a workable way to find objects in many cases. Likewise, with rich enough conventions for talking about types, new abstract types and operations can be brought into the Internet and used, without having to wait for some centralized standards body to act.
Statement of thesis. These two architectural concepts are related very closely. A well-defined system for talking about types and operations provides a rich framework in which agents can interact. And an agent that is an expert in cataloging and handing out type information allows new operations (and types) to be defined in a distributed manner. This agent, the type oracle, has a protocol that allows agents to request services and discover information about new abstract data types. (It can, for instance, identify new types in relation to known types, and can find other agents to carry out needed operations or conversions on unknown types.)

My thesis, then, is this:

A model of distributed information systems allowing individual information providers to define and share their own abstract structured types is feasible, and will make Internet information systems significantly more powerful. A distributed network of type oracles, combined with a flexible naming, subtyping, encoding, and inheritance scheme, will allow these structured types to be introduced and used by a variety of information agents.
In the sections to follow, I will describe how I plan to investigate and test this statement. First, I describe some of the details of a design which supports this model, to demonstrate how such a system could be designed and built. Then, I will describe how the system relates to other work in similar fields. Finally, I will discuss the specific activities I will undertake to complete the thesis, and the contributions that I expect to result.

The design of an Internet information object system

In this section, I will discuss the key abstractions in a design for such an Internet information object system, explain how and why they would be used, and discuss how they should be implemented in a workable system.

The major abstractions discussed here are objects, agents (and their computation model), and type oracles. References and meta-data will also be addressed. Since one goal of my system is to interoperate (at least to some extent) with existing information systems, I will also discuss, where appropriate, how some of these abstractions relate to those systems.

Objects: Abstract types, encodings, and operations

What objects are. The system I propose represents information in the Internet as objects, which are instances of abstract types. Each type is identified by one or more well-defined names. Objects are used through operations or methods, whose names and signatures are available in the type declaration, as with many object-oriented languages. Objects may also have a set of attributes, which may be retrieved, or sometimes set, via an operation. Types may also have expected semantics; for instance, one type's "angle" attribute may be expected to always be a number between 0 and 360. Objects are used by invoking operations or reading attributes in the manner of a procedure-call (or remote-procedure call).

A type may have one or more supertypes. Objects of one type support the operations of the type's supertypes, and can 'stand in' for the objects of the supertype if necessary. Inheritance is not implied by subtyping, though, for reasons to be shown later.
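One way to picture type declarations and subtyping is as a small lattice of named declarations. The sketch below is an assumption-laden illustration: `TypeDecl`, `TypeLattice`, the signature strings, and the `palette` operation are all hypothetical. It collects operation signatures only, not implementations, in keeping with the point that subtyping does not imply inheritance.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TypeDecl:
    name: str
    supertypes: Tuple[str, ...] = ()
    operations: Dict[str, str] = field(default_factory=dict)  # operation name -> signature text


class TypeLattice:
    def __init__(self):
        self._types = {}

    def declare(self, decl):
        self._types[decl.name] = decl

    def is_subtype(self, name, ancestor):
        """True if `name` equals `ancestor` or reaches it through declared supertypes."""
        if name == ancestor:
            return True
        seen, stack = set(), [name]
        while stack:
            current = stack.pop()
            if current in seen or current not in self._types:
                continue
            seen.add(current)
            for sup in self._types[current].supertypes:
                if sup == ancestor:
                    return True
                stack.append(sup)
        return False

    def operations_of(self, name, _seen=None):
        """Collect the signatures a type must support, including those of its supertypes."""
        seen = _seen if _seen is not None else set()
        if name in seen or name not in self._types:
            return {}
        seen.add(name)
        decl = self._types[name]
        ops = {}
        for sup in decl.supertypes:
            ops.update(self.operations_of(sup, seen))
        ops.update(decl.operations)
        return ops


lattice = TypeLattice()
lattice.declare(TypeDecl("Image", (), {"display": "() -> None", "dimensions": "() -> (int, int)"}))
lattice.declare(TypeDecl("GIF", ("Image",), {"palette": "() -> list"}))
```

In this sketch a GIF object can stand in wherever an Image is expected because it must answer the Image operations, but nothing is said about how either type implements them.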

Objects in a heterogeneous wide-area infosystem. So far the object model should look quite similar to traditional programming conventions. There are two important additional aspects of the model, though. One is that objects can have meta-data associated with them, showing their origin, type, or other run-time information. This sort of information is usually handled transparently by the environment in traditional programming languages, but is made more explicit here, for reasons we will see later.

Another important aspect of objects in this system is that they may have encodings. Encodings are used to transmit object instances from one agent to another. They may also be used in the implementation of an object operation. Encodings are similar to the representations of object-oriented programming languages, but they are not opaque: agents with a copy of the object can work directly with its encoding, if they know how.

An encoding specification includes a lower-level type used to represent the object, and a named scheme used for translating to and from this type. For example, a HTML document may be encoded as a sequence of characters, using its standard SGML representation as its encoding scheme. An encoding itself may be encoded, since it too is an instance of an abstract type. A given object type may have several encodings associated with it, and subtype encodings need not have anything to do with supertype encodings. (This is one reason why subtyping does not imply inheritance.) An object will eventually be encoded in a 'primitive' type, which could be as simple as a sequence of bytes. (At some level, all Internet information gets transmitted in this form; though agents might treat higher-level types as primitives as well.)
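The layering of encodings down to a primitive byte sequence can be sketched as a stack of named schemes. The scheme names `utf-8-text` and `gzip` and the `SCHEMES` table are assumptions made for this illustration; the sketch uses Python's standard `gzip` module as a stand-in for GNU compression.

```python
import gzip

# Each named scheme pairs an encoder with its decoder. The names are hypothetical.
SCHEMES = {
    "utf-8-text": (lambda text: text.encode("utf-8"), lambda data: data.decode("utf-8")),
    "gzip": (gzip.compress, gzip.decompress),
}


def encode(value, stack):
    """Apply each scheme in order, innermost first, ending in a lower-level representation."""
    for scheme in stack:
        value = SCHEMES[scheme][0](value)
    return value


def decode(data, stack):
    """Undo the schemes in reverse order to recover the original value."""
    for scheme in reversed(stack):
        data = SCHEMES[scheme][1](data)
    return data


document = "<title>Hello</title>"
stack = ["utf-8-text", "gzip"]
wire = encode(document, stack)
restored = decode(wire, stack)
```

Because the stack is explicit data rather than part of a type name, an agent that understands only the outer layer can still strip it and hand the remainder to another agent.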

Objects in existing infosystems. If one considers a byte stream to be a simple object (with operations like 'next-byte'), all Internet information systems can be modeled with objects, but this model is degenerate and uninteresting. But the formats of the data types used in information systems can be treated as encodings of abstract types. They can thus be incorporated into an object-based information system via an agent that provides an object wrapper around the encodings. (Rufus [Sho+93] essentially does this for its "semi-structured files".) An HTML document in the World Wide Web, for instance, could be viewed as an encoding of a "Web-hypertext" object, with methods like "follow the first link" or "fetch the title".
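An object wrapper of this kind might look like the following sketch, which uses Python's standard `html.parser` module. The class name `WebHypertext` and its methods are hypothetical, and `first_link` only returns the reference of the first link rather than actually following it.

```python
from html.parser import HTMLParser


class _Scanner(HTMLParser):
    """Collects the title text and link targets while parsing an HTML encoding."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


class WebHypertext:
    """Hypothetical abstract object wrapped around an HTML encoding."""

    def __init__(self, html_text):
        scanner = _Scanner()
        scanner.feed(html_text)
        scanner.close()
        self._title = scanner.title
        self._links = scanner.links

    def fetch_title(self):
        return self._title

    def first_link(self):
        return self._links[0] if self._links else None


page = WebHypertext(
    "<html><head><title>Campus News</title></head>"
    '<body><a href="http://example.org/a">A</a><a href="b.html">B</a></body></html>'
)
```

A client that calls `fetch_title` never needs to know that the underlying encoding is HTML, which is the point of treating the format as an encoding of an abstract type.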

Information agents

Information agents are programs that operate on information objects. They can talk to other agents in the Internet, operating in a client or a server role (and sometimes in both roles).

In my design, agents know of a certain set of types, as well as a set of definitions of operations on these types. These definitions might include one or more of the following:

A particular agent might only implement an operation for certain encodings of a type. The same code may be used by several implementations. This allows for a certain degree of code inheritance, if desired.

Agents have a repository of objects they have direct access to, without having to talk to other agents. The 'same' object may be in several agent repositories at once, since when a server 'transmits an object' to a client, it actually transmits a copy of its encoding. If clients are particularly concerned about consistency, meta-data can be used to identify the agent originating an object.

Interoperation with other agents: the computation model. In my design, agents speak a common protocol about objects, types, and operations. To carry out an operation on an object, an agent may make requests to one or more servers that have knowledge of the object, its type, or its operations. Agents may also know special-purpose protocols to talk to databases and clients that don't talk the common protocol directly, such as HTTP servers or SQL databases. There may be multiple protocols used to carry out similar operations, depending on the performance requirements of an application, but my thesis will concentrate on a single protocol that's robust enough to be usable in case studies in the later part of my thesis. In any case, changes in the required protocol should be much less frequent than changes in the set of data types.

In my basic protocol, agents use a request-reply interaction similar to that of remote procedure calls. (While a simple agent might actually implement the request and reply as a procedure call, nothing prevents an agent from having requests pending on multiple agents at once, if that is desired for efficiency.) Client agents can make multiple requests in the same session, but state (other than that inherent in the information repository) need not be preserved between different sessions, and should be kept to a minimum within a session. This has helped keep the interactions of existing infosystems simple. Mutation of the repository is not a part of the protocol, so concerns like serializability are not an issue.
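A request-reply exchange with no per-session state can be sketched as below. The message fields, the operation names `get` and `describe`, and the `Agent` class are assumptions made for illustration; the proposal does not fix a wire format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class Request:
    operation: str
    object_ref: str
    arguments: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Reply:
    status: str  # "ok" or "error"
    value: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)


class Agent:
    def __init__(self, name, repository):
        self.name = name
        # Maps an object reference to a (type name, encoded value) pair.
        self._repository = dict(repository)

    def handle(self, request):
        """Answer one request from the repository alone; nothing is remembered between calls."""
        entry = self._repository.get(request.object_ref)
        if entry is None:
            return Reply("error", metadata={"reason": "unknown object"})
        type_name, value = entry
        info = {"type": type_name, "origin": self.name}
        if request.operation == "get":
            return Reply("ok", value, info)
        if request.operation == "describe":
            # Meta-data only, so a client can decide before any large transfer.
            return Reply("ok", None, info)
        return Reply("error", metadata={"reason": "unsupported operation"})


server = Agent("server-1", {"img:42": ("Image", b"raw image bytes")})
described = server.handle(Request("describe", "img:42"))
fetched = server.handle(Request("get", "img:42"))
```

Since `handle` reads the repository and never changes it, repeating a request yields the same reply, which mirrors the absence of mutation in the protocol.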

The type oracle and its services

What type oracles do. A type oracle is a mediator agent that provides information about structured types to information agents, and to application programmers. Given the name of a type, a type oracle can find its description, its supertypes, operations, methods, and encodings. It can refer clients to agents that can carry out requested operations or conversions between types, or between encodings. It can take advantage of its knowledge of the type lattice to perform conversions and substitutions that are not explicitly coded by any single agent. (For example, if a client has type encoding A, and needs to convert it to B, it can find a converter from A to C, and another converter from C to B. See figure 2c below.) Earlier research (such as the data translation work of [Mam+89]) has revealed algorithms for some of these tasks, but there are still a number of open algorithm questions that can be studied in the thesis.

Why type oracles? Type oracles simplify the problem of managing large numbers of abstract types. Their ability to locate third-party expert agents for a type allows information clients to use many more types (and operations) than those they were explicitly coded for. They also avoid the requirement in many distributed systems that there be a single agreed-on form for all types (figure 2a below), without requiring explicit conversions from every type to every other (figure 2b below). Type oracles can use meta-data associated with conversion operations to direct type conversions or operations that preserve as much information as is necessary and feasible. Some types of conversion require no information to be lost; others require that certain operations or expectations remain possible, even if this means the loss of extraneous information.
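The use of conversion meta-data to limit information loss can be illustrated with a path search that prefers lossless steps. This sketch reduces the meta-data to a single hypothetical boolean `lossy` flag per converter, which is a strong simplification; the function name `best_conversion` and the type names are likewise assumptions.

```python
import heapq


def best_conversion(converters, source, target):
    """Pick a chain minimizing lossy steps first and total steps second.

    `converters` is a list of (source, target, lossy) triples.
    Returns (list of (source, target) steps, number of lossy steps), or None.
    """
    graph = {}
    for src, dst, lossy in converters:
        graph.setdefault(src, []).append((dst, lossy))
    # Heap entries: (lossy steps so far, total steps so far, current name, path so far).
    heap = [(0, 0, source, ())]
    done = set()
    while heap:
        lossy_count, steps, node, path = heapq.heappop(heap)
        if node == target:
            return list(path), lossy_count
        if node in done:
            continue
        done.add(node)
        for dst, lossy in graph.get(node, []):
            if dst not in done:
                entry = (lossy_count + int(lossy), steps + 1, dst, path + ((node, dst),))
                heapq.heappush(heap, entry)
    return None


converters = [
    ("A", "D", True),   # direct, but discards information
    ("A", "B", False),
    ("B", "D", False),
]
result = best_conversion(converters, "A", "D")
```

In this example the two-step lossless route through B is chosen over the direct lossy converter. A fuller treatment would record which operations or expectations each conversion preserves, rather than a single flag.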



Figure 2. Different models of type conversion


Multiple type oracles. A full-blown Internet information system will have multiple type oracles. Oracles can query other oracles to find out about new types. (Conversion and substitution strategies will work best if a given oracle knows about as many types and mappings as possible.) A given type can be kept private (as one might wish to do while developing and testing it) by registering it with a local oracle, but instructing the oracle not to give information about it to outside oracles or agents.
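Oracle-to-oracle queries with private registrations might be sketched as follows. The `Oracle` class, its `private` flag, and the forwarding rule are assumptions for illustration only; in particular, a real system would need a policy for which peers to consult and how far queries propagate.

```python
class Oracle:
    def __init__(self, name):
        self.name = name
        self._types = {}   # type name -> (description, private flag)
        self._peers = []

    def register(self, type_name, description, private=False):
        self._types[type_name] = (description, private)

    def add_peer(self, oracle):
        self._peers.append(oracle)

    def lookup(self, type_name, from_outside=False, _visited=None):
        """Return a description, consulting peers if needed; private types stay local."""
        visited = _visited if _visited is not None else set()
        if self.name in visited:
            return None  # already consulted during this query; avoids cycles
        visited.add(self.name)
        entry = self._types.get(type_name)
        if entry is not None:
            description, private = entry
            if not (private and from_outside):
                return description
        for peer in self._peers:
            found = peer.lookup(type_name, from_outside=True, _visited=visited)
            if found is not None:
                return found
        return None


local = Oracle("local")
remote = Oracle("remote")
local.add_peer(remote)
remote.add_peer(local)
local.register("Draft-Schedule", "bus schedule under test", private=True)
remote.register("Image", "generic raster image")
```

A local client can resolve both names through `local`, while a query arriving through `remote` learns nothing about the type still under development.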

How types can be used with meta-data and references

Meta-data and references are both essential parts of a wide-area information system. In a system where structured information is passed between agents that may not know each other, data may need to be accompanied by tags identifying the type of data. Type tags, however, are not the only kind of meta-data which may be needed: information about the source, currency, and cost of information, for instance, may be desired as well. References are required whenever a piece of information wishes to name or point to another piece of information. They may also be required for efficiency, when it is not practical to ship a large block of data from one site to another.

Meta-data and references as abstract types. There is, however, no universally adopted mechanism to name objects in information systems. (Finding adequate naming schemes in heterogeneous distributed systems is in fact an open problem, one that this thesis will not attempt to solve in the general case.) There are, however, a number of naming schemes with varying semantics (such as the URLs of the Web, the semantic filenames of Prospero [Neu92], the domain naming scheme of the Internet, and the Message-IDs of Usenet). The abstract types model I propose can be used to distinguish and classify the various naming schemes in use on the Internet. Similarly, as new forms of meta-data become necessary, new abstract types can be used to model them. Thus, a wide variety of data, from both existing infosystems and new infosystems, can be used in this framework.

Minimal meta-data and reference requirements. While I do not intend to investigate all of the possible types of meta-data and references in my thesis, I will have to design a few required for the system to operate. For instance, basic forms of reference to other agents must be supported. Meta-data containing type tags will be needed to effectively use type oracles. And the names of types themselves are references that require certain semantics (particularly persistence, unique identification, and resolvability) and namespace management. My thesis will include provisions for these basic types in the protocol or in the basic type lattice.

Image example revisited

How will these abstractions work together in actual use? We return to the image example from the previous section for an illustration. The client starts out with a reference for an image it wishes to display. It resolves the reference (perhaps with the help of another agent), and contacts a server that holds the image in its repository. The server passes the client meta-data indicating that the picture is of type X-Window-Dump (a subtype of Image), and encoded in the standard XWD format, further encoded with GNU compression. Since the client does not know how to implement the Display operation for this format, it asks a type oracle for help in displaying it as an Image. The Display operation cannot be executed remotely, so conversion is required. The oracle tells the client that the image can also be converted to other subtypes, one of which is the GIF type the client understands. The client can display GIFs, so it uses the type oracle to find agents that will do a conversion out of GNU compression, and then from XWD to GIF. An uncompressed GIF-format image is finally sent to the client, which it then displays.
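The client's side of this exchange can be sketched as a small planning function. All names here are hypothetical, including the meta-data keys, `plan_display`, and the idea that the oracle's answers arrive as a list of candidate types and a set of known converter pairs; the sketch decides which steps to request and performs no conversion itself.

```python
def plan_display(metadata, client_types, candidate_types, converters):
    """Return the steps the client should request before it can display, or None.

    metadata: {"type": type name, "transfer_encodings": [layers, outermost last]}
    client_types: set of type names the client can display directly
    candidate_types: types the oracle reports the object can be converted to
    converters: set of (source, target) pairs for which the oracle knows an agent
    """
    steps = []
    # Undo transfer encodings such as compression, outermost layer first.
    for encoding in reversed(metadata["transfer_encodings"]):
        steps.append(("decode", encoding))
    if metadata["type"] in client_types:
        return steps
    for candidate in candidate_types:
        if candidate in client_types and (metadata["type"], candidate) in converters:
            return steps + [("convert", metadata["type"], candidate)]
    return None


metadata = {"type": "X-Window-Dump", "transfer_encodings": ["gnu-compression"]}
plan = plan_display(
    metadata,
    client_types={"GIF"},
    candidate_types=["PostScript", "GIF"],
    converters={("X-Window-Dump", "GIF"), ("X-Window-Dump", "PostScript")},
)
```

The resulting plan is to decode the compression layer and then convert to GIF, matching the order of steps in the walkthrough above. `PostScript` is included only as a made-up candidate the client cannot display.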

The example above elides a number of details that need to be tuned carefully in an actual implementation. The strategy for negotiation between agents is left unspecified, as is the strategy for when to send data, and to whom. (Bandwidth may be saved, for instance, if the initial client request to the server returns meta-data but not the actual image data, assuming the image is large.) While I suspect that different strategies may be appropriate for different applications, I hope to discover useful general strategies for agent interaction in my thesis.

Having completed an overview of my design, I now discuss the relationship of my work to work in related areas.

Related work

Research projects in a number of areas have direct relevance to my thesis. A detailed analysis of this work belongs in the thesis proper rather than in the proposal, but the following categories of related work (some already mentioned) are worth noting:

The Plan of the Thesis

The following questions are key to my thesis:

The following questions are ones I wish to address in my thesis, but they may not necessarily be completely answered:

In order to answer these and other questions, I will do the following:

A successful design and implementation of the system, combined with tests of the system on sample applications and observation of the use of the system (or related systems) in the Internet, should provide the necessary material to answer the key questions identified above. The additional questions can be answered to a certain extent as I consider different design possibilities and observe the use of the system by myself and others. The thesis will include a report of the alternatives I considered and implemented, and the strengths and weaknesses I observed in practice.

Timeline. I would like to work on all three activities simultaneously in the course of my thesis, but the relative emphasis will change over time. The infosystems analysis and architecture specification will predominate in the first phase; the implementation in the next phase; and the testing in the third phase. Writing will mostly take place during the first and last phases of the work.

I will be taking an incremental approach to the implementation. I will be experimenting with different protocols on my own early in the thesis work, but eventually want to have outside users try out the system as well to see how well the system scales up. Partly in order to attract outside users, I will need to make my system interoperable (in both directions) with existing infosystems like WWW. In this way, users can take advantage of the added value of my system without losing access to the information resources they already have. In addition, WWW browsers like TkWWW and Mosaic with their fill-out forms capabilities are general enough that I can probably use them, or slight variants on them, as the initial user interfaces to my agents, instead of having to build my own.

My thesis has a number of milestones that can serve as good indicators of progress. Here are the major ones, with estimates of probable time to completion:

I expect to complete the thesis in the fall of 1995, if all goes smoothly.

Expected Contributions

These are the key contributions that I expect to come from my thesis:

There are certain questions relevant to my information system that I do not expect to make major contributions towards, though I hope to use the research and experience of others in these areas in my design. These include questions of security, privacy, and cost accounting, naming syntax and semantics, and human interfaces to information systems. Other problems, such as search and filtering, will probably not be addressed directly in the thesis, but I hope that the work of the thesis will enable better solutions to these problems.

While the world of Internet information systems is changing extremely rapidly, I expect these contributions to have staying power. A well-constructed design, analysis and experience report on type oracles and their datatypes should remain useful as a guide to designers of many distributed information systems beyond the particular system I design.

Notes

[AAP86]
Association of American Publishers. Standard for Electronic Manuscript Preparation and Markup. Washington, D.C.: Association of American Publishers, 1986.
[AG94]
Robert Allen and David Garlan. "Formal Connectors". Technical Report, CMU-CS-94-115, Carnegie Mellon University, Pittsburgh, PA. A copy is available on-line.
[Ank+93]
F. Anklesaria, M. McCahill, P. Lindner, D. Johnson, D. Torrey, and B. Alberti. "The Internet Gopher Protocol." Internet RFC 1436, March 1993.
[Abo+93]
Gregory Abowd, Robert Allen, and David Garlan. "Using Style to Understand Descriptions of Software Architecture". In Proceedings of the ACM SIGSOFT '93 Symposium on the Foundations of Software Engineering, December 1993, p. 9-20.
[BC90]
Kenneth P. Birman and Robert Cooper, "The ISIS Project: Real Experience with a Fault Tolerant Programming System". Technical Report TR 90-1138, Cornell University Department of Computer Science, July 1990.
[BF92]
N. Borenstein and N. Freed. "MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies." Internet RFC 1341, June 1992.
[BL+92]
T.J. Berners-Lee, R. Cailliau, J-F Groff, B. Pollermann. "World-Wide Web: The Information Universe". In "Electronic Networking: Research, Applications and Policy", Vol. 2, No. 1, pp. 52-58, Spring 1992, Meckler Publishing, Westport, CT, USA. (A preprint is available on-line.)
[BL93]
T.J. Berners-Lee, "Hypertext Transfer Protocol". Internet draft, CERN, November 1993. Work in progress.
[Bir92]
Kenneth P. Birman, "The Process Group Approach to Reliable Distributed Computing". Technical Report TR 91-1216, Cornell University Department of Computer Science, July 1991, revised September 1992.
[Dan+92]
Peter B. Danzig, Shih-Hao Li, and Katia Obraczka, "Distributed Indexing of Autonomous Internet Services". Computing Systems, 5(4):433-459, Fall 1992. (A preprint is available on-line.)
[Kah91]
Brewster Kahle. "An Information System for Corporate Users: Wide Area Information Servers". Technical Report TMC-199, Thinking Machines Corporation, Cambridge, MA, 1991.
[Mam+89]
Sandra A. Mamrak, Michael J. Kaelbling, C. K. Nicholas, and M. Share. "Chameleon: A System for Solving the Data-Translation Problem." IEEE Transactions on Software Engineering 15(9): 1090-1108, September 1989.
[Moc87]
P. Mockapetris, "Domain Names - Concepts and Facilities." Internet RFC 1034, November 1987.
[Neu92]
B. C. Neuman, "The Virtual System Model: A Scalable Approach to Organizing Large Systems". Technical Report 92-06-04, University of Washington Computer Science Department, Seattle, WA, June 1992. A copy is available on-line.
[OMG92]
Object Management Group. The Common Object Request Broker: Architecture and Specification. OMG Document Number 91.12.1, Revision 1.1. Wellesley, MA: QED Publishing Group, 1992.
[Sho+93]
K. Shoens, A. Luniewski, P. Schwarz, and J. Thomas. "The Rufus System: Information Organization for Semi-Structured Data". In Proceedings of the 19th VLDB Conference, Dublin, Ireland, 1993.
[Sch+92]
Michael F. Schwartz, Alan Emtage, Brewster Kahle, and B. Clifford Neuman. "A Comparison of Internet Resource Discovery Approaches." Computing Systems 5(4):461-493, Fall 1992. (A preprint is available on-line.)
[SS94]
Peter Schwarz and Kurt Shoens. "Managing Change in the Rufus System". In Proceedings of the 1994 International Conference on Data Engineering, Houston, Texas, February 1994.
[Wie92]
Gio Wiederhold. "Mediators in the Architecture of Future Information Systems". IEEE Computer 25(3):38-49, March 1992.

spok@cs.cmu.edu (Written 17-May-94)