After it was decided that existing tools weren't adequate, Tim Berners-Lee (now with W3O) was asked to design a system that would let physicists in different parts of the world collaborate on projects and share information over the Internet.
Berners-Lee decided to use a hypertextual model, and then set out to solve a number of problems posed by that model.
In any hypertext system you need a way to point to information objects so you can ``carry'' the pointer instead of the object.
The solution is the Uniform Resource Identifier (URI) specification, a general specification that makes it possible to point to any document, anywhere.
The URI specification ``defines a way to encapsulate a name in any registered namespace, and label it with the namespace, producing a member of the universal set.''
In other words, the URI specification defines a superset of all existing and possible namespaces. Any namespace can be given a label and incorporated into the URI space.
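URIs consist of two parts: a registered prefix that labels the namespace, and a string following the prefix that encodes a name within that namespace.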
The extensibility requirement is met by the ability to register new unique prefixes. The completeness requirement is met by the ability to encode any binary information in the string following the prefix (in Base64, for instance). The printability requirement is left to the implementation of specific namespace encodings.
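The prefix-plus-string structure and the completeness argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prefix `x-demo` and the helper names `make_uri` and `split_uri` are hypothetical, not part of any registered namespace or of the URI specification itself.

```python
import base64


def make_uri(prefix: str, payload: bytes) -> str:
    """Label arbitrary bytes with a namespace prefix (hypothetical scheme)."""
    # URL-safe Base64 uses '-' and '_' instead of '+' and '/', so the
    # reserved '/' character never appears in the encoded string.
    encoded = base64.urlsafe_b64encode(payload).decode("ascii")
    return f"{prefix}:{encoded}"


def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into its namespace prefix and the string that follows it."""
    prefix, _, rest = uri.partition(":")
    return prefix, rest


uri = make_uri("x-demo", b"\x00\xffbinary")  # "x-demo" is an invented prefix
prefix, rest = split_uri(uri)
print(prefix, base64.urlsafe_b64decode(rest))
```

Because any byte string survives the Base64 round trip, any name in any namespace can in principle be carried this way, which is the point of the completeness requirement.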
Reserving /, . and .. allowed the specification of relative URIs, which work much like relative paths in a filesystem. When a relative URI is found the URI of the containing document is used as a reference to construct a new full URI following these semantics:
Note that using the parent URI http://www/b/c//d/e/ would yield the same results.
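Relative resolution can be tried out with Python's standard `urllib.parse.urljoin`. Note the assumptions: `urljoin` follows the relative-reference rules as they were later standardized, which may differ in edge cases from the semantics described in this talk, and the base URI and the helper name `resolve` are chosen here purely for illustration.

```python
from urllib.parse import urljoin

# An assumed URI for the containing document (illustrative only).
BASE = "http://www/b/c/d/e/"


def resolve(relative: str, base: str = BASE) -> str:
    """Build a full URI from a relative one, using the base as reference."""
    return urljoin(base, relative)


# A plain name, the current directory, the parent directory, and the root.
for relative in ["f", "./f", "../f", "/f"]:
    print(f"{relative!r:8} -> {resolve(relative)}")
```

As with filesystem paths, `..` climbs one level, `.` stays in place, and a leading `/` restarts from the root of the server.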
Now that we have the URI specification, we need to be able to point to existing documents available on the Web.
A working group of the IETF is attempting to define a Uniform Resource Name (URN) specification. URNs are meant to be persistent names for objects, valid regardless of how machine and server configurations change. URNs solve the same problem for URLs that DNS solves for IP numbers.
Now that we have pointers to document objects, we need a place to put them.
Design features of HTML:
HTML is beyond the scope of the talk.
The original Web browsers used the extension of a file to determine its type. This method had several disadvantages:
To fix this problem, parts of the existing MIME (Multipurpose Internet Mail Extensions) system were integrated into Web clients and servers.
Before a document is transmitted it is assigned a MIME type by the server or mailer. This assignment is often made based on file extension, but because the assignment is made locally the user can make sure the appropriate type is defined. The MIME type is a description of the contents of the file.
When the file is received, the browser uses the MIME type to find an appropriate viewer for the file.
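The two halves of this exchange can be sketched as a pair of lookup tables. Everything named here is an assumption made for illustration: the table contents, the viewer names, and the functions `assign_mime_type` and `choose_viewer` do not come from any particular server or browser.

```python
import os

# Hypothetical server-side configuration: file extension -> MIME type.
# Because this table lives on the server, its administrator controls it.
SERVER_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".ps": "application/postscript",
}

# Hypothetical browser-side configuration: MIME type -> viewer.
BROWSER_VIEWERS = {
    "text/html": "built-in renderer",
    "image/gif": "image viewer",
    "application/postscript": "PostScript previewer",
}


def assign_mime_type(filename: str) -> str:
    """Server side: pick a MIME type before the document is transmitted."""
    extension = os.path.splitext(filename)[1].lower()
    # Unknown extensions fall back to an opaque binary type.
    return SERVER_TYPES.get(extension, "application/octet-stream")


def choose_viewer(mime_type: str) -> str:
    """Browser side: find a viewer using only the declared MIME type."""
    return BROWSER_VIEWERS.get(mime_type, "save to disk")


for name in ["index.html", "paper.PS", "data.xyz"]:
    mime_type = assign_mime_type(name)
    print(name, "->", mime_type, "->", choose_viewer(mime_type))
```

The browser never looks at the file name; it acts only on the type the server declared, which is what removes the dependence on file extensions at the receiving end.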
MIME features:
How to transfer documents from the author to the user.
The Hypertext Transfer Protocol (HTTP).
Any simple summary of the features of HTTP would ignore the serious changes in its role precipitated by changes in other WWW tools. A chronological summary of the changes in HTTP features is more interesting.
The first version of HTTP to be distributed widely was 0.9. The only request that could be made was ``GET (url)'', where ``url'' is an HTTP URL with the prefix stripped. The document pointed to by the URL would be returned to the browser.
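A minimal sketch of forming such a request is shown below. It assumes that "the prefix" means the scheme and host portion of the URL, so that only the path (and any query string) is sent; the function name `build_http09_request` is invented for this example.

```python
from urllib.parse import urlsplit


def build_http09_request(url: str) -> bytes:
    """Form an HTTP 0.9 style request line for the given HTTP URL."""
    parts = urlsplit(url)
    # Assumption: the stripped "prefix" is the scheme and host, which the
    # client has already used to open the connection.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The whole request is a single line: no headers, no version number.
    return f"GET {path}\r\n".encode("ascii")


print(build_http09_request("http://www/b/c/d.html"))
```

The reply in this version is simply the document itself, with no status line or headers in front of it.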
HTTP 0.9 was designed to deliver documents with as little overhead as possible. FTP can perform the same function, but it requires a costly login process. HTTP is a stateless protocol: Berners-Lee saw that a document would be transferred and read, and then a link would be followed to another document, possibly not on the same server, so there was no advantage to keeping a socket open.
The next version of HTTP was designed to fix a number of problems with the previous version and to add new features. The major change was the addition of document typing using MIME-related headers. Other methods were also added alongside GET. Some of these were:
The most important of these methods is POST, which is used in conjunction with the Common Gateway Interface.
Forms: A specification for creating a fill-out form within an HTML document. Each browser that implements Forms is responsible for packing the information into a special format when a form is submitted and sending it to a specified URL.
CGI: A specification for a script on an HTTP server that has its own URL. When the URL is accessed, the script is run and its output is sent to the client. Used in conjunction with Forms, a set of scripts can carry on a "dialogue" with a client.
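The Forms and CGI halves of this dialogue can be sketched together. This is an illustration only: it assumes the common URL-encoded packing of form fields, and the field name `name` and the functions `pack_form` and `handle_submission` are hypothetical.

```python
import html
from urllib.parse import parse_qs, urlencode


def pack_form(fields: dict) -> str:
    """Browser side: pack form fields into a URL-encoded string."""
    return urlencode(fields)


def handle_submission(body: str) -> str:
    """Script side: unpack the submitted fields and produce a reply."""
    values = parse_qs(body)
    # parse_qs maps each field name to a list of values; take the first.
    name = values.get("name", ["stranger"])[0]
    # A CGI script's output starts with a header block, then a blank line.
    return (
        "Content-type: text/html\r\n\r\n"
        f"<p>Hello, {html.escape(name)}!</p>"
    )


body = pack_form({"name": "Ada Lovelace"})
print(body)
print(handle_submission(body))
```

Each reply can itself contain another form pointing at another script, which is how a set of scripts carries on a conversation with the client.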
Interesting note: Because HTTP is stateless, CGI scripts often have to play tricks to ensure that the state of a conversation is stored in the document returned to a client.
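One such trick is to write the state into a hidden form field, so that the client hands it back with the next submission. The sketch below assumes that approach; the field name `step`, the script path `/cgi-bin/dialogue`, and both function names are invented for illustration.

```python
import html
from urllib.parse import parse_qs


def render_step(step: int) -> str:
    """Return a form that carries the conversation state in a hidden field."""
    return (
        '<form method="POST" action="/cgi-bin/dialogue">'  # hypothetical path
        f'<input type="hidden" name="step" value="{html.escape(str(step))}">'
        '<input type="submit" value="Next">'
        "</form>"
    )


def next_step(submitted_body: str) -> int:
    """Recover the state from the submitted form and advance it by one."""
    values = parse_qs(submitted_body)
    # A first visit carries no state, so start counting from zero.
    return int(values.get("step", ["0"])[0]) + 1


print(render_step(next_step("step=2")))
```

The server keeps nothing between requests; the whole state of the conversation travels inside the document and comes back with the form.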
During the development of Mosaic, one of the programmers (Marc Andreessen) decided he wanted to add support for displaying pictures inside documents. As with every decision made by Andreessen and the new Netscape Communications company since, he designed a quick-and-dirty solution that served his needs and caused significant problems he could blame on other people.
Rather than find a way of encapsulating a picture with a document, he decided on the most general model, which was to have the browser perform an additional request for each picture. This changed the model that Berners-Lee had originally envisioned and created performance problems caused by the overhead of opening a new TCP connection for every request.
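The cost of this model is easy to count. The sketch below uses Python's standard `html.parser` to find inline images and assumes one request per document plus one per image, with no connection reuse; the class and function names are invented for this example.

```python
from html.parser import HTMLParser


class InlineImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def requests_needed(document: str) -> int:
    """One request for the document itself, plus one per inline image."""
    collector = InlineImageCollector()
    collector.feed(document)
    return 1 + len(collector.sources)


page = '<h1>Hi</h1><img src="a.gif"><IMG SRC="b.gif">'
print(requests_needed(page))
```

Under a protocol that closes the socket after each transfer, every one of those requests pays the connection setup cost again.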
There are two proposed solutions to the problem of inlined documents:
Other additions include support for more advanced applications, and for encryption of sensitive data.
Care is being taken to ensure that the protocol will be extensible.