INTERNET-DRAFT                                             Ari Luotonen
Expires: June 15, 1996              Netscape Communications Corporation
                                                            John Franks
                                                Northwestern University
                                                      December 15, 1995


                     Byte Range Extension to HTTP


STATUS OF THIS MEMO

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).


TABLE OF CONTENTS

   1.    Overview
   2.    Accept-Ranges HTTP response header
   3.    Byte range HTTP request
   3.1.  The Range HTTP request header
   3.2.  The Unless-Modified-Since HTTP request header
   4.    Byte range HTTP response
   4.1.  206 Partial Content status code
   4.2.  The Content-Range HTTP response header
   4.3.  Multiple ranges as multipart MIME messages
   4.4.  Caching issues
   5.    Future considerations
   5.1.  Extending Accept-Ranges, Range and Content-Range headers
   5.2.  Other possible ranges
   6.    References
   7.    Authors' Addresses


1.  OVERVIEW

   It is possible in Web clients to interrupt the connection before the
   data transfer has finished.  As a result the client may have partial
   documents or images loaded into its memory.  It would be useful to
   be able to request the server to return the missing portion of the
   document only, instead of retransferring the entire file, if the
   page is re-entered later.

   There are also a number of Web applications that would benefit from
   being able to request the server to give a byte range of a document.
   As an example an Adobe PDF viewer would need to be able to access
   individual pages by byte range; the table that defines those ranges
   is located at the end of the PDF file (this is the case in the new
   to-be-released PDF format).

   Setting this standard will promote interoperability between clients,
   servers and intermediate proxy servers, make (partial) caching
   effective, and save bandwidth.

   This specification defines only the byte ranges.  It shows other
   types of ranges as an example of how this specification could be
   extended, as proof of its generality.  Those examples should not be
   viewed as their definition.

   This specification is simple enough to be adopted quickly by the
   server authors/vendors, and be quickly and easily exploited on the
   client side.  The proposed solution will be backward compatible with
   existing proxy servers, and once this specification becomes official
   it will actually be possible to support this in a smart way in proxy
   servers.

   This specification can be applied to document types for which byte
   ranges make sense; there are types for which they don't, and this
   specification is not trying to enforce semantics for byte ranges for
   them.  In practice most of the data in the Web is represented as a
   byte stream, and can be addressed with a byte range to retrieve a
   desired portion of it.
   This is especially useful when there is a partial copy of the
   document, the transfer of which was interrupted by the user, but
   later resumed, in which case only the missing portion needs to be
   transferred.

   Byte range requests are typically generated by software, not written
   by humans.


2.  ACCEPT-RANGES HTTP RESPONSE HEADER

   The server needs to let the client know that it can support byte
   ranges.  This is done through the Accept-Ranges HTTP header when a
   server is returning a document that supports byte ranges:

        Accept-Ranges: bytes

   The server will send this header only for documents for which it
   will be able to satisfy the byte range request, e.g. for PDF
   documents, or for images, which can be partially reloaded if the
   user interrupts the page load and the image gets only partially
   cached.

   Because of the architecture of the byte range request and response,
   the client is not limited to attempting to use byte ranges only when
   this header is present.  The Range: header is simply ignored by a
   server that does not support it, and such a server will send the
   entire document as a response.


3.  BYTE RANGE HTTP REQUEST

   A byte range request is made like any other HTTP request, with the
   addition of the Range: HTTP request header.


3.1.  The Range HTTP Request Header

   The client requests a byte range via the Range: HTTP header:

        Range: bytes=0-500,5000-

   The Range: header is defined extensibly so that it can take a
   generic parameter specifying the type of range.  The parameter name
   for byte ranges is "bytes".  It is passed to the server in the
   Range: HTTP request header, followed by an equal sign and the byte
   range specification, whose syntax is described below.  (In an
   earlier version of this draft, it was passed to the server appended
   to the end of the path part of the URL, separated by a semicolon.)
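
   As an illustration only, a client might assemble the value of the
   Range: header as sketched below in Python.  The function name
   format_range_header and the use of None to mark an omitted number
   are assumptions of this sketch and are not defined by this
   specification.  The sketch mirrors the example above, in which the
   second range omits its last number.

```python
def format_range_header(ranges):
    """Build a Range: header value such as 'bytes=0-500,5000-'.

    `ranges` is a list of (first, last) pairs; None marks an omitted
    number.  (Hypothetical helper, not part of the specification.)
    """
    parts = []
    for first, last in ranges:
        if first is None and last is None:
            # A bare hyphen would carry no information.
            raise ValueError("a range needs at least one number")
        left = "" if first is None else str(first)
        right = "" if last is None else str(last)
        # The hyphen is always written, even when one number is omitted.
        parts.append(f"{left}-{right}")
    # The parameter name "bytes", an equal sign, then the ranges.
    return "bytes=" + ",".join(parts)
```

   For instance, format_range_header([(0, 500), (5000, None)]) yields
   the header value shown in the example above.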

   Note About CGI Applications

      As defined by the CGI/1.1 specification, the value of the Range:
      header will be passed to CGI scripts in the HTTP_RANGE
      environment variable.  The CGI applications can choose to
      support it if they so desire, and if it is possible.  If the CGI
      applications do not support it, or if the content they return
      changes from call to call, they simply ignore the presence of
      that header, and return the entire document.

   Each range consists of one or two non-negative integers, separated
   by a hyphen.  The first integer must always be less than or equal to
   the second one.  One of these integers may be missing, but not both
   at the same time.  The hyphen is always there, so it is possible to
   tell which number is missing.

   If the first number is missing, the range means the last n bytes of
   the document, where n is the second number.  If n is equal to, or
   larger than, the size of the document minus one, then the entire
   file is returned.

   If the second number is missing, the range extends to the end of
   the document.  That is, it covers all the bytes starting from byte n
   until the end of the document, where n is the first number.  The
   first byte in a document is byte number 0.

   If the second number is larger than the size of the document minus
   one, it is taken to mean the size of the document minus one (that
   is, the end of the document).

   The range is inclusive; as an example, the range 500-1000 includes
   bytes from 500 to 1000, including 500 and 1000.

   There may be multiple ranges, separated by a comma.  The order of
   the ranges is the preferred order in which the ranges should be
   returned.

   In the case that the second integer is smaller than the first one,
   that particular range is tagged as invalid, and ignored.  If it was
   the only requested byte range, the entire document is returned.
   Otherwise the remaining valid ranges will be returned.
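
   The rules above can be sketched as follows.  This is a minimal,
   non-normative illustration in Python; the name resolve_byte_ranges
   is hypothetical.  Treating a range whose first byte lies beyond the
   end of the document as invalid, and silently skipping malformed
   items, are assumptions of the sketch, since the text above does not
   cover those cases.

```python
def resolve_byte_ranges(spec, size):
    """Resolve a byte range specification for a document of `size` bytes.

    `spec` is the text after 'bytes=', e.g. '0-99,500-1499,-200'.
    Returns inclusive (first, last) pairs in the order requested.  An
    empty list means no valid range was found, in which case the entire
    document would be returned.
    """
    resolved = []
    for item in spec.split(","):
        first_text, hyphen, last_text = item.strip().partition("-")
        # The hyphen must be present and at least one number given.
        if not hyphen or not (first_text or last_text):
            continue
        # Assumption: items that are not plain decimal digits are skipped.
        given = [t for t in (first_text, last_text) if t]
        if not all(t.isascii() and t.isdigit() for t in given):
            continue
        if not first_text:
            # "-n": the last n bytes.  Following the text literally, a
            # value of n at or above size - 1 selects the whole document.
            n = int(last_text)
            first = 0 if n >= size - 1 else size - n
            last = size - 1
        else:
            first = int(first_text)
            # A missing or too-large second number means end of document.
            last = int(last_text) if last_text else size - 1
            last = min(last, size - 1)
        if first > last:
            # Second number smaller than the first (or, by assumption,
            # a first byte past the end of the document): ignored.
            continue
        resolved.append((first, last))
    return resolved
```

   As a usage example, resolve_byte_ranges("-500", 1234) gives the
   single pair (734, 1233), and resolve_byte_ranges("1000-500", 1234)
   gives an empty list, so the entire document would be sent.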

   The byte ranges refer to ranges in data as they are transferred over
   the network (and retrieved by the client).  E.g. if in an imaginary
   system the server stores all lines terminated by CR LF, but turns
   them into a single LF before sending the data, then byte ranges
   refer to ranges inside this modified data (the one with single LF
   line separators).  That is, the ranges refer to the data that the
   client would see.

   The byte ranges apply to the "raw" data, that is, the data encoded
   by Content-encoding; but not to the "armored" data, that is, the
   data encoded by content-transfer-encoding.

   Examples of the Range: header with the bytes parameter

      The first 500 bytes:

           Range: bytes=0-499

      The second 500 bytes:

           Range: bytes=500-999

      All bytes except for the first 500 until the end of document:

           Range: bytes=500-

      The last 500 bytes of the document:

           Range: bytes=-500

      Two separate ranges:

           Range: bytes=50-99,200-249

      The first 100 bytes, 1000 bytes starting from the byte number
      500, and the remainder of the document starting from byte number
      4000 (byte numbering starts from zero):

           Range: bytes=0-99,500-1499,4000-

      The first 100 bytes, 1000 bytes starting from the byte number
      500, and the last 200 bytes of the document:

           Range: bytes=0-99,500-1499,-200


3.2.  The Unless-Modified-Since HTTP request header

   Guaranteeing that individual parts are all up-to-date and in sync
   with each other is crucial.  This can be made easier by providing a
   way to tell the server to send the byte range only if it hasn't
   changed since the time of the retrieval of the other ranges.  If it
   has, the entire document is transferred instead.

   The Unless-Modified-Since header will be sent by the client to the
   server (or the proxy), carrying the date and time received in the
   Last-Modified header from the previously received parts.
   If at any point the client detects a mismatch in the last-modified
   date or time, the older parts should be discarded.  The
   last-modified date and time must match exactly.

   The server will send the requested byte range (as a 206 Partial
   Content response, as described below) if and only if the document
   has not changed since that date and time.  If it has, the server
   will send the entire document to the client instead (as a normal 200
   response).

   Example:

        Unless-Modified-Since: Wed, 15 Nov 1995 06:25:15 GMT

   As with the If-Modified-Since header in practice, there may be
   additional parameters at the end of this field, separated by a
   semicolon, to make additional checks possible.  The most basic one
   is the size of the file as the length parameter:

        Unless-Modified-Since: Wed, 15 Nov 1995 06:25:15 GMT; length=12045


4.  BYTE RANGE HTTP RESPONSE

4.1.  206 Partial Content Status Code

   The byte range response uses the 206 Partial Content HTTP response
   status.  Servers and CGI applications not supporting byte ranges
   will simply ignore the Range: header in the request, and return the
   entire document in a 200 response.

   Existing proxy servers only cache 200 OK responses.  This way
   intermediate proxy servers will not mistakenly cache a partial
   document as if it was the entire document.

   If the request includes multiple ranges, the response is sent back
   as a multipart MIME message, with content-type
   multipart/x-byteranges.  A server may, but is not required to, also
   send a single byte range as a multipart message.

   If there are overlapping ranges the behavior for each range doesn't
   change.  That is, a range will not be truncated, merged, or left
   out, just because there is an overlap.

   If there was an Unless-Modified-Since header in the request, and the
   document was modified since that time, the server will send a normal
   200 OK response, and transfer the entire document instead.
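
   The choice between the 206 and 200 responses described above might
   be sketched as follows.  This is a non-normative Python
   illustration; the function name choose_status and the dictionary of
   lower-cased request header names are hypothetical.  The sketch
   assumes that the requested ranges have already been found valid,
   that the document's modification time is held to whole seconds, and
   that a mismatching length parameter is treated like a modification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def choose_status(headers, last_modified, size):
    """Return 206 if the byte range may be served, otherwise 200.

    `headers` maps lower-cased request header names to their values.
    `last_modified` is the document's timezone-aware modification time
    and `size` its length in bytes.
    """
    if "range" not in headers:
        return 200  # an ordinary request for the entire document
    condition = headers.get("unless-modified-since")
    if condition is None:
        return 206  # unconditional byte range request
    # Separate the date from optional parameters such as length=12045.
    date_text, _, params = condition.partition(";")
    if last_modified > parsedate_to_datetime(date_text.strip()):
        return 200  # document changed: send it whole
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        # Assumption: a different length also counts as a change.
        if name.lower() == "length" and value.strip() != str(size):
            return 200
    return 206


# Modification time matching the Unless-Modified-Since example above.
EXAMPLE_TIME = datetime(1995, 11, 15, 6, 25, 15, tzinfo=timezone.utc)
```

   With EXAMPLE_TIME and a size of 12045, a request carrying the
   Unless-Modified-Since example shown earlier would receive a 206
   response under this sketch, while the same request against a
   document of a different size would receive a 200 response.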

4.2.  The Content-Range HTTP Response Header

   The Content-Range HTTP response header is sent back to provide
   verification and information about the range and total size of the
   document.  This header can be used by the client to determine which
   one of the requested ranges is in question.

   Syntax:

        Content-Range: bytes X-Y/Z

   where:

      X  is the number of the first byte returned (the first byte is
         byte number zero).

      Y  is the number of the last byte returned (in case of the end
         of the document this is one smaller than the size of the
         document in bytes).

      Z  is the total size of the document in bytes.

   Examples of the Content-Range: HTTP Response Header

      The first 500 bytes of a 1234 byte document:

           Content-Range: bytes 0-499/1234

      The second 500 bytes of the same document:

           Content-Range: bytes 500-999/1234

      All bytes until the end of document, except for the first 500
      bytes:

           Content-Range: bytes 500-1233/1234

      The last 500 bytes of the same document:

           Content-Range: bytes 734-1233/1234

   Example of a response:

        HTTP/1.0 206 Partial content
        Server: Netscape-Communications/2.0
        Date: Wed, 15 Nov 1995 06:25:24 GMT
        Last-modified: Wed, 15 Nov 1995 04:58:08 GMT
        Content-range: bytes 21010-47021/47022
        Content-length: 26012
        Content-type: image/gif


4.3.  Multiple Ranges as Multipart MIME Messages

   Multipart MIME is defined in [RFC-1521].  With byteranges, the
   multipart MIME message uses content-type multipart/x-byteranges,
   with a boundary parameter.

   Example:

        Content-type: multipart/x-byteranges; boundary=THIS_STRING_SEPARATES

        --THIS_STRING_SEPARATES
        Content-type: application/x-pdf
        Content-range: bytes 500-999/8000

        ...the first range...
        --THIS_STRING_SEPARATES
        Content-type: application/x-pdf
        Content-range: bytes 7000-7999/8000

        ...the second range...
        --THIS_STRING_SEPARATES--


4.4.  Caching Issues

   The server must give Last-modified headers for each range request
   whenever possible, and the client side must take care of having all
   the fragments in sync.

   Conditional GET (the GET request with the If-modified-since header)
   works as expected with byte ranges.  That is, the requested range is
   returned if the document has been modified since the given date.
   Otherwise, a 304 Not Modified response is sent.

   Ranges can be cached, and if the Last-modified header matches they
   can be combined.  If a received Last-modified date at any time
   differs from the ones in the cache, all the cached ranges will be
   discarded.

   The client side should monitor the Last-modified header value
   returned by the server, and make sure that all of its individual
   fragments are in sync.  If there are older ones they should be
   immediately discarded and re-retrieved.


5.  FUTURE CONSIDERATIONS

5.1.  Extending Accept-Ranges, Range and Content-Range headers

   If at some point there are additional parameters for the Range:
   header, they should be separated by the semicolon character.

   Example:

        Range: param1=bar; param2=xyzzy

   This specification does not define semantics for cases with multiple
   Range: parameters.  Future specifications should define semantics
   for these.  Until then, Range: headers with parameters that cannot
   be understood should be ignored.


5.2.  Other Possible Ranges

   There are other kinds of ranges that can be addressed in a similar
   fashion; this document does not define them, but both the Range:
   HTTP request header and the Content-Range: HTTP header are defined
   so that it is possible to extend them.

   As an example, there might be a "lines" parameter, with the same
   kind of range specification, and the Content-Range: header would
   then specify the numbers in lines.

   Example:

        GET /dir/foo HTTP/1.0
        Range: lines=20-30

   The response from a 123 line document would be:

        HTTP/1.0 206 Partial Content
        Content-Range: lines 20-30/123
        Last-Modified: ...
        Content-Length: 773
        Content-Type: text/plain

   This could be useful for such things as structured text files like
   address lists or digests of mail and news, but isn't meaningful to
   such document types as GIF or PDF.

   Other examples might be document format specific ranges, such as
   chapters:

        GET /dir/foo HTTP/1.0
        Range: chapters=6-9

        HTTP/1.0 206 Partial Content
        Content-Range: chapters 6-9/12
        Last-Modified: ...
        Content-Length: 36023
        Content-Type: application/x-book-type


6.  References

   [RFC-1521]  N. Borenstein, N. Freed, "MIME (Multipurpose Internet
               Mail Extensions), Part One: Mechanisms for Specifying
               and Describing the Format of Internet Message Bodies",
               RFC 1521, Bellcore, Innosoft, September 1993.

   [HTTP]      T. Berners-Lee, R. Fielding, H. Frystyk, "Hypertext
               Transfer Protocol -- HTTP/1.0",
               draft-ietf-http-v10-spec-04.html, October 14, 1995.

   [CGI]       R. McCool et al, "Common Gateway Interface -- CGI/1.1",
               http://hoohoo.ncsa.uiuc.edu/cgi/, NCSA, 1994.


7.  Authors' Addresses:

   Ari Luotonen
   Netscape Communications Corporation
   501 E. Middlefield Road
   Mountain View, CA 94043
   USA

   John Franks
   Department of Mathematics
   Northwestern University
   Evanston, IL 60208-2730