Guidelines For Writing HTTP Server Scripts


This document is intended to be an evolving series of ideas, pointers and other information about writing programs that can be executed by following a WWW link, with particular emphasis on security issues.

The rest of this document assumes that you already know how to write programs. It also doesn't attempt to cover the same ground as NCSA's introduction to forms, which should be considered the starting point for explorations of forms. You should also be sure to read the Common Gateway Interface documentation, which describes the interface that the HTTP server defines between HTML messages and server-side executables.

Security

The potential problems with security cannot be overemphasized. Unlike existing network protocols, which are generally more restrictive, the existence of server-side scripts effectively permits arbitrary users to execute arbitrary programs, although clearly the scope of "arbitrary" in this case is at least somewhat reduced (specifically: arbitrary users can execute any program installed in a ScriptAlias location prior to the last HTTP server restart).

There are some basic features of server-side scripts that, if used correctly, will minimize the potential for security problems; the sections below describe them.

The Basic Idea

When you follow a link using a URL of the form:

	http://foo.bar.baz/a/b/c
the HTTP server at foo.bar.baz will check each successively longer prefix of /a/b/c (i.e. /a, /a/b, etc.) against the list of "ScriptAliases" defined in the server's configuration files. A ScriptAlias looks like this:
	ScriptAlias /a /some/other/place/in/the/filesystem/a
which the server interprets to mean: if anyone ever references /a/something, then execute the program "something" found in /some/other/place/in/the/filesystem/a and return its output. Note that this implies two things about the executed program: it must send a MIME Content-Type header as its first line of output, followed by a blank line, to tell the client (Mosaic) what the output actually is (HTML? GIF? JPEG? etc.), and then it should send some "useful" output, even if it's only an "OK, message received" line. See the mail-request program mentioned below for an example of how to do this.
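As a rough illustration of those two requirements, here is a minimal sketch of a script's output in Python. It is not the mail-request program; the function name build_response and the message text are made up for this example. It assumes the usual Common Gateway Interface convention that a blank line separates the header from the body.

```python
import sys


def build_response(body, content_type="text/html"):
    """Return the complete script output: header line, blank line, then the body."""
    return "Content-Type: " + content_type + "\n\n" + body + "\n"


if __name__ == "__main__":
    # Even a script with nothing to report should send something useful back.
    sys.stdout.write(build_response("<p>OK, message received</p>"))
```

The header has to come before any other output, so a script should avoid printing diagnostics to its standard output ahead of it.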

Only programs located in places referenced by a ScriptAlias will ever be executed by the server. In addition, the server caches a directory listing of all the programs in each location referenced in a ScriptAlias whenever it is started (or restarted), and uses it to check possible server-side programs before executing them. This prevents random programs placed in the right place from being accessed without a server restart (which only a privileged user can do).
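The lookup described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as this document describes it, not the server's actual code: the function names, the dictionary layout, and the idea that the cached listing is a set of program names per alias are all assumptions made for illustration.

```python
def prefixes(path):
    """Yield successively longer prefixes of a URL path: /a, /a/b, /a/b/c."""
    parts = [p for p in path.split("/") if p]
    for i in range(1, len(parts) + 1):
        yield "/" + "/".join(parts[:i])


def resolve(path, script_aliases, cached_listing):
    """Return (program_path, path_info), or None if nothing may be executed.

    script_aliases maps an alias such as "/a" to a filesystem directory.
    cached_listing maps the same alias to the set of program names that were
    present when the server last (re)started.
    """
    for prefix in prefixes(path):
        if prefix not in script_aliases:
            continue
        rest = path[len(prefix):].lstrip("/")      # e.g. "b/c"
        name, _, extra = rest.partition("/")       # program name, leftover text
        # Programs added after the last restart are not in the cache: refuse them.
        if name in cached_listing.get(prefix, set()):
            path_info = "/" + extra if extra else ""
            return script_aliases[prefix] + "/" + name, path_info
    return None
```

With the alias from the example above and a cached listing containing only a program named b, resolve("/a/b/c", ...) yields the program path ending in /a/b and the leftover text /c, while a request for a program installed since the last restart yields None.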

What about arguments? What about input?

Once the HTTP server has discovered that /a/b is actually an executable program in a ScriptAlias location, it executes the program, passing it data in two ways. First of all, any text left over from the URL that has not been "used" to find the script will be used to set the value of an environment variable named PATH_INFO. In the example above, this would be relatively simple: PATH_INFO would just be /c. However, nearly arbitrary text can be used here:

	http://foo.bar.baz/a/b/long=4748.39?//limit:=$!!:h+aposto:*&%&^$$#{fhfh}
This will result in PATH_INFO being set to:

/long=4748.39?//limit:=$!!:h+aposto:*&%&^$$#{fhfh}
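(note the initial `/'). The main restriction is that spaces are not allowed, or rather, will terminate the component of the URL used to set PATH_INFO. Also be aware that the Common Gateway Interface gives `?' a special meaning: text after a `?' is normally passed in the QUERY_STRING environment variable rather than in PATH_INFO, so a script should not count on seeing it there.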
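For illustration, a script might pick up PATH_INFO along the following lines. This is a sketch only: the helper name path_info_parts is invented here, and splitting the value on `/' is just one plausible way a script might choose to interpret it.

```python
import os


def path_info_parts(environ=os.environ):
    """Split PATH_INFO into its components, dropping the leading '/'."""
    raw = environ.get("PATH_INFO", "")   # unset or empty if nothing was left over
    return [part for part in raw.split("/") if part]
```

Since the value comes straight from the URL, it should be treated as untrusted text supplied by whoever followed the link.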

In addition, if you are using a forms interface, the values of all the <input> and <select> tags in the form will be made available on the standard input of the program.
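A script reading that input could look roughly like the sketch below. It relies on the CONTENT_LENGTH environment variable described in the Common Gateway Interface documentation; the function name read_form_data and the MAX_BYTES cap are assumptions added for this example, the cap being a simple precaution against oversized submissions.

```python
import os
import sys

MAX_BYTES = 64 * 1024  # arbitrary illustrative limit on how much input to accept


def read_form_data(environ=os.environ, stream=None):
    """Read the submitted form data (still encoded) from standard input."""
    if stream is None:
        stream = sys.stdin.buffer
    try:
        length = int(environ.get("CONTENT_LENGTH", "0") or "0")
    except ValueError:
        length = 0                       # a malformed length is treated as "no data"
    if length <= 0:
        return b""
    # Read no more than the server said was sent, and never more than the cap.
    return stream.read(min(length, MAX_BYTES))
```

Reading exactly the announced number of bytes avoids waiting for an end-of-file that the server may never send.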

Encoding

The form data will be encoded to guarantee safe transmission. This encoding is an important issue because, to make reasonable use of the data sent to your program, you need to decode it. Fortunately, you don't *have* to do this yourself. For the time being, a filter called urldecode can be used by your own programs (easily if they are shell, awk or perl scripts) to do the decoding. Invoke it as:
	/cse/www/htbin-post/urldecode
and it will convert any encoded data read from its standard input into its original form on its standard output. At some point, I'll add an object module you can link with to do this from a compiled language like C (although you may get there before me, since the encoding is so simple).
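To give a feel for how simple the encoding is, here is a sketch of the decoding step in Python. It is not the urldecode filter and makes no claim about that program's output format; it assumes the standard form encoding (fields joined by `&', name and value joined by `=', spaces sent as `+', other special characters as %XX) and uses the standard library's unquote_plus. The function name decode_form is invented for this example.

```python
from urllib.parse import unquote_plus


def decode_form(encoded):
    """Turn 'name=value&name2=value2' into a list of decoded (name, value) pairs."""
    pairs = []
    for field in encoded.split("&"):
        if not field:
            continue                      # skip empty fields such as a trailing '&'
        name, _, value = field.partition("=")
        # unquote_plus turns '+' into a space and %XX into the original character.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs
```

For example, decode_form("name=Jo+Smith&note=50%25+off") gives the pairs ("name", "Jo Smith") and ("note", "50% off").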

More details are available about writing server scripts in the Common Gateway Interface documentation, where a number of other environment variables that are available to the program are described.

How to do this locally

Currently, only two areas have been named using ScriptAliases. One of them is not currently publicly accessible (the filesystem it resides on is not exported to the rest of the department's machines). The other is intended for use by participants in the 590i seminar taking place this quarter:
	/projects/ai/590i/post-bin
An example program, called mail-request, is already there (it's a shell/awk script). This is the program I use for the interface to my music collection, so take a look at the HTML source for that stuff to see how this is used. I intended it to be usable by anyone else, and for a variety of purposes. Suggestions are welcome.

The HTTP daemon will be restarted about 4 times a day, and on the next restart after you have placed a program there, you will be able to have a link to it result in its execution. After that, you can keep changing the program in any way you see fit, and the daemon won't care - it merely notes presence or absence.

I want to reiterate that this is potentially a big security issue. Please take care in how you handle arguments, how you handle input and what your program does or might do.
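One common way to take that care is to accept only input that matches a conservative pattern and to avoid handing it to a shell. The sketch below shows the idea in Python; the names SAFE_TOKEN, is_safe_token and run_helper, and the particular set of allowed characters, are assumptions chosen for illustration and would need adjusting to what a given script actually expects.

```python
import re
import subprocess

# Allow only a conservative set of characters; everything else is rejected.
SAFE_TOKEN = re.compile(r"[A-Za-z0-9_.@-]+")


def is_safe_token(value):
    """True if value uses only allowed characters and cannot be read as an option."""
    return SAFE_TOKEN.fullmatch(value) is not None and not value.startswith("-")


def run_helper(program, user_value):
    """Run program with one validated argument, never through a shell."""
    if not is_safe_token(user_value):
        raise ValueError("rejected unsafe input")
    # An argument list (no shell=True) means the value is never parsed by a shell,
    # so characters like ';' or '$' could not start another command anyway.
    return subprocess.run([program, user_value], capture_output=True, check=False)
```

Rejecting anything unexpected is usually safer than trying to strip out the characters known to be dangerous, since the list of dangerous characters is easy to get wrong.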

For the time being, all instances of these programs will run as the uid "nobody". Also, access to both areas (the private one and the 590i area) is currently limited to machines in the .cs.washington.edu domain. This restriction is inconvenient, and is intended to create a temporary breathing space so that we can get more experience with potential security issues.

webmaster@cs.washington.edu