Note. In writing these notes I have, probably unconsciously, assumed a familiarity with the Unix environment and the C programming language. Please note that C is not the same as C++; it is a much simpler, non-object-oriented programming language.
Introduction
The normal operation of a WWW server when processing a URL is simply to deliver the indicated file to the remote browser. If the server is suitably configured, however, it will, under certain circumstances, execute the file as a program and deliver to the remote browser whatever the program writes to its standard output. The program may do whatever its author wishes, such as interrogating a database and constructing HTML based on the information found there.
On the clun.scit.wlv.ac.uk WWW server files will be executed under the following circumstances
The files are located in the directory /usr/local/ftp/httpd/cgi-bin
The files have a name ending with the character string .cgi
You are encouraged to examine the files in the directories /usr/local/ftp/httpd/cgi-bin and /usr/local/ftp/httpd/cgi-src on clun.scit.wlv.ac.uk for examples of how to write such programs in C.
Basic Programs
Introduction
Any program that is going to write output to be sent to a remote HTML interpreter or browser must ensure that the first line of output is
Content-type: text/html
or
Content-type: text/plain
with obvious meanings. It is important that the first line must be exactly as shown above, including spaces and case. It is equally important that this first line must be followed by a blank line.
Failure to observe these requirements will result in various obscure error messages.
Note. What is actually happening here is that your program is communicating with the remote browser using the Hypertext Transfer Protocol (HTTP). The Content-type: line tells the browser which mechanism it should use to process the rest of the information; there are many other possibilities.
If you are using the clun.scit.wlv.ac.uk WWW server, your executed program is a Unix process and there are a number of environment variables available to the process. These are set by the WWW server before it starts your process.
You can get a copy of the program and save it in your public_html directory. Make sure you have set the access rights correctly using
chmod go+rx hw.cgi
You can make the WWW server execute the program by pointing your WWW browser to http://www.scit.wlv.ac.uk/~UID/hw.cgi where UID is, of course, replaced by your login code.
Another Unix shell script
Here is another example that you can get a copy of. This is another shell script that shows the values of various shell environment variables set by the WWW server before starting the program.
If you write a C program and then compile it into an executable file, you can, of course, access these environment variables using the library function getenv().
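#include <stdio.h>

int main(void)
{
    /* The header line and the blank line after it must come first. */
    printf("Content-type: text/html\n\n");
    printf("<html><head><title>");
    printf("Hello World Again");
    printf("</title></head><body>");
    printf("<h1>I'm a C Program</h1>\n");
    printf("</body></html>\n");
    return 0;
}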
Compile the program using the command
gcc -o hwcgi.cgi hwcgi.c
and make it world executable using the command
chmod go+rx hwcgi.cgi
The program can then be executed in the way shown above.
Other languages
You can write a CGI back end program in any language that is capable of determining the values of the Unix environment variables and, possibly, accessing the Unix command line arguments. The back end program will, of course, have to be developed to function on the server machine.
CGI and forms
The commonest use of the CGI mechanism is to provide a back end for HTML "forms". Before studying this further you need to be familiar with the relevant parts of HTML.
Basically a form is a series of elements enclosed within the tags <form> and </form>. Once the various boxes within the form have been filled in and the "submit" button has been hit the user entered values are transmitted to the server where a CGI back end can process the information and generate whatever reply is required.
A <form> tag has two significant attributes, METHOD and ACTION.
For simple use the METHOD attribute can be set to either GET or POST. This attribute controls the method by which the user entered data is communicated to the back end program.
If the GET method is selected then all the data is presented to the backend program in the environment variable QUERY_STRING.
If the POST method is selected then all the data is presented to the backend program on its standard input.
Many authors suggest that the POST method is preferable but the only reasons for selecting it in preference to GET are
If your host system (the one the backend runs on) restricts the size of an environment variable. This may well be a problem under MSDOS but is unlikely to cause any trouble if you are using Unix.
If you are concerned that the WWW server maintains a publicly accessible log of requests. The data sent with GET requests can be read from such a log; the data sent with POST requests cannot.
The ACTION attribute specifies what is to be done when the form-filling is complete. Its value is simply a URL. If this is the name of an executable file then the file will be executed. It may instead simply refer to a WWW page, although this is unusual.
The information from a WWW browser that has processed a form is partially encoded using characters such as + to represent spaces. There are some specimen programs called get-query.c and post-query.c in the directory /usr/local/ftp/httpd/cgi-src on clun.scit.wlv.ac.uk that demonstrate the decoding of forms information. The utility routines associated with these programs will be found in util.c in the same directory. Take a look at these and use them as a model for your own forms handling back-ends.
Here are some examples of form input using the <input>, <select> and <textarea> tags. The various attributes and their effects are discussed.
CGI and image maps
Imagemaps are a widely used and useful feature of the WWW. The user clicks on a particular position on the map and the co-ordinates of this position are determined by the browser. There are three possible mechanisms that can then be used.
A client side map. The HTML seen will include definitions of active areas and the associated URLs.
A server side map using configuration files. This is the traditional mechanism and requires the construction of configuration files defining the active areas on the server.
A server side map using the CGI mechanism. This is a generalisation of the second method.
To use a map the HTML should look like this
<a href="map.cgi" ><img src="map.gif" ismap> </a>
The ismap attribute associated with the img tag indicates that the browser should determine the co-ordinates of the click.
The executable map.cgi will be invoked with the x,y coordinates on the command line as a single argument with a comma separating the two values.
It is, of course, the responsibility of the executed program to process the data. The commonest usage is to use the HTTP Location: response to tell the browser to go somewhere else.
Here's a complete example using the CGI generalisation. Before studying the code, try the page to see what it does. First the HTML.
<html><head><title>Map Test</title>
</head>
<body>
<h1>Map Test</h1>
<p>Please click on the image</p>
<a href="http://www.scit.wlv.ac.uk/~jphb/scarf/map1.cgi">
<img src="http://www.scit.wlv.ac.uk/~jphb/scarf/mapx.gif" ismap>
</a>
</body>
</html>
and here's the C code for the program map1.c that was compiled to give the program map1.cgi
You can get your own copy of the program map1.c. The coordinates are, of course, related to the image and the programmer will have to use a utility such as xv to examine the image and determine the co-ordinates.
CGI and access rights
When your program is run via the CGI mechanism it has, of course, been started up by the WWW server. This, normally, means that the process in question is owned by a special user called nobody. You cannot login as nobody and there is no home directory for this user.
The implication is that all files that you want your program to access should be accessible to the user nobody. This means, in practice, that they should be world accessible. For data files that are read by your program this is unlikely to be a problem as the data is, presumably, publicly available anyway. However, if you intend to log all queries or otherwise record information gathered via the WWW, all the files in question would have to be world-writable, which is definitely undesirable. Fortunately the Unix operating system offers a simple solution: make your executable setuid. This means that when it is launched by the WWW server it will run with your user-id, not that of nobody.
A binary executable can be made setuid using the chmod command with the parameters u+s,go+rx
A further problem arises on some Unix systems when CGI applications are built by compiling programs in C or similar languages. Such applications are usually dynamically linked, which means that code from libraries is not linked in to the applications until the applications actually run. This is the normal configuration; however, for security reasons a dynamically linked setuid program will only operate correctly (for users other than the owner) if it only dynamically links to standard libraries (i.e. those in /usr/lib).
There are two ways round this problem. The first is to statically link the application. The second is to launch the application via a wrapper that sets both the real and effective user-id. The wrapper itself must, of course, be setuid.