Open Computing "Hands-On" Tutorial: October 1994

Making Web Browsers Talk Back

The World Wide Web, accessed through forms-capable browsers like Netscape and Mosaic, can be used for two-way communication. Here's how to create forms and scripts for collecting information.

By Patrick M. Ryan

The World Wide Web (WWW) is an excellent tool not only for retrieving information from remote sites but also for interacting with those sites in a way similar to transaction processing. Several Web browsers, most notably Netscape and Mosaic, let you enter information into the browser and have that information sent back to a server.

The resources most commonly accessed through WWW are documents written in the Hypertext Markup Language (HTML). With HTML, portions of a document can be treated as hyperlinks (or references) to other Web resources. These elements, which are often textual and sometimes graphical, appear as highlighted objects when viewed in a browser such as Netscape. If you click one of these highlighted objects with the mouse or select it with a key, the browser goes out to the net and retrieves the resource that the hyperlink points to.

HTML documents may be retrieved from machines that are running a Hypertext Transfer Protocol (HTTP) daemon. This daemon (HTTPD) listens on a certain port (80 by default) for requests for documents within a certain domain on the host system. Mosaic and HTTPD are both products of the National Center for Supercomputing Applications (NCSA). (See the June 1994 Open Computing "Hands-On" tutorial article, "Riding the Internet Wave," for setting up Mosaic.)

HTML also allows the reader to enter information into the HTML document and have that information passed back to the server machine's HTTP daemon. These types of HTML documents are called forms. The method for passing the information back and processing that information is called the Common Gateway Interface (CGI). Associated with an HTML form is a CGI program or script.
The CGI specification describes what CGI programs can expect from standard input, what they should send to standard output, what environment variables they can use, and what may appear on the command line.

Nearly all current browsers support forms, including Netscape Navigator and versions of Mosaic later than 2.0. The research for this tutorial used Mosaic for X version 2.4 and httpd 1.1.

There are many things to learn about implementing HTML documents. The most important are how to configure the Web server (the HTTP daemon program), how to write HTML forms, how the server and CGI program interact, and what security measures to keep in mind.

Server Configuration

Implementing a form system using HTML requires you to write two files: an HTML document and a CGI program to process the input from the form. Together, Listing 1 and Listing 2 demonstrate a simple product-ordering system used by the prolific and fictitious Yoyodyne Corp. We assume that their HTTP server resides on www.yoyodyne.com.

Configuration of the Web server daemon is a straightforward but long process and is a subject worthy of an article all to itself. The URL (Uniform Resource Locator) for NCSA's excellent documentation on HTTPD configuration can be found at the end of this article.

Once you have an operational HTTP daemon on your system, familiarize yourself with the directory structure of the server. The server has a directive named ServerRoot that points to the top of the HTTP daemon's directory tree (often /usr/local/etc/httpd/). The server-root directory has several subdirectories, including conf/, icons/, logs/, and cgi-bin/. The conf/ directory contains the server's configuration files. In those files, most directory references are relative to the value of ServerRoot.

Look at the file conf/srm.conf (the server resource map file). The ScriptAlias directive indicates where CGI scripts reside.
The first argument to ScriptAlias is an alias name (for the actual path name) that HTML forms must use to refer to their associated CGI programs. The second argument is the real path on the system where CGI scripts live. For security reasons, any attempt to reference a CGI program outside that alias directory will generate an error from the server. We need to know the actual location of that directory so that we know where to put our CGI program.

Form Syntax

Forms are set up in an HTML document using a FORM tag. The syntax is

    <FORM ACTION="url" METHOD="method"> [form text] </FORM>

The ACTION attribute is a URL that points to the form's CGI program. Usually but not always, this CGI program resides on the same machine as the HTML document. The METHOD attribute will have a value of "POST" or "GET." This attribute indicates how the request will be transmitted to the server. In nearly all cases, it will be "POST." When using the POST method, the client sends the query data in the body of the request (the Object-Body). The CGI program reads the data on its standard input.

As mentioned before, CGI scripts must reside in the directory pointed to by the ScriptAlias directive. A typical value for the alias directory name is /cgi-bin/. The URL that points to a CGI program process_order would be "/cgi-bin/process_order". Note that this URL does not have any protocol or host information. In the absence of such information, the URL is resolved relative to the form's own location, so the CGI script is sought on the same host where the form resides.

As with CGI scripts, the locations of all resources accessed through the Web server daemon, whether served documents or otherwise, are restricted. The DocumentRoot directive points to the top-level directory where these resources may be accessed. (The default value for DocumentRoot is /usr/local/etc/httpd/htdocs/.) For example, if you have a URL that points to http://www.yoyodyne.com/products/order.html, the HTTP daemon on www.yoyodyne.com will translate this path into /usr/local/etc/httpd/htdocs/products/order.html.

However, if the first part of the URL file path has the form ~user/, the server consults the value of the UserDir server configuration directive. If this directive has a directory name value, the server will look up that user's home directory, append the value of UserDir, and look in the resulting directory for the reference. For instance, if UserDir is set to public_html and you have a reference to http://www.yoyodyne.com/~dave/abstract.html, the server will translate this path to ~dave/public_html/abstract.html on www.yoyodyne.com.
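The translation rules for ScriptAlias, DocumentRoot, and UserDir can be illustrated with a short Python sketch. This is only a model of the behavior described here, not the daemon's actual code. The function name translate, the HOME_DIRS table, and dave's home directory of /home/dave are hypothetical, and the real cgi-bin path is assumed to sit directly under the usual ServerRoot.

```python
import posixpath

# Values taken from the article's examples.
DOCUMENT_ROOT = "/usr/local/etc/httpd/htdocs"
USER_DIR = "public_html"          # set to "DISABLED" to turn off ~user lookups

# Assumption: alias /cgi-bin/ maps to cgi-bin/ under the usual ServerRoot.
SCRIPT_ALIAS = ("/cgi-bin/", "/usr/local/etc/httpd/cgi-bin/")

# Hypothetical home directories; a real server would consult the
# system's password database instead of a fixed table.
HOME_DIRS = {"dave": "/home/dave"}


def translate(url_path):
    """Map the path part of a URL to a file-system path (None if refused)."""
    alias, real_dir = SCRIPT_ALIAS
    if url_path.startswith(alias):
        # CGI programs are only reachable through the alias directory.
        return posixpath.join(real_dir, url_path[len(alias):])
    if url_path.startswith("/~"):
        # ~user/ form: home directory + UserDir + remainder of the path.
        user, _, rest = url_path[2:].partition("/")
        if USER_DIR == "DISABLED" or user not in HOME_DIRS:
            return None
        return posixpath.join(HOME_DIRS[user], USER_DIR, rest)
    # Everything else is served from beneath DocumentRoot.
    return posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))


if __name__ == "__main__":
    for path in ("/products/order.html",
                 "/~dave/abstract.html",
                 "/cgi-bin/process_order"):
        print(path, "->", translate(path))
```

The sketch deliberately omits the checks a real server needs, such as rejecting ".." components that would climb out of the permitted directories.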
The administrator can set UserDir to "DISABLED" to defeat this feature.

HTML Buttons

An HTML form can make use of three different types of interface elements or tags: INPUT, SELECT, and TEXTAREA. The general form of an INPUT tag is

    <INPUT TYPE="type" NAME="name">

The INPUT element is a "standalone" tag; it has no terminating tag. NAME defines the symbolic name for the field value passed back to the server upon submission and must be present for all but TYPE="submit" or TYPE="reset". The value for NAME does not appear in the displayed document. Usually, any text immediately before or after the INPUT tag serves as a label for the tag.

The TYPE attribute of the INPUT tag indicates which type of input you want:

    text      Textual input.
    password  Same as text but does not echo characters.
    checkbox  A button that is either on or off.
    radio     A "one-of-many" checkbox if multiple radio buttons are grouped with the same NAME.
    reset     Resets form values to their defaults.
    submit    Sends form information back to the server.

The text and password TYPE values may take an optional SIZE=columns,rows attribute that indicates the number of columns (characters) and rows displayed for text input. Checkboxes and radio buttons may have an optional CHECKED attribute to specify a pre-checked value.

The submit and reset TYPE values are special. If a user presses the "Reset" button, all of the inputs are set back to their default values. Pressing the "Submit" button will cause the browser to package up the data entered by the user and send it back to the server. These two values have an optional attribute VALUE=button-label, which, if present, will be used as the button label.

The SELECT interface tag allows the user to choose from a list of items in a pop-up menu or scrollable list. The selection items are enclosed between the opening tag <SELECT> and the closing tag </SELECT>. Each choice in the list begins with an