Chapter 10: Sites That Are Really Programs
You needn't turn your Web site into a program just because the body of material that you are publishing is changing. Sites like http://www.yahoo.com, for example, are sets of static files that are periodically generated by programs grinding through a dynamic database. With this sort of arrangement, the site inevitably lags behind the database but you can handle millions of hits a day without a major investment in computer hardware, custom software, or thought.
If you want to make a collaborative site, however, then at least some of your Web pages will have to be computer programs. Pages that process user submissions have to add user-supplied data to your Web server's disk. Pages that display user submissions have to look through a database on your server before delivering the relevant contributions.
Even if you want to publish completely static, non-collaborative material, at least one portion of your site will require server-side programming: the search engine. To provide a full-text search over your material, your server must be able to take a query string from the user, compare it to the files on the disk, and then return a page of links to relevant documents.
This chapter discusses the options available to Web publishers who need to write program-backed pages.
Every interesting Web site has some characteristics of both a document and a computer program. There is thus no correct answer to the question "Is your site a hypertext document with bits of computation or a computer program with bits of static text?" However, the tools that make it easy for a team of experts to develop a computer program will get in the way if your site is fundamentally a document. Conversely, the tools that make it convenient to edit a document can lead to sloppy and error-filled computer programs.
Server-side programming systems that take the document model to its logical extreme are AOLserver Dynamic Pages (ADP) and Microsoft Active Server Pages (ASP). A vanilla HTML file is a legal ADP or ASP document. If you want to add some computation, you weave in little computer language fragments, surrounded by <% ... %>. If you want to fix a typo or a programming bug, you edit the .adp or .asp file and hit reload in your Web browser to see the new version. Almost always, the connection is direct and immediate between the URL where the problem was observed and the file on the server that you must edit. You don't have to understand much of the document's structure to fix a bug.
At the other end of the document/program spectrum are various "application servers" that require you to program in C or Java. HTML text is inevitably buried inside these programs. Fixing a typo requires editing the program, compiling the program, and reloading the compiled code into the Web or application server. If there is a problem with a URL, fixing it might require reading and editing dozens of program files and understanding most of the program's overall structure.
With the right tools and programmer resources, you can build a jewel-like software system to sit behind a Web site. But ask yourself whether the entire service isn't likely to be redesigned after six months, and if, realistically, your site isn't going to be thrown together hastily by overworked programmers. If so, perhaps it will be best to look for the tightest development cycle.
Consider these aspects:
How could a lame scripting language like Tcl possibly compete with Lisp? At some level, the only data type available in Tcl is a string. Well, guess what? The only data type that you can write to a Netscape browser is a string. And all the information from the Oracle relational database management system on which you are relying comes back to you as strings. So maybe it doesn't matter whether your scripting language has an enfeebled type system.
Are these languages really the best? My computer science friends would shoot me for saying that Tcl is as good as Common Lisp and better than Java. But it turns out to be almost true. Tcl is better than Java because Tcl doesn't have to be compiled. Tcl can be better than Lisp because string manipulation is simpler. For example, in Tcl
    "posted by $email on $posting_date."

will generate a string from the fragments of static ASCII above plus the contents of the variables $email and $posting_date. These were presumably recently pulled from a relational database. The result might look something like

    "posted by philg@mit.edu on February 15, 1998."

In Common Lisp, you'd have

    (concatenate 'string "posted by " email " on " posting-date ".")

which uses a fabulously general mechanism for concatenating sequences. concatenate can work on sequences of ASCII characters (strings) or sequences of TCP packets or sequences of three-dimensional arrays or sequences of double-precision complex numbers. Sequences can either be lists (fast to modify) or vectors (fast to retrieve). This kind of flexibility, which Java apes, is wonderful except that Web programmers are concatenating strings 99.99 percent of the time and Tcl's syntactic shortcuts make code easier to read and more reliable.
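The Tcl side of this comparison is easy to see in isolation. Here's a plain-Tcl sketch; the variable values are invented for illustration, standing in for what would normally come back from the database:

```tcl
# values of this sort would normally come back from the database as strings
set email "philg@mit.edu"
set posting_date "February 15, 1998"

# double-quoted strings interpolate variables directly
set attribution "posted by $email on $posting_date."
puts $attribution
```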
What's my prediction for the powerful language that will sit behind the Web sites of the future?
HTML.
HTML? But didn't we spend a whole chapter talking about how deficient it was even as a formatting language? How can HTML function as a server-side programming language?
Suppose that you added the line

    <!--#include FILE="/web/author-info.txt" -->

to one of your HTML files and then reloaded the file in a browser. Nothing changed. Anything surrounded by "<!--" and "-->" is an HTML comment; the browser ignores it. Your intent, though, was to have the Web server notice this command and replace the comment with the contents of the file /web/author-info.txt. To do that, you have to change the file name of this URL to have an .shtml extension. Now the server knows that you are actually programming in an extended version of HTML.
The AOLserver takes this one step further. To the list of standard SHTML commands, they've added #nstcl:

    <!--#nstcl script="ns_httpget http://cirrus.sprl.umich.edu/wxnet/fcst/boston.txt"-->

which lets a basically static HTML page use the ns_httpget Tcl API function to go out on the Internet, from the server, and grab http://cirrus.sprl.umich.edu/wxnet/fcst/boston.txt before returning the page to the user. The contents of that page are included in place of the comment tag.
This is a great system because a big Web publisher can have its programmers develop a library of custom Tcl functions that its content authors simply call from server-parsed HTML files. That makes it easy to enforce style conventions company-wide. For example,
<!--#nstcl script="webco_captioned_photo samoyed.jpg {This is a Samoyed}" -->
might turn into

    <h3> <img src="samoyed.jpg" alt="This is a Samoyed"> This is a Samoyed </h3>

until the day that the Webco art director decides that HTML tables would be a better way to present these images. So a programmer redefines the procedure webco_captioned_photo, and the next time they are served, thousands of image references instead turn into

    <table> <tr> <td><img src="samoyed.jpg" alt="This is a Samoyed"> <td>This is a Samoyed </tr> </table>

An alternative is to define a new HTML tag for content authors to use:

    <CAPTIONED-PHOTO "samoyed.jpg" "This is a Samoyed">

Just like the Tcl function, this Webco HTML function takes two arguments, an image file name and a caption string. And just like the Tcl function, it produces HTML tags that will be recognized by standard browsers. I think it is cleaner than the "include a Tcl function call" .shtml example because the content producers don't have to switch back and forth between HTML syntax and Tcl syntax.
How far can we go with this? Pretty far. The best of the enriched HTMLs is Meta-HTML (http://www.metahtml.com). Meta-HTML is fundamentally a macro expansion language. We'd define our captioned-photo tag thusly:
<define-tag captioned-photo image-url text>
  <h3>
  <img src="<get-var image-url>" alt="<get-var text>">
  <br>
  <get-var text>
  </h3>
</define-tag>
Now that we are using a real programming language, though, we'd probably not stop there. Suppose that Webco has decided that it wants to be on the leading edge as far as image format goes. So it publishes images in three formats: GIF, JPEG, and progressive JPEG. Webco is an old company so every image is available as a GIF, but only some are available as JPEG and even fewer as progressive JPEG. Here's what we'd really like captioned-photo to do: serve the progressive JPEG to browsers that can render it (and only if the file actually exists on the server), fall back to the plain JPEG if there is one, and default to the GIF otherwise.
This is straightforward in Meta-HTML:
<define-function captioned-photo stem caption>
;;; If the user-agent is Netscape, try using a JPEG format file
<when <match <get-var env::http_user_agent> "Mozilla">>
;;; this is Netscape
<when <match <get-var env::http_user_agent> "Mozilla/[2345]">>
;;; this is Netscape version 2, 3, 4, or 5(!)
<if <get-file-properties
<get-var mhtml::document-root>/<get-var stem>-prog.jpg>
;;; we found the progressive JPEG in the Unix file system
<set-var file-to-reference = <get-var stem>-prog.jpg>>
</when>
;;; If we haven't defined FILE-TO-REFERENCE yet,
;;; try the simpler JPEG format next.
<when <not <get-var file-to-reference>>>
<if <get-file-properties
<get-var mhtml::document-root>/<get-var stem>.jpg>
<set-var file-to-reference = <get-var stem>.jpg>>
</when>
</when>
;;; If FILE-TO-REFERENCE wasn't defined above, default to GIF file
<when <not <get-var file-to-reference>>>
<set-var file-to-reference <get-var stem>.gif>
</when>
;;; here's the result of this function call, four lines of HTML
<h3>
<img src="<get-var file-to-reference>" alt="<get-var caption>">
<br>
<get-var caption>
</h3>
</define-function>
This example only scratches the surface of Meta-HTML's capabilities. The language includes many of the powerful constructs such as session variables that you find in Netscape's LiveWire system. However, for my taste, Meta-HTML is much cleaner and better implemented than the LiveWire stuff. Universal Access offers a "pro" version of Meta-HTML compiled with the OpenLink ODBC libraries so that it can talk efficiently to any relational database (even from Linux!).
Is the whole world going to adopt this wonderful language? Meta-HTML does seem to have a lot going for it. The language and first implementation were developed by Brian Fox and Henry Minsky, two hard-core MIT computer science grads. Universal Access is giving away their source code (under a standard GNU-type license) for both a stand-alone Meta-HTML Web server and a CGI interpreter that you can use with any Web server. They distribute precompiled binaries for popular computers. They offer support contracts for $500 a year. If you don't like Universal Access support, you can hire the C programmer of your choice to maintain and extend their software. Minsky and Fox have put the language into the public domain. If you don't like any of the Universal Access stuff, you can write your own interpreter for Meta-HTML, using their source code as a model.
Suppose that you have used the CAPTIONED-PHOTO tag throughout your content. You hire a writer to update some of your pages. He downloads them in Netscape Navigator, at which time your server converts them into standard HTML TABLE, TR, and TD tags. He edits the document in Netscape Composer and uses HTTP PUT or FTP to place it back on the server. At this point, all the CAPTIONED-PHOTO tags have been lost and with them your insurance against changes in the HTML standard.
So if someone offers you even a minor variation on HTML, ask him what tools he's developed and how the new language will fit into all of your production processes.
The oldest and most common mechanism for program invocation via the Web is the Common Gateway Interface (CGI). The CGI standard is an abstraction barrier that dictates what a program should expect from the Web server, for example, user form input, and how the program must return characters to the Web server program for them to eventually be written back to the Web user. If you write a program with the CGI standard in mind, it will work with any Web server program. You can move your site from NCSA HTTPD 1.3 to Netscape Communications 1.1 to AOLserver 2.1 and all of your CGI scripts will still work. You can give your programs away to other webmasters who aren't running the same server program. Of course, if you wrote your CGI program in C and compiled it for an HP Unix box, it isn't going to run so great on a Windows NT machine.
Oops.
We've just discovered why most CGI scripts are written in Perl, Tcl, or some other interpreted computer language. The systems administrator can install the Perl or Tcl interpreter once and then Web site developers on that machine can easily run any script that they download from another site.
Fixing a bug in an interpreted CGI script is easy. A message shows up in the error log when a user accesses "http://yourserver.nerdu.edu/bboard/subject-lines.pl". If your Web server document root is at /web (my personal favorite location), then you know to edit the file /web/bboard/subject-lines.pl. After you've found the bug and written the file back to the disk, the next time the page is accessed the new version of the subject-lines Perl script will be interpreted.
For concreteness, let's summarize Unix CGI:
#!/usr/contrib/bin/perl
# the first line in a Unix shell script says where to find the
# interpreter. If you don't know where perl lives on your system, type
# "which perl", "type perl", or "whereis perl" at any shell
# and put the result after the #!

print "Content-type: text/html\n\n";
# now we have printed a header (plus two newlines) indicating that the
# document will be HTML; whatever else we write to standard output will
# show up on the user's screen

print "<h3>Hello World</h3>";
This example program will print "Hello World" as a level-3 headline. If you want to get more sophisticated, read some on-line tutorials, The Cgi/Perl Cookbook (Patchett & Wright; Wiley 1997), or CGI Programming on the World Wide Web (Gundavaram; O'Reilly, 1996).
It is that easy to write Perl CGI scripts and get server independence, a tight software development cycle, and ease of distribution to other sites. With that in mind, you might ask how many of my thousands of dynamic Web pages use this program invocation mechanism. The answer? One. It was written by Architext and it looks up user query strings in the site's local full-text index. Why don't I have more?
All Web server APIs allow you to specify "If the user makes a request for a URL that starts with /foo/bar/ then run Program X". The really good Web server APIs allow you to request program invocation before or after pages are delivered. For example, you ought to be able to say "When the user makes a request for any HTML file, run Program Y first and don't serve the file if Program Y says it is unhappy". Or "After the user has been served any file from the /car-reviews directory, run Program Z" (presumably Program Z performs some kind of logging).
Sometime in mid-1994 the researchers depending on Martigny, whose load average had soared from 0.2 to 3.5, decided that a 100,000 hit per day Web site was something that might very nicely be hosted elsewhere. It was easy enough to find a neglected HP Unix box, which we called swissnet.ai.mit.edu. And we sort of learned our lesson and did not distribute this new name in the URL but rather aliases: "www-swiss.ai.mit.edu" for research publications of our group (known as "Switzerland" for obscure reasons); "webtravel.org" for my travel stuff; "photo.net" for my photo stuff; "pgp.ai.mit.edu" for Brian's public key server; "samantha.rules-the.net" for fun.
But what were we to do with all the hard-wired links out there to martigny.ai.mit.edu? We left NCSA 1.3 loaded on Martigny but changed the configuration files so that a request for "http://martigny.ai.mit.edu/foo/bar.html" would result in a 302 redirect being returned to the user's browser so that it would instead fetch http://www-swiss.ai.mit.edu/foo/bar.html.
Two years later, in August 1996, someone upgraded Martigny from HP-UX 9 to HP-UX 10. Nobody bothered to install a Web server on the machine. People began to tell me "I searched for you on the Web but your server has been down since last Thursday." Eventually I figured out that the search engines were still sending people to Martigny, a machine that was in no danger of ever responding to a Web request since it no longer ran any program listening to port 80.
Rather than try to dig up a copy of NCSA 1.3, I decided it was time to
get some experience with Apache, the world's most popular Web server. I
couldn't get the 1.2 beta sources to compile. So I said, "This free
software stuff is for the birds; I need the heavy duty iron." I
installed the 80MB Netscape Enterprise Server and sat down with the
frames- and JavaScript-heavy administration server. After 15
minutes, I'd configured the port 80 server to redirect. There was only
one problem: It didn't work.
I spent a day going back and forth with Netscape tech support. "Yes, the Enterprise server definitely could do this. Probably it wasn't configured properly. Could you e-mail us the obj.conf file? Hmmm . . . it appears that your obj.conf file is correctly specifying the redirect. There seems to be a bug in the server program. You can work around this by defining custom error message .html files with Refresh: tags so that users will get popped over to the new server if they are running a Netscape browser."
I pointed out that this would redirect everyone to the swissnet server root, whereas I wanted "/foo/bar.html" on Martigny to redirect to "/foo/bar.html" on Swissnet.
"Oh."
They never got back to me.
I finally installed AOLserver which doesn't have a neat redirect
facility, but I figured that the Tcl API was flexible enough that I
could make the server do what I wanted.
First, I had to tell AOLserver to feed all requests to my Tcl procedure instead of looking around in the file system:
ns_register_proc GET / martigny_redirect
This is a Tcl function call. The function being called is named ns_register_proc. Any function that begins with "ns_" is part of the NaviServer Tcl API (NaviServer was the name of the program before AOL bought NaviSoft in 1995). ns_register_proc takes three arguments: method, URL, and procname. In this case, I'm saying that HTTP GETs for the URL "/" (and below) are to be handled by the Tcl procedure martigny_redirect:
proc martigny_redirect {} {
    append url_on_swissnet "http://www-swiss.ai.mit.edu" [ns_conn url]
    ns_returnredirect $url_on_swissnet
}
This is a Tcl procedure definition, which has the form "proc procedure-name arguments body". martigny_redirect takes no arguments. When martigny_redirect is invoked, it first computes the full URL of the corresponding file on Swissnet. The meat of this computation is a call to the API procedure ns_conn asking for the URL that was part of the request line.
With the full URL computed, martigny_redirect's second body line calls the API procedure ns_returnredirect. This writes back to the connection a set of 302 redirect headers instructing the browser to rerequest the file, this time from "http://www-swiss.ai.mit.edu".
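Stripped of the AOLserver connection machinery, the URL computation is just Tcl's append. With the request path in an ordinary variable standing in for what [ns_conn url] would return, it looks like this:

```tcl
# stand-in for what [ns_conn url] would return inside AOLserver
set request_path "/foo/bar.html"

set url_on_swissnet ""
append url_on_swissnet "http://www-swiss.ai.mit.edu" $request_path
puts $url_on_swissnet
```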
Here's what I learned from this experience:
# tell AOLserver to watch for PDF file requests under the /ejournal directory
# if we don't add additional ns_register_filter commands, all the
# other files will be available to everyone
ns_register_filter preauth GET /ejournal/*.pdf ejournal_check_auth

proc ejournal_check_auth {args why} {
    # all the parameters we might want to change
    set user "open"
    set passwd "sesame"
    # on the real-life server, these are pulled from a relational database
    # but here for an example, let's just set it to MIT and Stanford
    set allowed_ip_ranges [list "18.*" "36.*"]
    foreach pattern $allowed_ip_ranges {
        if { [string match $pattern [ns_conn peeraddr]] } {
            # a paying customer; the file will be sent
            return "filter_ok"
        }
    }
    # not coming from a special IP address, let's check the
    # username and password headers that came with the request
    if { [ns_conn authuser] == $user && [ns_conn authpassword] == $passwd } {
        # they are an authorized user; the file will be sent
        return "filter_ok"
    }
    # not a good IP address, no headers, hammer them with a 401 demand
    ns_set put [ns_conn outputheaders] WWW-Authenticate "Basic realm=\"MIT Press:Restricted\""
    ns_returnfile 401 text/html "[ns_info pageroot]ejournal/please-subscribe.html"
    # stop AOLserver from handling the request by returning a special code
    return "filter_return"
}
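The IP test in that filter is just Tcl's glob-style string match. Pulled out into a small helper procedure (ip_allowed is a name invented here, not part of the AOLserver API), its behavior is easy to check in a plain tclsh:

```tcl
# returns 1 if the address matches any of the glob patterns, else 0
proc ip_allowed {ip patterns} {
    foreach pattern $patterns {
        if { [string match $pattern $ip] } {
            return 1
        }
    }
    return 0
}

set allowed_ip_ranges [list "18.*" "36.*"]
puts [ip_allowed "18.23.0.22" $allowed_ip_ranges]   ;# 1: an MIT address
puts [ip_allowed "128.2.1.1" $allowed_ip_ranges]    ;# 0: neither MIT nor Stanford
```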
"For me grad school is fun just like playing Tetris all night is fun. In the morning you realize that it was sort of enjoyable, but it didn't get you anywhere and it left you very very tired."
-- Michael Booth's comment on my "Women in Computing" page

Computer science graduate students earn a monthly stipend that wouldn't hire a good Web/db programmer for an afternoon. If you've been reading Albert Camus lately ("It is a kind of spiritual snobbery to think one can be happy without money") then you'd expect this to lead to occasional depression. For these depressed souls, I published Career Guide for Engineers and Scientists (http://photo.net/philg/careers.html).
I thought that starving graduate students forgoing six years of income would be cheered to read the National Science Foundation report that "Median real earnings remained essentially flat for all major non-academic science and engineering occupations from 1979-1989. This trend was not mirrored among the overall work force where median income for all employed persons with a bachelor's degree or higher rose 27.5 percent from 1979-1989 (to a median salary of $28,000)."
I even did custom photography for the page (see ). But I didn't think I'd really be able to get under the skin of America's best and brightest young computer scientists until Eve Andersson (the brilliant Caltech Pi Goddess) and I released Aid to Evaluating Your Accomplishments (see ).
Here's the source code:
# a helper procedure to pick N items randomly from a list
# note that it uses tail-recursion, importing a little bit
# of the clean Scheme philosophy into the ugly world of Tcl
proc choose_n_random {choices_list n_to_choose chosen_list} {
    if { $n_to_choose == 0 } {
        return $chosen_list
    } else {
        set chosen_index [randomRange [llength $choices_list]]
        set new_chosen_list [lappend chosen_list [lindex $choices_list $chosen_index]]
        set new_n_to_choose [expr $n_to_choose - 1]
        set new_choices_list [lreplace $choices_list $chosen_index $chosen_index]
        return [choose_n_random $new_choices_list $new_n_to_choose $new_chosen_list]
    }
}

# we encapsulate the printing of an individual person so that
# one day we can easily change the design of the page (we display
# four people at once and putting this in a procedure keeps us from
# having to edit the same code four times).
proc one_person {person} {
    set name [lindex $person 0]
    set title [lindex $person 1]
    set achievement [lindex $person 2]
    return "<h4>$title $name</h4>\n $achievement <br><br>
<center>
(<a href=\"http://altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=&q=[ns_urlencode $name]\">more</a>)
</center>\n"
}

# we return HTTP headers to the client
ReturnHeaders

# we return as much of the page as we can before figuring out which four
# people we're going to display; this way if we were going to query a
# relational database (potentially taking 1/2 second), the user would
# have something on-screen to read
ns_write "<html>
<head>
<title>Aid to Evaluating Your Accomplishments</title>
</head>
<body bgcolor=#ffffff text=#000000>
<h2>Aid to Evaluating Your Accomplishments</h2>
part of <a href=\"/philg/careers.html\">Career Guide for Engineers and Scientists</a>
<hr>
Compare yourself to these four ordinary people who were selected at random:
<br>
<br>
"

# each person is name, title, accomplishment(s)
set einstein [list "A. Einstein" "Patent Office Clerk" \
    "Formulated Theory of Relativity."]
set mill [list "John Stuart Mill" "English Youth" \
    "Was able to read Greek and Latin at age 3."]
set mozart [list "W. A. Mozart" "Viennese Pauper" \
    "Composed his first opera, <i>La finta semplice</i>, at the age of 12."]
set jesus [list "Jesus of Nazareth" "Judean Carpenter" \
    "Told young women he was God and they believed him."]
set stevens [list "Wallace Stevens" "Hartford Connecticut Insurance Executive" \
    "Won Pulitzer Prize for Poetry in 1954; best known for \"Thirteen Ways of Looking at a Blackbird\"."]
# ... there are a bunch more in the real live script

set average_folks [list $einstein $mill $mozart $jesus]

# we call our choose_n_random procedure, note that we give it an empty
# list to kick off the tail-recursion
set four_average_folks [choose_n_random $average_folks 4 [list]]

ns_write "<table cellpadding=20>
<tr>
<td valign=top> [one_person [lindex $four_average_folks 0]] </td>
<td valign=top> [one_person [lindex $four_average_folks 1]] </td>
</tr>
<tr>
<td valign=top> [one_person [lindex $four_average_folks 2]] </td>
<td valign=top> [one_person [lindex $four_average_folks 3]] </td>
</tr>
</table>
"

# note how in the big block of static HTML below, we're forced to
# put backslashes in front of the string quotes. This is annoying
# and we wouldn't have to do it if we'd implemented this using
# AOLserver Dynamic Pages (where the text is HTML by default,
# Tcl code by exception).
ns_write "
<p>
Programmed by <a href=\"http://www.ugcs.caltech.edu/~eveander/\">Eve Astrid Andersson</a> and
<a href=\"/philg/\">Philip Greenspun</a> in
<a href=\"/wtr/servers.html#naviserver\">AOLserver Tcl</a>.
If you're a nerd, you might find
<a href=\"four-random-people.txt\">the source code</a> useful.
<P>
Original Inspiration: <cite>How to Make Yourself Miserable</cite>, by Dan Greenburg
<hr>
<a href=\"/philg/\"><address>philg@mit.edu</address></a>
</body>
</html>
"
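The choose_n_random helper depends on AOLserver's randomRange. To play with it in a plain tclsh, you can substitute a stand-in built on expr's rand(); the randomRange definition below is that substitute, not the AOLserver function:

```tcl
# plain-Tcl stand-in for AOLserver's randomRange: integer in [0, n)
proc randomRange {n} {
    expr {int(rand() * $n)}
}

proc choose_n_random {choices_list n_to_choose chosen_list} {
    if { $n_to_choose == 0 } {
        return $chosen_list
    } else {
        set chosen_index [randomRange [llength $choices_list]]
        set new_chosen_list [lappend chosen_list [lindex $choices_list $chosen_index]]
        set new_n_to_choose [expr {$n_to_choose - 1}]
        # remove the chosen item so it can't be picked twice
        set new_choices_list [lreplace $choices_list $chosen_index $chosen_index]
        return [choose_n_random $new_choices_list $new_n_to_choose $new_chosen_list]
    }
}

set picks [choose_n_random {einstein mill mozart jesus stevens} 4 [list]]
puts $picks
```

Because each recursive call removes the chosen item with lreplace, the four picks are guaranteed to be distinct.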
The forms user interface model fell into the shade after 1984 when the
Macintosh "user drives" pull-down menu system was introduced. However,
HTML forms as classically conceived work exactly like the good old 3270.
Here's an example that is firmly in the 3270 mold, taken from the Lens
chapter of my photography tutorial textbook (http://photo.net/photo/tutorial/lens.html).
The basic idea is to help people figure out what size lens they will
need to buy or rent in order to make a particular image. They fill in a
form with distance to subject and the height of their subject (see ). The server then tells them what focal length
lens they need for a 35mm camera.
Here's the HTML source for the form:
<form method=post action=focal-length.tcl>
How far away is your subject?
<input type=text name=distance_in_feet size=7> (in feet)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_feet size=7> (in feet)
<p>
<input type=submit>
</form>
Here's the AOLserver Tcl program that processes the user input:
set_form_variables
# distance_in_feet, subject_size_in_feet are the args from the form
# they are now set in Tcl local variables thanks to the magic
# utility function call above

# let's do a little IBM mainframe-style error-checking here
if { ![info exists distance_in_feet] || [string compare $distance_in_feet ""] == 0 } {
    ns_return 200 text/plain "Please fill in the \"distance to subject\" field"
    # stop the execution of this script
    return
}
if { ![info exists subject_size_in_feet] || [string compare $subject_size_in_feet ""] == 0 } {
    ns_return 200 text/plain "Please fill in the \"subject size\" field"
    # stop the execution of this script
    return
}

# we presume that subject is to fill a 1.5 inch long-dimension of a
# 35mm negative
# ahhh... the joys of arithmetic in Tcl, a quality language so
# much cleaner than Lisp
set distance_in_inches [expr $distance_in_feet * 12]
set subject_size_in_inches [expr $subject_size_in_feet * 12]
set magnification [expr 1.5 / $subject_size_in_inches]
set lens_focal_length_inches [expr $distance_in_inches / ((1/$magnification) + 1)]
set lens_focal_length_mm [expr round($lens_focal_length_inches * 25.4)]

# now we return a page to the user, one big string into which we let Tcl
# interpolate some variable values
ns_return 200 text/html "<html>
<head>
<title>You need $lens_focal_length_mm mm</title>
</head>
<body bgcolor=#ffffff text=#000000>
<table>
<tr>
<td>
<a href=\"/photo/pcd0952/boston-marathon-46.tcl\"><img HEIGHT=198 WIDTH=132 src=\"/photo/pcd0952/boston-marathon-46.1.jpg\" ALT=\"100th Anniversary Boston Marathon (1996).\"></a>
<td>
<h2>$lens_focal_length_mm millimeters</h2>
will do the job on a Nikon or Canon or similar 35mm camera
<P>
(according to the <a href=\"/photo/tutorial/lens.html\">photo.net lens tutorial</a> calculator)
</tr>
</table>
<hr>
Here are the raw numbers:
<ul>
<li>distance to your subject: $distance_in_feet feet ($distance_in_inches inches)
<li>long dimension of your subject: $subject_size_in_feet feet ($subject_size_in_inches inches)
<li>magnification: $magnification
<li>lens size required: $lens_focal_length_inches inches ($lens_focal_length_mm mm)
</ul>
Assumptions: You are using a standard 35mm frame (24x36mm) whose long
dimension is about 1.5 inches. You are holding the camera in portrait
mode so that your subject is filling the long side of the frame. You
are supposed to measure subject distance from the optical midpoint of
the lens, which for a normal lens is roughly at the physical midpoint.
<P>
Source of formula: <a href=\"/photo/dead-trees/professional-photoguide.html\">Kodak Professional Photoguide</a>
<br>
Source of server-side programming knowledge: Chapter 9 of <a href=\"http://photo.net/wtr/dead-trees/\">How to be a Web Whore Just Like Me</a>
<br>
Time required to write this program: 15 minutes.
<br>
Proof that philg is a nerd: <a href=\"focal-length.txt\">view the source code</a>
<br>
What this is not: a slow Java program that will crash everyone's browser (except those behind corporate firewalls that block all Java applets)
<br>
Another thing this is not: a CGI program that will make my poor old Unix box fork
<br>
Yet another thing this is not: a JavaScript program that you'd think would be the right thing but then on the other hand it wouldn't work with some browsers and the last thing that I need is email from confused users
<h3>Bored? Try again</h3>
<form method=post action=focal-length.tcl>
How far away is your subject?
<input type=text name=distance_in_feet size=7 value=\"$distance_in_feet\"> (in feet)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_feet size=7 value=\"$subject_size_in_feet\"> (in feet)
<p>
<input type=submit>
</form>
<h3>European? Macro-oriented?</h3>
<form method=post action=focal-length-mm.tcl>
How far away is your subject?
<input type=text name=distance_in_mm size=7> (in millimeters)
<p>
How high is the object you want to fill the frame?
<input type=text name=subject_size_in_mm size=7> (in millimeters)
<p>
<input type=submit>
</form>
<hr>
<a href=\"/philg/\"><address>philg@mit.edu</address></a>
</body>
</html>"
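The optics arithmetic at the heart of the script can be checked on its own in a plain tclsh. For an invented test case of a 6-foot subject at 15 feet (the braces around the expr arguments are a later-Tcl idiom; the original omits them):

```tcl
set distance_in_feet 15
set subject_size_in_feet 6

# same formulas as the focal-length.tcl script above
set distance_in_inches [expr {$distance_in_feet * 12}]
set subject_size_in_inches [expr {$subject_size_in_feet * 12}]
set magnification [expr {1.5 / $subject_size_in_inches}]
set lens_focal_length_inches [expr {$distance_in_inches / ((1 / $magnification) + 1)}]
set lens_focal_length_mm [expr {round($lens_focal_length_inches * 25.4)}]
puts $lens_focal_length_mm
```

A 6-foot subject is 72 inches, so the magnification is 1.5/72; a 180-inch subject distance divided by (48 + 1) gives about 3.67 inches of focal length, or 93 mm.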
Yes, dead trees.
If you aren't in a refereed journal or conference, you aren't going to get tenure. You can't expect to achieve quality without peer review. And peer review isn't just a positive feedback mechanism to enshrine mediocrity. It keeps uninteresting papers from distracting serious thinkers at important conferences. For example, there was this guy in a physics lab in Switzerland, Tim Berners-Lee. And he wrote a paper about distributing hypertext documents over the Internet. Something he called "the Web". Fortunately for the integrity of academia, this paper was rejected from conferences where people were discussing truly serious hypertext systems.
Anyway, with foresight like this, it is only natural that academics like to throw stones at successful unworthies in the commercial arena. IBM and their mainframe customers provided fat targets for many years. True, IBM research labs had made many fundamental advances in computer science, but it seemed to take at least 10 years for these advances to filter into products. What kinds of losers would sell and buy software technology that was a decade behind the state of the art?
Then Bill Gates came along with technology that was 30 years behind the state of the art. And even more people were buying it. IBM was a faceless impediment to progress but Bill Gates gave bloated monopoly a name, a face, and a smell. And he didn't have a research lab cranking out innovations. And every non-geek friend who opened a newspaper would ask, "If you are such a computer genius, why aren't you rich like this Gates fellow?"
Naturally I maintained a substantial "Why Bill Gates is Richer
than You" section on my site but it didn't come into its own until
the day my friend Brian showed me that the U.S. Census Bureau had put up
a real-time population clock at http://www.census.gov/cgi-bin/popclock.
There had been stock quote servers on the Web almost since Day 1. How
hard could it be to write a program that would reach out into the Web,
grab the Microsoft stock price and the U.S. population, and then do the
math to come up with what you see
at http://www.webho.com/WealthClock?
This program was easy to write because the AOLserver Tcl API contains
the ns_httpget
procedure. Having my server grab a page from the Census
Bureau is as easy as
ns_httpget "http://www.census.gov/cgi-bin/popclock"
Tcl the language made life easy because of its built-in regular expression matcher. The Census Bureau and the Security APL stock quote folks did not intend for their pages to be machine-parsable. Yet I don't need a long program to pull the numbers that I want out of a page designed for reading by humans.
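The same pattern, fetch a page meant for human eyes and regexp the numbers out of it, is easy to sketch in any language with regular expressions. Here is an illustrative Python version; the HTML fragment is a made-up stand-in for the long-gone Census Bureau popclock page, shaped so that the same regexp the Wealth Clock uses will match it:

```python
import re

# Made-up stand-in for the HTML that the Census Bureau's popclock
# page served in 1997 (the real page is long gone).
page = "<H1> POPClock Projection: 267,509,830</H1>"

# Pull the comma-grouped population out of markup designed for humans,
# using the same pattern as the Tcl Wealth Clock code.
match = re.search(r"<H1>[^0-9]*([0-9]+),([0-9]+),([0-9]+).*</H1>", page)
millions, thousands, units = (int(g) for g in match.groups())

# Note: Python's int() has no octal pitfall; "059" would parse as
# decimal 59, so no leading-zero trimming is needed here.
population = millions * 1_000_000 + thousands * 1_000 + units
```
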
Tcl the language made life hard because of its deficient arithmetic. Some computer languages, Pascal for example, are statically typed: you have to decide when you write the program whether a variable will be a floating-point number, an integer, or a string. Lisp is dynamically typed. You can write a mathematical algorithm with hundreds of variables and never specify their types. If the input is a bunch of integers, the output will be integers and rational numbers (ratios of integers). If the input is a complex double-precision floating-point number, then the output will be complex double precision. The type is determined at run time. I like to call Tcl "whimsically" typed. The type of a variable is never really determined; it can be a number or a string, depending on the context. If you are looking for a pattern, "29" is a string. If you are adding it to another number, "29" is a decimal number. But "029" is treated as an octal literal, and since 9 is not a valid octal digit, trying to add it to another number results in an error.
Anyway, here is the code. Look at the comments.
# this program copyright 1996, 1997 Philip Greenspun (philg@mit.edu)
# redistribution and reuse permitted under
# the standard GNU license

# this function turns "99 1/8" into "99.125"
proc wealth_RawQuoteToDecimal {raw_quote} {
    if { [regexp {(.*) (.*)} $raw_quote match whole fraction] } {
        # there was a space
        if { [regexp {(.*)/(.*)} $fraction match num denom] } {
            # there was a "/"
            set extra [expr double($num) / $denom]
            return [expr $whole + $extra]
        }
        # we couldn't parse the fraction
        return $whole
    } else {
        # we couldn't find a space, assume integer
        return $raw_quote
    }
}
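For readers who don't speak Tcl, here is a sketch of the same fraction-to-decimal conversion in Python (my illustrative port, not code from the book):

```python
def raw_quote_to_decimal(raw_quote):
    """Turn a pre-decimalization stock quote like '99 1/8' into 99.125.
    Mirrors the Tcl wealth_RawQuoteToDecimal procedure above."""
    parts = raw_quote.split(" ", 1)
    if len(parts) == 2:
        whole, fraction = parts
        if "/" in fraction:
            num, denom = fraction.split("/", 1)
            return float(whole) + float(num) / float(denom)
        return float(whole)    # couldn't parse the fraction
    return float(raw_quote)    # no space; assume a plain number
```
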
###
# done defining helpers, here's the meat of the page
###
# grab the stock quote and stuff it into QUOTE_HTML
set quote_html [ns_httpget "http://qs.secapl.com/cgi-bin/qs?ticks=MSFT"]
# regexp into the returned page to get the raw_quote out
regexp {Last Traded at</a></td><td align=right><strong>([^A-z]*)</strong>} \
    $quote_html match raw_quote
# convert whole number + fraction, e.g., "99 1/8" into decimal,
# e.g., "99.125"
set msft_stock_price [wealth_RawQuoteToDecimal $raw_quote]
set population_html [ns_httpget "http://www.census.gov/cgi-bin/popclock"]
# we have to find the population in the HTML and then split it up
# by taking out the commas
regexp {<H1>[^0-9]*([0-9]+),([0-9]+),([0-9]+).*</H1>} \
    $population_html match millions thousands units
# we have to trim the leading zeros because Tcl has such a
# brain damaged model of numbers and thinks "039" is octal
# this is when you kick yourself for not using Common Lisp
set trimmed_millions [string trimleft $millions 0]
set trimmed_thousands [string trimleft $thousands 0]
set trimmed_units [string trimleft $units 0]
# then we add them back together for computation
set population [expr ($trimmed_millions * 1000000) + \
                     ($trimmed_thousands * 1000) + \
                     $trimmed_units]
# and reassemble them in a string for display
set pretty_population "$millions,$thousands,$units"
# Tcl is NOT Lisp and therefore if the stock price and shares are
# both integers, you get silent overflow (because the result is too
# large to represent in a 32 bit integer) and Bill Gates comes out as a
# pauper (< $1 billion). We hammer the problem by converting to double
# precision floating point right here.
#
# (Were we using Common Lisp, the result of multiplying two big 32-bit
# integers would be a "big num", an integer represented with multiple
# words of memory; Common Lisp programs perform arithmetic correctly.
# The time taken to compute a result may change when you move from a
# 32-bit to a 64-bit computer but the result itself won't change.)
set gates_shares_pre_split [expr double(141159990)]
set gates_shares [expr $gates_shares_pre_split * 2]
set gates_wealth [expr $gates_shares * $msft_stock_price]
set gates_wealth_billions \
    [string trim [format "%10.6f" [expr $gates_wealth / 1.0e9]]]
set personal_share [expr $gates_wealth / $population]
set pretty_date [exec /usr/local/bin/date]
# we're done figuring, now let's return a page to the user
ns_return 200 text/html "<html>
<head>
<title>Bill Gates Personal Wealth Clock</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h2>Bill Gates Personal Wealth Clock</h2>
just a small portion of
<a href=\"http://www-swiss.ai.mit.edu/philg/humor/bill-gates.html\">Why Bill Gates is Richer than You
</a>
by
<a href=\"http://www-swiss.ai.mit.edu/philg/\">Philip Greenspun</a>
<hr>
<center>
<br>
<br>
<table>
<tr><th colspan=2 align=center>$pretty_date</th></tr>
<tr><td>Microsoft Stock Price:
<td align=right> \$$msft_stock_price
<tr><td>Bill Gates's Wealth:
<td align=right> \$$gates_wealth_billions billion
<tr><td>U.S. Population:
<td align=right> $pretty_population
<tr><td><font size=+1><b>Your Personal Contribution:</b></font>
<td align=right> <font size=+1><b>\$$personal_share</b></font>
</table>
<p>
<blockquote>
\"If you want to know what God thinks about money, just look at the
people He gives it to.\" <br> -- Old Irish Saying
</blockquote>
</center>
<hr>
<a href=\"http://photo.net/philg/\"><address>philg@mit.edu</address>
</a>
</body>
</html>
"
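The overflow hazard that the comments in the Tcl code warn about simply does not exist in a language with arbitrary-precision integers, which is the Common Lisp "bignum" behavior the comments praise. A sketch of the same arithmetic in Python (whose integers are also arbitrary-precision), reusing the 1997-era share count hard-coded above:

```python
# Sketch of the Wealth Clock arithmetic in Python. Python integers are
# arbitrary-precision (like Common Lisp bignums), so multiplying the
# share count by the price can never silently overflow a 32-bit word.

def gates_wealth(msft_stock_price, population):
    shares_pre_split = 141159990        # figure used in the Tcl program
    shares = shares_pre_split * 2       # post-split share count
    wealth = shares * msft_stock_price  # exact even for huge products
    return wealth, wealth / population  # total, and per-capita share

wealth, personal_share = gates_wealth(99.125, 266000000)
```

Even with an integer price, `282319980 * 99` exceeds what a signed 32-bit word can hold, yet Python returns the exact product where the Tcl of the era silently overflowed.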
So is this the real code that sits behind http://www.webho.com/WealthClock?
Actually, no. You'll find the real source code linked from the above URL.
Why the differences? I was concerned that, if it became popular, the Wealth Clock might impose an unreasonable load on the subsidiary sites. It seemed like bad netiquette for me to write a program that would hammer the Census Bureau and Security APL several times a second for the same data. It also seemed to me that users shouldn't have to wait for the two subsidiary pages to be fetched if they didn't need up-to-the-minute data.
So I wrote a general purpose caching facility that can cache the results of any Tcl function call as a Tcl global variable. This means that the result is stored in the AOLserver's virtual memory space and can be accessed much faster even than a static file. Users who want a real-time answer can demand one with an extra mouse click. The calculation performed for them then updates the cache for casual users.
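The caching facility itself isn't reproduced here, but the idea is easy to sketch. The following Python version is my own illustration (names and the 15-minute lifetime are assumptions, not Greenspun's code): cache each expensive result under a key with a timestamp, recompute only when the entry is stale or the user explicitly demands fresh data, and let a forced recomputation refresh the cache for casual users.

```python
import time

# Hypothetical sketch of result caching, analogous to stashing a Tcl
# function result in an AOLserver global variable.
_cache = {}   # key -> (timestamp, value)

def cached_call(key, thunk, max_age_seconds=900, force=False):
    """Return a cached value for key, calling thunk() only when the
    cached entry is missing, older than max_age_seconds, or the caller
    passes force=True.  A forced call also refreshes the cache."""
    now = time.time()
    entry = _cache.get(key)
    if force or entry is None or now - entry[0] > max_age_seconds:
        value = thunk()
        _cache[key] = (now, value)
        return value
    return entry[1]
```

A page would call something like `cached_call("wealth_clock", compute_wealth_clock)` for casual users and pass `force=True` for the extra-click "real-time" link.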
Does this sound like overengineering? It didn't seem that way when Netscape put the Wealth Clock on their What's New page for two weeks (summer 1996). The URL was getting two hits per second. Per second. And all of those users got an instant response. The extra load on my Web server was not noticeable. Meanwhile, all the other sites on Netscape's list were unusably slow. Popularity had killed them.
Here is the main lesson that I learned from this example: a high-level server API matters. The whole Wealth Clock hangs off a single ns_httpget call.
Consider the WimpyPoint page that offers public
presentations to casual surfers. The idea is that someone will come to
the site, look for the name of the author, then click down to find the
presentation of interest.
Here's the ADP source code:
<% wimpy_header "Choose Author" %>

<h2>Choose an Author</h2>

in <a href="/"><%=[wimpy_system_name]%></a>

<hr>

Here's a list of users who have public presentations:

<ul>

<%
set db [ns_db gethandle]
set selection [ns_db select $db "select distinct u.user_id, u.last_name,
       u.first_names, u.email
from wimpy_users u, wimpy_presentation_ownership wpo, wimpy_presentations wp
where u.user_id = wpo.user_id
and wpo.presentation_id = wp.presentation_id
and wp.public_p = 't'
order by upper(u.last_name), upper(u.first_names)"]

while { [ns_db getrow $db $selection] } {
    set_variables_after_query
    ns_puts "<li><a href=\"user-top.adp?user_id=$user_id\">$last_name, $first_names ($email)</a>\n"
}
%>

</ul>

Or you can do a full-text search through all the slides:

<form method=GET action="search.adp">
Query String: <input type=text name=query_string size=50>
<input type=submit value="Submit">
</form>

<% wimpy_footer %>

Note that I'm allowed to use arbitrary HTML, including string quotes, at the top level of the file. Note further that there are two escapes to the ADP evaluator. The basic escape is <%, which will execute a bunch of Tcl code for effect. If the Tcl code wants to write some bytes to the browser, it has to call ns_puts. The second escape sequence is <%=, which will execute a bunch of Tcl code and then write the result out to the browser. Generally I use the <%= style when I want to do something simple, e.g., include the system name that I grab from the Tcl procedure wimpy_system_name. I use the <% style when I want to execute a sequence of Tcl procedures to query the database, etc.
Anyway, thanks to Microsoft's sloppiness, in just a couple of hours of surfing one night in July 1998, I managed to accumulate a nice collection of ASP examples at http://arsdigita.com/books/panda/aspharvest/. Note that I did my surfing some time after the bug had become common knowledge yet companies such as DIGITAL, Arthur Andersen, and banks had not patched their servers.
I find firewall.asp amusing because it is DIGITAL's advertisement for their network security products. Similarly I like the fact that GAP Instrument Corp. took the trouble to warn users

"You have reached a computer system providing United States government information. Unauthorized access is prohibited by Public Law 99-474 (The Computer Fraud and Abuse Act of 1986) and can result in administrative, disciplinary or criminal proceedings."

(the very first link from http://net.gap.net and all the other pages on their Web sites) yet had left their ASP pages wide open.
CompuServe gives us a nice simple example with Conf.asp. The goal of the
script is to first figure out whether the person browsing is a
CompuServe member or not and then serve one of two entirely separate
HTML pages. An if statement is thus opened inside one <% %> and closed in another:
<!--#INCLUDE VIRTUAL="/Forums/member.inc"-->
<% if member = 1 then %>
<HTML>
<HEAD>
<TITLE>TW Crime Forum</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
... ** a page for members ** ...
</BODY>
</HTML>
<BR><I>We Update the Forum Directory Weekly. The directory was last
updated: Thursday, January 08, 1998</I>
...
</BODY>
</HTML>
<% else %>
<HTML>
<HEAD>
<TITLE>TW Crime Forum</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
... ** a page for non-members ** ...
</BODY>
</HTML>
<%End If%>

An interesting thing to note about this page is that CompuServe hasn't run their HTML through a syntax checker, which would no doubt have complained about the extraneous text between the first </HTML> and the <% else %> in the members' branch.
Let's move on to some db-backed pages.
The folks who built Fulton Bank's site (www.fulton.com) are very enthusiastic about Microsoft:

"The hottest technology to hit the Internet which is actually useable now is Active Server Page scripting. This has given us a number of advantages over the ancient art of CGI. ... Intranets and Extranets where the variety of user machine platforms, processors, etc are an issue ASP can play in nicely."
-- http://coolnew.xspot.com/what_we_use.asp

Let's see how ASP works for them in process_product.asp, a script that takes a query string and tries to find banking products that match this query string.

<% affcode = 1057 %>
<HTML>
<HEAD>
<TITLE>Fulton Bank</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<BLOCKQUOTE>
<TABLE WIDTH=370 ALIGN="middle">
<TR>
<TD>
<BR>
<IMG SRC="images/header_products.gif"><BR>
<BR>
<BR>
<%
Set Conn=Server.CreateObject("ADODB.Connection")
Conn.Open "FultonAffiliates"
SQL = "SELECT * FROM products WHERE productname LIKE '%" & Request.Form("product") & "%' AND affiliate = '" & affcode & "'"
Set RS = Conn.Execute(SQL)
%>
<TABLE>
<% if RS.EOF then %>
<TR><TD>Sorry No Products Found</TD></TR>
<% end if %>
<% DO UNTIL RS.EOF %>
<TR>
<TD VALIGN="top"><IMG SRC="images/diamond3.gif"></TD>
<TD>
<A HREF="<% = RS("url") %>"><FONT COLOR="blue"><% = RS("productname") %></FONT></A><BR>
<% = RS("shortdesc") %><BR>
<BR>
<BR>
</TD>
</TR>
<% RS.MoveNext %>
<% LOOP %>
</TABLE></BLOCKQUOTE>
</TD>
</TR>
</TABLE>
<%
rs.close
conn.close
%>
<!--#include file="footer.asp"-->
</BODY>
</HTML>

This is some pretty clean code. The programmers have encapsulated the database password in their ODBC connection configuration. Also, rather than just bury the magic number "1057" in the code, they set affcode to it as the very first line of the program. Finally, they've parked the page footer in a centralized footer.asp file that gets included by all of their scripts.
What you should have learned from this section is that, if you're going to use Microsoft server tools, you shouldn't take any programming shortcuts, leave the database or Administrator password in the code, or put any naughty words into comments. When the next NT/IIS/ASP bug is discovered and your source code becomes public, you want people to admire your work!
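One concrete way to keep credentials out of the page source, whatever the server platform, is to read them from the server's environment (or from a file outside the document root) at run time, so a source-disclosure bug reveals no secrets. A hypothetical sketch in Python; the variable names are my assumptions:

```python
import os

# Hypothetical sketch: pull database credentials from the server's
# environment instead of hard-coding them in the page, so that a
# source-disclosure bug exposes no password.
def database_credentials():
    password = os.environ.get("DB_PASSWORD")
    if password is None:
        raise RuntimeError("DB_PASSWORD not set; refusing to start")
    return {"user": os.environ.get("DB_USER", "webapp"),
            "password": password}
```
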
If you're thinking that ASP sounds like a better-than-average idea from Microsoft, you won't be surprised to learn that it wasn't their idea. They dipped into some of their desktop monopoly profits to acquire the small company that developed ASP. As I wrote this, I tried to surf over to http://www.microsoft.com/iis/ to see if they credit the programmers who developed ASP, but the Microsoft server farm was taking 45 seconds to deliver each page. So I gave up.
Please take some time to investigate and properly flesh out this section with some information about Java server-side programming. I find the Servlet API an absolute MUST in my work. When combined with a top-notch JVM (a la IBM), on a proper foundation such as Linux with the Apache web server and servlet engine, it has proven to bring me completely out of the dark ages of thick-client GUI programming.

The network-centric world is here and I would venture that one could not find a more network-savvy programming language. There are careful choices to be made upon entering the Java arena, but they are easy, obvious choices and the benefits of making the commitment are simply joyous.
-- Mitch Winkle, November 3, 1999
You can avoid having to escape double-quote marks with backslashes by using single-quote marks instead. This is legal HTML as documented in a page at the W3C titled On SGML and HTML where it says:
By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa.

I have used this technique many times over the past five years or so and I have never seen a failure. Pages done this way also pass the tests at validator.w3.org, so I am pretty comfortable with it.
-- Peter Holt Hoffman, December 23, 1999
One of the most popular server-side languages is PHP, with over one million web servers supporting it as of November 1999. Their home page is at http://www.php.net.
-- Marc Delisle, February 18, 2000
Please, please, please don't recommend The CGI/Perl Cookbook or anything else that might even allege that the code in Matt's Script Archive might even possibly be the right way to do things.

I worked for a medium-sized ISP for a number of years. Part of my job there was providing support for users who were attempting to integrate dynamic content with their personal websites. Most of them were using pre-packaged CGI, much of which was by either Matt Wright or Selena Sol. These scripts were a support nightmare, and in some cases (Matt's search.pl) brought our poor webserver to its knees. Some of the more interesting bits include calling 'grep' from within a Perl script (WTF??), parsing GET/POST data manually (instead of using the CGI library that's been included with the standard Perl distribution for a couple of years now), and doing keyword searches "The Hard Way" (i.e., opening up a file in the web directory, slurping the whole thing into memory, applying regexps to it, and moving on to the next file. This for each and every query!)
Note that we provided our users with a competent, indexing full-text search engine, and even went so far as to write our own when that one wouldn't scale. Still, our users persisted in installing this evil onto the webserver. For a while, every few weeks, we were hunting-and-killing instances of Matt's search.pl installed by our users. It was awful.
For a similar perspective, just check out what happens when anyone mentions the name "Matt Wright" anywhere in comp.lang.perl.*. Eek.
Now, I have to admit that I haven't read The CGI/Perl Cookbook. It's possible that Mr. Wright has learned a bit about what good code looks like, but if that's the case, the contents of Matt's Script Archive don't seem to reflect it. It was really nice of him to try to share his knowledge and code with the newbies of the world, but it's possible that he's inadvertently caused more harm than good.
For a beginning Perl programmer, Learning Perl, from O'Reilly & Assoc. (the Llama Book) is probably still the best thing out there. I haven't read any of the CGI specific Perl books, but there has to be something better.
This is such a wonderful, informative book... Please don't lead your readers astray! :)
-- Ian Baker, March 15, 2000
I've been using JavaServer Pages (JSP) for a while now and think that they definitely merit consideration.
-- George Harley, March 16, 2000
As an addendum to Example 7... a new Microsoft bug takes over where the other two left off. You can view the source of ASP pages on some servers again.

> SECURITY LEAD STORY:
> IS WEBHITS.DLL REVEALING YOUR SOURCE?
>
> Imagine the URL of a typical ASP site:
> http://www.yoursite.com/yourfile.asp
>
> Now try this variation:
> http://www.yoursite.com/null.htw?CiWebHitsFile=/yourfile.asp%20&CiRestriction=none&CiHiliteType=Full
>
> If you see your source code, you have the webhits.dll bug!
>
> Microsoft's Fix is at:
> http://www.microsoft.com/technet/security/bulletin/ms00-006.asp

You can use that information to learn ASP by example, trash Microsoft products, help a friend whose server is on NT, or view funny and/or disparaging comments by sloppy consultants. Good luck.
-- Rob Duarte, April 3, 2000
The string handling capabilities of Perl are much superior to Tcl's. Tcl doesn't even have the concept of a "here document."

It is silly to compare the time it takes to start an application like MS Word on a PC with the time it takes a UNIX system to fork and run a CGI script. The time it takes for a UNIX system to fork and run a CGI script is so small that it is not perceivable by a human being. Every time you type "ls" on the command line on a UNIX system, it forks the shell and execs the "ls" program. UNIX systems are optimized for doing this and for running several different interpreter programs like Perl simultaneously.
I am tired of hearing this argument against CGI programming. UNIX systems start new programs very quickly. CGI programs are not as large and slow to start as PC applications. It's not a good reason not to do CGI programming.
In addition most people who use this argument against CGI also provide CGI interfaces to run their products. Meta-HTML uses CGI to interface to web servers.
Having to open a new database connection on every hit is a good reason not to do CGI programming.
Also, with traditional CGI programming you are embedding HTML in a program, which is bad because then you have programmers doing design (HTML) work. Most programmers have no clue about graphic arts. You have to use some method like JSP, ASP, or templates so that a real graphic artist can do the design work and your programmers can then put the code into the HTML, because that is something a programmer can handle. A graphic artist is not likely to understand how to put HTML into a program. Having a programmer do the graphic design work leads to really bland, visually boring sites like this one.
-- Bill Chatfield, May 3, 2000
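The separation this comment argues for can be as simple as placeholder substitution. A sketch using Python's standard-library string.Template, where the designer owns the markup and the programmer only supplies values (the page content here is invented for illustration):

```python
from string import Template

# The designer edits this markup file; the programmer never touches it.
page = Template("""<html><body>
<h2>Welcome, $first_names $last_name</h2>
<p>You have $message_count new messages.</p>
</body></html>""")

# The programmer computes values and pours them into the template.
html = page.substitute(first_names="Philip", last_name="Greenspun",
                       message_count=42)
```
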
Peter Holt Hoffman wrote:
You can avoid having to escape double-quote marks with backslashes by using single-quote marks instead ...

Not quite true. In some cases you *have* to use double quotes. Consider the following piece of code:
<input type='hidden' name='Last_Name' value='<%=$Last_Name%>'>
The resulting HTML is perfectly valid as long as the Last_Name variable contains a string like "foobar". As soon as you put "D'Andrea" into the Last_Name variable, you get:
<input type='hidden' name='Last_Name' value='D'Andrea'>
This is a problem that can be resolved simply by using double quotes around the value:

<input type='hidden' name='Last_Name' value="<%=$Last_Name%>">

Again, you then have to be careful if the string contains a double-quote character.
-- Nemanja Stanarevic, July 25, 2000
You'd be better off saying this:

<input type='hidden' name='Last_Name' value='<%=[ns_quotehtml $Last_Name]%>'>

Then you don't have to worry about single or double quotes in $Last_Name, and you can use single quotes in your HTML.
-- Rob Mayoff, July 25, 2000
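The general fix for both quote characters is to escape the value before interpolating it into the attribute, which is the role ns_quotehtml plays in AOLserver. In Python, the standard-library html.escape with quote=True handles ', ", and & at once; a sketch with an invented helper:

```python
from html import escape

def hidden_field(name, value):
    """Emit a hidden form field whose value is safe inside a
    double-quoted HTML attribute, whatever characters it holds.
    (Hypothetical helper, analogous to using ns_quotehtml.)"""
    return '<input type="hidden" name="%s" value="%s">' % (
        escape(name, quote=True), escape(value, quote=True))

field = hidden_field("Last_Name", 'D\'Andrea "Jr."')
```
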
In step 4, example 1 you talk about 302 redirects. These are, according to RFC 2616 (HTTP/1.1), only temporary redirections. Wouldn't it have been better to use a 301 redirection, which is "permanent"?
Unless I'm mistaken, search engines will update their links when they encounter a 301 redirect, whereas 302 redirects do not result in such an update. Does anyone know this more accurately?
-- Tomi Junnila, September 19, 2000
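The 301-versus-302 distinction is easy to see in a minimal WSGI application; this sketch (the target URL is a made-up example) answers every request with a permanent redirect:

```python
def redirect_app(environ, start_response):
    """Answer every request with a permanent redirect.  301 ('Moved
    Permanently') invites search engines to index the target URL;
    302 ('Found', i.e. a temporary move) tells them to keep indexing
    the old one.  The target here is a made-up example URL."""
    start_response("301 Moved Permanently",
                   [("Location", "http://www.example.com/new-home/"),
                    ("Content-Type", "text/plain")])
    return [b"Moved permanently.\n"]
```
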
Hi, Phil, I'm here at ArsDigita Bootcamp, and the exercise I am doing right now is based on the code on this page for Bill Gates Wealth Clock. In order for this code to work now, because of changes on your server and on the population server you reference, the regexp on line 39 should call for <H2>s instead of <H1>s, and the pretty_date code should come out. Just so each person doesn't have to debug it when they do the exercise. Thanks for everything!
-- Sunah Cherwin, January 16, 2001
" ... my unix box doesn't like to fork 500,000 times a day ..."

That's a fork every .172 seconds. A 166MHz PC forks slightly under 1,000 times/second, or every .001 seconds.
-- Evan Schaffer, March 6, 2001
Well in short: THERE ARE SOME UNTRUE STATEMENTS ABOUT ASP.
-- Aurelian POPA, May 13, 2001
Doesn't Meta-HTML resemble Lisp?
-- Andrei Popov, June 15, 2001
Yes, it should be a 301 redirect ("moved permanently"), not a 302 ("moved temporarily"). Search engines do follow this convention.

On my site, I set up a 302 redirect to somebody else's site from a made-up URL that it never occupied, just to illustrate how redirects work. I also listed this made-up URL as a hyperlink on one of my pages. After a while, a search for the title of that site on Google would return my URL as the first hit, accompanied by the title and content from the site that my URL was redirecting to. There was no separate search result for the site's own URL.
I changed the redirect code from 302 to 301. Google apparently re-visited my URL several weeks later, and now a search for the title of that site turns up the URL of that site, as it should. My redirected URL is no longer visible anywhere in the search results.
I'm not sure if you can "hijack" the top listing from somebody this way. It's likely that the site I wrote about was not indexed by Google before (it had a brand new domain name) and was found first through my redirect.
-- Vadim Makarov, August 10, 2001
This page is slanted toward publishing sites, and it's really showing its age to boot.
Back in early 1997 my colleagues and I were looking for something to replace Perl CGIs for publishing/community sites (kinda like photo.net), doing some of the same analysis that Phil has done here. Perl is a fun language and quite capable at system scripting and complex text processing, and I still use it today for various things. However, it's really hard to maintain a medium (>5K lines of Perl code) or large code base written in Perl, and CGI is dreadfully slow. The clever Apache module "mod_perl" that keeps Perl in-process (avoiding forking) either wasn't available or didn't work, FastCGI broke too often, and so we looked at other languages. ASP was very easy to learn but it was Microsoft-only (at the time) and VBScript is a really weak language even for scripting languages. We used Visual J++ to create COM objects written in Java which did the heavy lifting and used JScript in ASP to do the page stuff. That worked OK except for the usual NT/IIS stability problems. We used that for a few sites.
We also evaluated Netscape Livewire, which is compiled JavaScript in HTML pages, with some built-in objects written in C such as database connectors with connection pooling, local filesystem utility objects, an SMTP mail connector, etc. It was horrible - it was fast, but extremely buggy, both at a page programming level (stuff crashed or didn't work) and at a server level (configuration changes didn't stick or didn't work, the server would just hang on restart sometimes).
We didn't try AOLServer because we decided TCL was too weak of a scripting language and had doubts about the product's future. Perhaps we should have tried it but we didn't need to because...
We tried Java Servlets very early on and found that they worked, although the runtime "servlet containers" were extremely immature, and database drivers were hard to find. Compilation wasn't a big deal in those days but it was definitely slower than edit-save-reload. On the other hand performance, and more importantly scalability, was fantastic compared to CGI, since servlet containers are multithreaded and since we wrote a simple database connection pool early on. We were also starting to build sites that had light e-commerce functionality so we needed a language that would allow us to write fairly complicated code without it getting out of hand.
Java really paid off in this respect, in a way that easy scripting languages won't. Even if a scripting language has the ability to let you talk to components/objects written in a "real programming language", that can be a major pain in the rear for the "real programmer" if the scripting environment doesn't handle the data type mapping between the two languages. With Java this was not an issue but we did still have to deal with the HTML-in-source-code issue. I dealt with this the same way I did with Perl: by implementing a trivial template system which used fake tag substitution. This worked OK but restarting the servlet container to show code changes was still a problem (templates reloaded automatically). After far too long, and after too many proprietary competing technologies had gotten a foothold, Sun released the JavaServer Pages specification, based on ASP. In the JSP architecture there was still a compilation step but you didn't see it because the JSP/servlet container did it for you the first time you reloaded the page after changing things. You still have to restart the servlet container to see changes but at least the templates are standardized and it's somebody else's problem to code and debug the template system.
Maybe this all could have been done in AOLServer but we never went down that road because Java worked well for us. Unlike all the other stuff we had tried, Java worked the way we expected it to, and didn't bog down or crash when we wrote a load-testing tool and aimed it at our web sites. So we stopped looking.
I've been working on a small, silly web site project that is mainly an application (as opposed to a document) but isn't terribly complex, and which talks to a database, and I've been using PHP to prototype it. The site has a WAP interface (it makes sense, I promise) and figuring out WML using a scripting language has made it a much less painful experience as I've struggled with getting the WML code just right so that the minibrowser won't barf. I'd hate to think what it would have been like with a Real Programming Language, although JSP wouldn't have been too bad since there's no (visible) compile cycle involved. It took me a few hours to figure out the right command-line incantation to get Apache and PHP to compile correctly and to link to the native Oracle client library, but that's part of the joy of using open-source C programs: you have to read the documentation and fiddle a bit until you get it right. Still, there are a lot of useful text and HTML functions; the database access is pretty snappy, and this may be the appropriate heir to the niche that AOLServer flourishes in. I'll probably do 50% of the site in PHP and then write the tough parts in Java, then decide whether I should replace the PHP stuff with Java or just leave it as a hybrid.
One other thing about Java, which applies to some other languages but not to most scripting languages: its error handling (via a language feature called Exceptions) is fantastic. This is one place where most scripting languages fall down, although I should note that some of them have grafted it on as an afterthought. C doesn't even have exceptions, although C++ does. Exceptions are basically a way to signal an error condition by stopping execution of the current block of code and exiting with a value that isn't necessarily of the same datatype as the expected return value, but is an object that may contain info about the error. Java has had exceptions from the beginning, and the Java class library uses exceptions all over the place, so it's part of the zen of Java that you use exceptions. Perl has exceptions but I've seen a lot of Perl code (from CPAN, from various commercial software vendors, etc.) and I've never seen them used. If you lack exceptions then you have to write ugly functions that are actually procedures which may return an error code, with "return arguments" (some of the things you supply as parameters are actually placeholders for return values). That means you end up with calling code that looks like:
err = do_stuff(a, &b);             // a is input, b is output
if (NO_ERROR == err) {             // did do_stuff work?
    err = do_more_stuff(b, &c);    // b is input, c is output
    if (NO_ERROR == err) {
        printf("yay, c is %s", c);
    } else {
        printf("do_more_stuff failed with an error code of %d", err);
    }
} else {
    printf("do_stuff failed with an error code of %d", err);
}

which is a royal pain, so lazy programmers tend to just skip thorough error checking in code and let the QA people find the error conditions. With exceptions the above code can look like:

try {
    do_more_stuff(do_stuff(a));
} catch (int err) {
    printf("there was an error: %d", err);
}

which is a lot cleaner IMHO. Apply this to a mission-critical app that handles money, order data, etc. (in which all errors must be caught and handled appropriately) that is tens or hundreds of thousands of lines long, and you can see why exceptions are important.

As for Phil's assertions about application servers, I disagree. Application servers have some very advanced functionality that makes sense for very complex back-end applications. A publishing/community site like photo.net doesn't need that stuff; nor does slashdot.org or f**kedcompany.com. That's why these sites just use a scripting language and an RDBMS. Compare that to E*Trade or Orbitz, which are basically big complicated back-end systems with a web UI. In these cases the business logic is very complex, and transactions may need to take place across a half dozen systems to process a request. That's why these systems use Java and an application server. However, chances are, almost nobody reading this is building something that big, so chances are you don't need an application server. A JSP/Servlet container is probably fine, and there are several excellent free open-source ones (Resin and Tomcat come to mind). Database connection pooling is a must, and either your database driver should include it (if it's modern enough) or you can steal one or code it up yourself in a day.
I also recommend, if you're dealing with a lot of forms and complicated DB tables, that you look into TopLink. If you're building a complex app, chances are you're not just shuffling strings from forms into SQL statements and then back from query results into HTML. You probably have objects that represent real-world entities in your application (user, customer, product, payment, rating, comment, shipment, etc.), and those don't map exactly to your data model, because objects and tables are inherently different representations of the entity. TopLink is a very slick tool/library combo that lets you declaratively define these mappings in a GUI. It also caches objects (reducing DB access), lets you write queries over objects and their properties rather than writing SQL SELECT statements, and supports object transactions that can span multiple databases. It extends its in-memory object transactions and locking down into the underlying DB's locking mechanism, meaning that not everything touching your database has to use TopLink and things will still work the way you expect. We were looking for a simple object-relational mapper and were blown away by how sophisticated it is. It is commercial; I think it was $5K per developer seat, but there was no charge for the runtime library. It was acquired by WebGain, and their online store doesn't list a price, so I don't know how much it costs now.
One last note about application servers: from what I've read, nobody likes EJB entity beans, but this doesn't mean J2EE is crap, or that EJB session beans are crap. It just means that EJB entity beans are probably a really dumb way of loading and storing your persistent objects in an RDBMS. Either roll your own (objects that have a "save me"/"load me" method, or a "factory" class that knows how to run queries by object property and return a collection of matching objects, encapsulating the SQL inside itself), or check out TopLink.
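The "factory that encapsulates the SQL" idea above can be sketched in a few lines. Everything here is hypothetical for illustration (a `UserFactory` for an invented `users` table with `userid` and `email` columns); only the query construction is shown, since executing it is ordinary JDBC:

```java
import java.util.Map;

// Hand-rolled persistence factory of the kind described above: it owns
// the SQL for one entity (a hypothetical User mapped to a "users" table)
// so callers deal in objects and properties, never in SELECT strings.
// In real code the returned SQL would feed a JDBC PreparedStatement,
// with the property values bound as its parameters.
class UserFactory {
    private final String table = "users";

    // Build a parameterized query from property-name/value pairs,
    // e.g. {"email": "x@y.com"} ->
    //   SELECT userid, email FROM users WHERE email = ?
    String buildQuery(Map<String, Object> properties) {
        StringBuilder sql = new StringBuilder("SELECT userid, email FROM " + table);
        String sep = " WHERE ";
        for (String prop : properties.keySet()) {
            sql.append(sep).append(prop).append(" = ?");
            sep = " AND ";
        }
        return sql.toString();
    }
}
```

Using placeholders (`?`) rather than splicing values into the string keeps the factory safe from SQL injection, and keeping the column list in one place means a schema change touches one class instead of every page.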
By the way, this is such a long comment that my session timed out, and the first time I tried to submit it, I got a response page containing some SQL and an Oracle error describing a parent key violation because my userid was zero. Oops. This is why catching errors is important... even on community sites. :)
-- Jamie Flournoy, August 21, 2001