WebApp Reference Manual

NAME

WebApp::Documentation - WebApp Reference Manual

VERSION

  Time-stamp: <2006-09-06 18:46:25 mailto:attila@stalphonsos.com>
  $Id: Documentation.pm,v 1.17 2006/07/13 19:13:40 attila Exp $

SYNOPSIS

  $ perldoc WebApp::Documentation

DESCRIPTION

Reference Manual. In a state of flux.

INTRODUCTION

This is the WebApp reference manual. It is meant to serve as the definitive reference guide for developers using WebApp. Although it does contain some background material, more thorough treatments of the rationale behind WebApp and other subjects are available in other documents, to which the interested reader should refer:

Major Features

In addition to separating HTML and code differently than some other web application frameworks, WebApp also does a few other things differently than you may have seen before.

Installation and Maintenance

WebApp is designed to be installed many times on a single system, e.g. a Unix web server running Apache. There is a base install from which each individual web application's copy of WebApp is derived, but each one can evolve on its own, and even keep its own (slightly modified) variant of the base WebApp framework code around, without confusing anything. This might not sound like a good thing, until you consider the problem of keeping a heavily used, stable web application up and running, even when a new version of WebApp comes out; you can take a gradual approach to upgrading, if you have particularly convoluted or troublesome apps in your system, without keeping everything else from moving forward. Hacks needed to patch up a problem in one app do not affect any other apps on the same server.

CGI and SpeedyCGI

WebApp explicitly supports SpeedyCGI for high-performance, lower latency requirements. Many Perl-based Web environments are designed to work with mod_perl, but this is an approach I disagree with. It's a bad idea to stuff anything as complex and hairy as a Perl interpreter into your web server. Apache is a beautifully designed HTTP-based I/O multiplexer. It does this job very well. Something like mod_perl can only have a negative impact on stability and resource usage in the long run.

The SpeedyCGI approach is different. It works by creating a separate pool of processes which run persistent Perl interpreters of various kinds (e.g. with different things loaded into them). Communication between those Perl interpreters and the web server can be accomplished by an optional Apache module, which is considerably smaller and simpler than mod_perl: it just finds a suitable persistent Perl back-end process and arranges to redirect I/O to it for this web transaction. This has the same beneficial effects that mod_perl does, e.g. amortizing start-up overhead over many web transactions, reusing database connections, etc. However, by separating the Perl interpreter from the Web server, we make the overall system easier to maintain and tune. Apache does what it is good at, and Perl does what it is good at.

You can also use SpeedyCGI without the Apache module, if it isn't convenient to install or configure it. You still get most of the benefits, but it's simpler administratively. Check out http://daemoninc.com/SpeedyCGI/ for more information on SpeedyCGI itself.

WebApp is designed to work with SpeedyCGI, or in a plain old CGI environment. It will not currently work with mod_perl, for several reasons that are relatively difficult to fix.

Database-Aware

WebApp is database-aware, and assumes that your application needs to work with a relational database, such as PostgreSQL or MySQL (although it is possible to use WebApp without one). It is designed to populate your database with certain basic tables for keeping track of e.g. users, sessions, etc., and your code and database schema extend and live on top of these basic structures. The built-in schema are flexible enough for you to build whatever you need on top of them, e.g. per-user preference schemes, register-by-email automated sign-up module, etc.

Built-in Objects and Functions

The built-in schema are also implemented as Perl objects, which are always available to a specific webapp's code. In general, no code need be written to deal with sessions, cookies, users, passwords, user registration, etc. The goal is for the framework to provide built-in implementations of commonly needed objects and functions, not just the raw primitives needed to build them up. This might mean that it's harder to implement a completely different way of doing this or that, but functionality and ease of use always trump generality in WebApp.

Sanitization

WebApp takes a rather strong view on input sanitization, a key problem area in webapps (see the GLOSSARY entry on Cross-site Scripting for more information). The framework is very careful to provide application code with sanitized inputs from the user. The underlying primitives used to do the sanitization are exposed conveniently, in any case, and fairly natural conventions are used by default to sanitize output. Of course, nothing really understands the semantics of user inputs quite like the application logic in your code, so WebApp's framework is no excuse for not validating user inputs above and beyond what WebApp does mechanically; always ask yourself the musical question: Does this make sense?

WebApp also does output sanitization by default, and the way that it does this can be controlled via the standard WebApp configuration file for your particular webapp. You can override both input and output sanitization if the need arises.

TMTOWTDI Considered Mildly Dangerous

The approach I have taken to designing and implementing WebApp has been to stay as practical as I possibly could. The whole perl TMTOWTDI philosophy, although appealing at times, can also be a serious liability. In WebApp, as long as there is a reasonable way to accomplish a task, I do not go out of my way to provide a second, or a third. The notation used in the HTML templating language is just barely enough to be implemented with regular expressions; it is not a general-purpose programming language, and forces you to do things in a certain way. I do not consider this a flaw, I consider it a feature. I find it easier to explain how to use something if there aren't fifteen options for expressing a relatively simple idea, and I intend that WebApp be simple to explain.

STRUCTURE OF A WEBAPP

In WebApp, an application has two parts: web content and code. The web content can be HTML, JavaScript, images, and so on. Standard stuff. All of this can be split up and stored in as many files as the designer likes. WebApp can stitch it all together in arbitrary ways, so you can split off a commonly-used HTML component into a separate file if you like, or not. Some web design environments do this sort of thing behind the designer's back, anyway. My web design environment, of course, is Emacs, so I tend to have a skewed view of the world; if WebApp does something that doesn't jibe well with the web design environment you use, I'd love to hear about it.

For the rest of this document, I'll use the term "HTML" to mean any and all of the non-code portions of a webapp. I'll use "webapp" as a shorthand for "web application," as opposed to the StudlyCapped "WebApp," which is the proper name of the framework that this document describes. I tend to be fairly Apache-centric in my descriptions, but any web server under which at least the standard CPAN CGI perl module works will be sufficient for WebApp's purposes. I am most definitely Unix-centric, but, again, I think Apache trumps Unix, and almost everything I have to say here applies to e.g. Apache under Cygwin under Windows. YMMV. IANAL.

The natural way to get a webapp working with WebApp is to install the HTML some place where the web server can get at it, and the code somewhere that it can't, at least not directly. This is so that a mistake or misconfiguration on the part of the web server administrator doesn't result in your source code being served up to random web browsers. We call the place where the HTML goes the "html directory" or htmldir, and the place where the code goes the "code directory" or codedir. Each webapp can also have its own configuration file, which can be used to control quite a bit of how WebApp works, as well as for any arbitrary global configuration stuff that your particular webapp needs.

Along with your own application code, the WebApp framework is required to get all this stuff to work. Now, the normal perly way to do this is to have your framework or what have you get installed into the system's standard @INC path, as a perl module or set of modules, or at least some sort of reusable package. We do this with WebApp, but only halfway. The perl modules that are installed in @INC under WebApp are (for the most part) not meant to be used directly. Instead, they should be installed into the codedir of each individual webapp. The webapp driver CGI program ensures that it is this copy of the WebApp framework, and not the one installed in the standard system @INC dirs, that is used by the individual webapp. This has several important benefits, which we'll get to in due time.

All of this magic is accomplished by means of some special tokens in the WebApp .pm files, and a standalone program called "webapp", which is meant to be invoked on the command line on the machine where the Apache server lives (or at least on a machine that can write into some filesystem that Apache can see). The standard way to manage this process is via a Unix makefile in the main source code directory for the webapp (NOT the codedir where files are installed for production).

The template application that comes with WebApp contains an extensive build system that works with both major variants of the Unix make command (BSD and GNU, respectively). It is based on ExtUtils::MakeMaker, and is a really good place to start. See the webapp POD for an example of how to get a clone of the template application set up for your hacking pleasure.

So, suppose we want to install a webapp under our home directory on the web server (let's just assume this is possible, although I sincerely hope it is not, unless you are the only user of the server machine). The Apache administrator has set it up so that ~/public_html is our window to the WWW: assuming our user id on the machine is "fred", and the name of the web server in DNS is "bedrock.com", then someone who points their browser at http://bedrock.com/~fred/foo.cgi will be accessing the file foo.cgi in ~fred/public_html on that machine.

Apache Setup

The Apache administrator must also have done a couple of other things, which are fairly standard. In fact, this has probably all been done in your Apache setup already.

Enable CGI Execution in Relevant Dirs

Something like the following must be in some Apache config file, assuming we're using the more-or-less standard ~/public_html convention.

  <Directory /home/*/public_html>
    AllowOverride FileInfo AuthConfig Limit
    Options MultiViews Indexes SymLinksIfOwnerMatch ExecCGI
    ...other Directory settings...
  </Directory>

This will make CGI scripts executable in all ~/public_html HTML document trees.

Enable file.cgi -> CGI handler mapping

Apache has to be told that files which end in .cgi are CGI scripts.

  AddHandler cgi-script .cgi

Add index.cgi to the List of Directory Access Indices

Apache has a list of files it will look for if asked to display a directory.

  <IfModule mod_dir.c>
    DirectoryIndex index.cgi index.html
  </IfModule>

Example 1: Rewrite Me

Insert example here.

...

So, we have some templatized HTML, some data, and some code. These are all the essential ingredients for a webapp. The above assumes that we are going to call the webapp poem.cgi, because the main entry-point is a sub called poem. This is an example of the simplistic way that WebApp works: no fancy registry, relatively little abstraction, not more than one way to do it. YMMV. YDD.

Now, I'll lay out all the things that must happen for http://bedrock.com/~fred/poem.cgi to become a living, breathing source of inspiration for the whole planet, but many of these steps can be automated. Nonetheless, it is instructive to know how it all works. Once we've got the basic thing working, we'll improve it in various and increasingly complex ways.

First, the default template HTML for any webapp is always called _component.html, and lives in the htmldir of the installed webapp. In general, the default .htaccess file that is put in the htmldir denies direct access to files that start with an underscore, or end with a .pl extension, so it's common to name HTML template files with a leading underscore.

Now, the way that _component.html, or any other HTML template in a webapp, is ever processed in the first place is via a CGI program that calls WebApp::Component::Main(). WebApp comes with a standard CGI program (which can also be run under SpeedyCGI), called component.cgi. In fact, under almost all circumstances, it's the only real CGI program that you ever need to have with WebApp. This has led to the standard WebApp trick of installing the standard component.cgi program under the name _component.cgi in a webapp's htmldir, and then symlinking _component.cgi to whatever names one wants, e.g. index.cgi.

In fact, this is more than a trick, it's the way things usually work. In our bad poetry example, the URL is supposed to be http://bedrock.com/~fred/poem.cgi. This means that we will symlink ~/public_html/_component.cgi to ~/public_html/poem.cgi. It also means that, when WebApp::Component::Main() is called, it will see that it is being invoked under the name poem.cgi, and, thus, look for a file called poem.pl in the codedir of the webapp, and, if it exists, load it and try to execute a sub named "poem" with a standard set of arguments.

The URL name, file name and sub name all must line up. It's another one of WebApp's simplifying (or simplistic, if you like) assumptions. All of the code in .pl files in the codedir is implicitly imported into the main package; if you want something else to happen, you must do it yourself, but WebApp assumes that any subs it is supposed to invoke (like poem, above) live in main, so don't try to fool it with all your fancy city talk. Yabba Dabba Doo.

Obviously, not all webapps are so simple that they can live with one little chunk of templatized HTML. Besides, the _component.html thing is just supposed to be a default, to make it easy to have a catch-all when you are doing development. Thus, our rule about the name of the sub and the URL and the perl code file all lining up also extends to the HTML template file. In our example, we could either name the template _component.html, or we could call it _poem.html. The leading underscore, remember, is there to protect the "raw" (un-"s///"'ed) _poem.html from ever being served up to the user, which is the web equivalent of dropping your drawers in front of strangers (again, if you go for that sort of thing, I apologize for being such an insensitive brute, but would it kill you to shave down there, once in a while? Would it? Your legs, I mean. Freak. YDD.). WebApp knows about that, and tries to find a file called _poem.html in the htmldir when it is executing poem.cgi. If it finds it, then this file is used instead of _component.html.
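The following is a minimal Perl sketch of the static part of this lookup. It makes the line-up rule concrete, but it is an illustration under stated assumptions, not WebApp's actual code. The name resolve_component is hypothetical. Its arguments (the name the script was invoked under, the codedir and the htmldir) are also assumptions. The $vars overrides described below are left out.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper, not part of WebApp's API: map the name the CGI was
# invoked under to the code file, sub name and template that must line up.
sub resolve_component {
    my ($script_name, $codedir, $htmldir) = @_;

    # ".../public_html/poem.cgi" -> "poem" (the .cgi suffix is dropped)
    my ($component) = fileparse($script_name, qr/\.cgi/);

    my %found = (
        component => $component,
        sub_name  => $component,    # WebApp expects this sub in package main
        code_file => File::Spec->catfile($codedir, "$component.pl"),
    );

    # Prefer the component-specific template; fall back to _component.html.
    my $specific = File::Spec->catfile($htmldir, "_$component.html");
    $found{template} = -e $specific
        ? $specific
        : File::Spec->catfile($htmldir, '_component.html');

    return \%found;
}

# Usage, with the paths from the running example:
my $r = resolve_component('/home/fred/public_html/poem.cgi',
                          '/home/fred/code/clang', '/home/fred/public_html');
print "$r->{code_file} -> sub $r->{sub_name}, template $r->{template}\n";
```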

If a static, one-to-one mapping from CGI program to HTML template filename is not good enough for you, you can tell WebApp what to do in your code. The HTML generation is not begun until all the code has had a chance to run against the latest inputs, and, thus, to set things in $vars. Any key with a leading underscore in $vars is ignored for the purposes of "s///" processing, and WebApp frequently uses such names internally. Three such reserved, special names are _TEMPLATE, _TEMPLATE_FILENAME, and _TEMPLATE_HANDLE.

If the _TEMPLATE key is defined in $vars when the "s///" engine is kicked, it is used instead of the name of the CGI program, so you could programmatically force a set of CGIs to use a common HTML template for "s///" purposes. Furthermore, if the _TEMPLATE_FILENAME key is set, then it is used as the template filename, to the exclusion of anything else. This allows a webapp to specify template files in odd places, or based on more complex criteria (e.g. perhaps it authenticates to some networked filesystem on behalf of its user).

If neither of those is any good, then perhaps _TEMPLATE_HANDLE will do the trick: if set, it is expected to be an object that implements the IO::Handle interface, and is used instead of any other means to provide the boilerplate HTML for the "s///" engine. One handy trick is to use an IO::String object to hold some programmatically crafted chunk of boilerplate. This gives you ultimate control over what the "s///" engine is given to work with.
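The precedence among the three reserved keys can be sketched as follows. The helper template_source is hypothetical. The assumption that _TEMPLATE names a template the same way a CGI name would is ours; under it, setting _TEMPLATE to common looks for _common.html. The usage part builds a template handle in component code. It uses Perl's built-in in-memory filehandle rather than the IO::String module mentioned above, only so the sketch needs nothing outside core Perl.

```perl
use strict;
use warnings;
use IO::Handle;
use File::Spec;

# Hypothetical helper sketching the precedence described above:
# _TEMPLATE_HANDLE beats _TEMPLATE_FILENAME, which beats _TEMPLATE,
# which in turn replaces the CGI program's own name.
sub template_source {
    my ($vars, $component, $htmldir) = @_;
    return (handle => $vars->{_TEMPLATE_HANDLE})
        if defined $vars->{_TEMPLATE_HANDLE};
    return (file => $vars->{_TEMPLATE_FILENAME})
        if defined $vars->{_TEMPLATE_FILENAME};

    my $name = defined $vars->{_TEMPLATE} ? $vars->{_TEMPLATE} : $component;
    my $specific = File::Spec->catfile($htmldir, "_$name.html");
    return (file => -e $specific
                    ? $specific
                    : File::Spec->catfile($htmldir, '_component.html'));
}

# In component code: build the boilerplate programmatically. A core Perl
# in-memory filehandle stands in for the IO::String object suggested above.
my $boilerplate = "<p>%(greeting)%</p>\n";
open my $tmpl_fh, '<', \$boilerplate
    or die "cannot open in-memory template: $!";
my $vars = { greeting => 'Yabba Dabba Doo', _TEMPLATE_HANDLE => $tmpl_fh };
```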

Finally, for the pathologically perverse, there is one more way to provide your chunk of boilerplate, although it is not recommended, and may be removed in future versions: if, instead of using the standard component.cgi and symlinking, you write your own CGI programs that invoke WebApp::Component::Main() themselves, you may stuff the boilerplate into the CGI program itself, after an __END__ token, as per the normal Perlitious custom. This only works if there is no HTML file for the specific component, e.g. no _poem.html. It's kind of gross, anyway. YDD.

The trade-off here is whether or not you want more than one webapp in ~/public_html, since that's where we're putting this in our current example. For this case, it makes more sense to call the HTML template file _poem.html, since this is a one-off webapp that might coexist in the same directory with many other little webapplets... Nevertheless, we need a name for this particular little webapp, so let's call it "clang". Normally, there will be a symmetry between the name of the htmldir and the name of the codedir, but in this case, we're going to break that: the htmldir will be ~/public_html, but the codedir will be ~/code/clang.

So, we've got the following files:

  ~/public_html/_poem.html
  ~/public_html/_component.cgi
  ~/public_html/poem.cgi (a symlink to _component.cgi)
  ~/code/clang/WebApp/... (installation of WebApp::*)
  ~/code/clang/poem.pl (our sample code, above)

At this point, going to http://bedrock.com/~fred/poem.cgi will cause _component.cgi to be invoked, with the name poem.cgi, which it will use to find poem.pl, load it, and invoke the sub named poem with some standard arguments. This sub should normally return undef; any other value is assumed to be an error of some kind, and is pushed into a key called ERRORS in $vars (an arrayref, normally, see "PREDEFINED VARIABLES", below). It is expected, assuming the code didn't bomb out, that the main side-effect of this code is to add or change things in the $vars hashref, which our sample code does. Once the code has had a chance to run, the "s/// engine" has a whack at _poem.html, and produces the final result for the browser.
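The invoke-and-collect-errors step might look something like the sketch below. It assumes poem.pl has already been loaded into package main. This section does not spell out the "standard arguments", so the sketch also assumes the component sub receives $vars as its first argument. run_component, push_error and the body of poem are hypothetical stand-ins. They are not the missing original example.

```perl
use strict;
use warnings;

# Sketch of the dispatch step, assuming the component's .pl file is already
# loaded into package main and that the sub takes $vars first.
sub run_component {
    my ($sub_name, $vars, @args) = @_;
    my $code = 'main'->can($sub_name)
        or return push_error($vars, "no sub named '$sub_name' in package main");

    my $result = eval { $code->($vars, @args) };
    if ($@) {
        push_error($vars, "component '$sub_name' died: $@");
    }
    elsif (defined $result) {
        push_error($vars, $result);    # any defined return value is an error
    }
    return $vars;
}

# ERRORS is kept as an arrayref of messages.
sub push_error {
    my ($vars, $msg) = @_;
    push @{ $vars->{ERRORS} ||= [] }, $msg;
    return $vars;
}

# A stand-in for a poem.pl component (hypothetical content).
sub poem {
    my ($vars) = @_;
    $vars->{title} = 'Ode to Bedrock';
    return undef;    # undef means "no error"
}
```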

Yabba Dabba Doo.

If you use the webapp Install command, all of the steps related to getting your webapp's files put in the right places will be done automatically, including updating the installation of the WebApp code modules themselves if desired.

If you use the webapp Create command to set up your webapp's development area, a template Makefile will be copied into it and "s///"'ed as appropriate for your nascent webapp. You can use this as a starting point, adding the names of new perl and HTML files, images, etc. It defines several standard targets, which perform some or all of the steps above, as needed; one of these steps is invoking webapp Install, so we eat our own dog-food on that score. Try typing webapp Create at the shell and following the prompts... be sure to leave room for extra fun! Yabba Dabba Doo.

Example (II)

Alright, this is pretty stupid stuff. We can "s///" some damn variables into a template. There are so many ways of improving matters, even before we consider suicide for having foisted more bad poetry on the world.

All of the stuff you need to make things better tends to fall into two categories: stuff directed towards the user (i.e. the person sitting behind the web browser), and stuff directed towards our context (i.e. the server on which this is all running). In the former category, there are things like cookie management, authentication, preferences, etc. In the latter category, there are things like sessions, user identities, global configuration settings, database configurations, etc.

Yabba Dabba Doo, I stopped here. Have the example store its data in a database. Then, refine it to use Sessions. Then, probably split all the examples out into another POD, because this one is getting hefty (25 printed pages already, Yabba Dabba Doo!).

THE LIFE OF A PAGE LOAD

Yabba dabba doo.

WebApp::Component::Main() gets called from some CGI.

The configuration for the webapp is loaded, if any.

A handle to the database for the webapp is opened, if needed.

Inputs are sanitized via WebApp::Utils::defang().

The _init component is invoked, if it exists. _init is responsible for authentication, if the webapp wants or needs it (not all do). The default _init pays attention to the webapp's config (e.g. [Auth]enabled, [SQL]no_database, etc.)

The component name is resolved and files located.

The actual component's code is invoked.

The _cleanup component is invoked, if it exists.

The "s/// engine" is called with the resulting $vars hashref, an open IO::Handle to the template, and some other environmental stuff.

The HTTP headers and resulting "s///" spew are sent to the browser.

The handle to the database is torn down, if needed.
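The sequence above can be summarized as a skeleton. Every step name here is hypothetical, and each step is just an optional coderef acting on a shared context hash. One choice is an assumption of this sketch, not documented WebApp behavior: the database handle is torn down even when an earlier step dies.

```perl
use strict;
use warnings;

# Skeleton of the page-load sequence above. The step names are hypothetical;
# each step is an optional coderef that receives a shared context hashref.
my @PAGE_LOAD_STEPS = qw(
    load_config     open_database  sanitize_inputs
    run_init        resolve_names  run_component
    run_cleanup     run_engine     send_response
);

sub handle_request {
    my ($steps) = @_;
    my %ctx = (vars => {}, log => []);

    my $ok = eval {
        for my $name (@PAGE_LOAD_STEPS) {
            my $step = $steps->{$name} or next;    # missing steps are skipped
            $step->(\%ctx);
            push @{ $ctx{log} }, $name;
        }
        1;
    };
    my $err = $@;

    # The database handle is torn down even if an earlier step died.
    $steps->{close_database}->(\%ctx) if $steps->{close_database};
    die $err unless $ok;
    return \%ctx;
}
```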

TEMPLATE LANGUAGE

This is the template language. The template language loves you and wants to be your friend. Unfortunately, the template language has difficulty expressing its feelings; this is no reason, however, for it not to be useful to you. Yabba dabba doo.

All the action happens between percent signs. I could've picked some other punctuation character, I suppose, but the percent sign has a long and storied history, and dollar signs seem so... gauche.

WebApp is a bit funny, in that it processes templates a line at a time, where lines can be delimited either by the standard Unix line-feed character, or by the DOS Carriage Return + Line Feed. This is the source of some slight awkwardness, since certain constructs must be constrained to a single line in order to work. In practice, this doesn't seem to be much of a problem, since just about anyone using WebApp at this point in the game is using a text editor to build their webapp, and not some fancy GUI. This will probably change over time, though, and will need to be addressed.

In any event, more than one "s/// construct" on a single line is generally allowed, except for conditionals.

The following menagerie of "s///" constructs is available:

Variable Interpolation

As we saw in the examples so far, the notation

  %(foo)%

causes the value of a variable named foo to be interpolated into the output at that point. This is the simplest form of "s///" processing. Nesting is not currently allowed, e.g. doing

  %(foo_%(bar)%)%

is just going to get you into trouble, and not do anything really useful, unless there happens to be a key in $vars named foo_%(bar)%, and I certainly hope there's not. I could be talked into allowing recursive variable "s///" facilities, but I have not yet needed them myself, and it seems like quite a lot of expense at "s///" time, just for generality's sake. One's mind should be open, but not so open that one drools constantly.

One point worth noting, however, is that due to the multi-pass nature of the "s/// engine", if the text that %(foo)% expands into itself contains "s///" tokens, well, then that's fine and dandy, and will do just what you expect. For instance, if a piece of perl code in a webapp does something morally equivalent to

  $vars->{foo} = q{Name: %(losers_name)%};

Then, %(foo)% will expand into something that contains a new s/// construct, namely, %(losers_name)%, and the s/// engine will notice this, and do the right thing (assuming that losers_name is a key in the $vars hashref). This makes it possible to hook up little bits of code that stitch complex content together; the code can, of course, be recursive, it's just that the "s///" tokens themselves cannot. This is in line with the idea that the constructs used by the "s/// engine" should be simple to explain to non-programmers, and also with the idea that all the heavy lifting should be done by the programmers, not the web designers.
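Here is a minimal sketch of regex-based %(name)% interpolation with the multi-pass behavior just described. It is not WebApp's engine. Two behaviors are assumptions made here: the pass limit, and leaving unknown, underscore-prefixed or reference-valued keys untouched (references are left for the mappers).

```perl
use strict;
use warnings;

# Hypothetical sketch of %(name)% interpolation, repeated until the text
# stops changing so that expanded values can introduce new tokens.
sub interpolate {
    my ($text, $vars, $max_passes) = @_;
    $max_passes ||= 10;    # guard against values that expand forever
    for (1 .. $max_passes) {
        my $before = $text;
        $text =~ s{%\((\w+)\)%}{lookup_var($vars, $1)}ge;
        last if $text eq $before;    # nothing left to expand
    }
    return $text;
}

sub lookup_var {
    my ($vars, $key) = @_;
    # Assumption: keys starting with "_", missing keys and references
    # (left for the mappers) are put back unchanged.
    return "%($key)%"
        if $key =~ /^_/ || !defined $vars->{$key} || ref $vars->{$key};
    return $vars->{$key};
}

# The example from the text:
my $vars = { foo => 'Name: %(losers_name)%', losers_name => 'Fred' };
print interpolate("<p>%(foo)%</p>\n", $vars);    # <p>Name: Fred</p>
```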

Mappers

If the value of foo is a scalar, it's fairly obvious what is intended by simple interpolation: stuff the scalar's contents into the output at that point. If, on the other hand, its value is some kind of reference, one might reasonably expect, or require, different behavior. The construct is called a "mapper" in WebApp, and it looks like this:

  %{foo,<tr><td><p>%(x)%</td><td><p>%(y)%</td><td><p>%(z)%</td></tr>\n}%

This particular example expects that the value of foo will either be a hashref, with keys x, y and z, or an arrayref consisting of such hashrefs. In the former case, supposing the value of foo was the hashref

  {x=>1, y=>2, z=>3}

then we would get

  <tr><td><p>1</td><td><p>2</td><td><p>3</td></tr>

In the latter (arrayref of hashrefs) case, we would get multiple lines of output, one for each element in the arrayref after going through such a "s///". Furthermore, the special pseudo-variables %(#)% and %(@)% can be used in the template to stand for the current offset into the arrayref and the total number of elements in it, for each iteration. Note, also, the trailing \n in the mapper's format specification: if you don't put it there, all of the mapper's output will get glommed onto a single line. Oh, the horror. If you don't like specifying this newline by hand, add this to your webapp's configuration file:

  [HTML]
  mapper_newline=1

and it will be added automatically.

Finally, if the value of foo is none of the above, but it is a blessed reference that can answer the method 'as_html', then the results of invoking that method with the template text as its single argument will be the result of the mapper "s///". This rule also applies when processing each element of an arrayref-valued variable, but we only go that far, and no further, in unwinding the value of a variable. If you've got a variable whose value is an arrayref of arrayrefs, and they aren't blessed into a package that defines as_html, well, then, it's going to get ugly. To recurse may be divine, but it sure is expensive, and generally isn't needed.
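The mapper rules can be sketched in Perl as below. expand_mappers and its helpers are hypothetical. The sketch assumes the format text contains no }% of its own, treats a literal \n as a newline, and takes %(#)% to be a zero-based offset. The third argument stands in for the [HTML] mapper_newline option.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical sketch of %{name,template}% mappers.
sub expand_mappers {
    my ($text, $vars, $mapper_newline) = @_;
    $text =~ s{%\{(\w+),(.*?)\}%}{run_mapper($vars->{$1}, $2, $mapper_newline)}ge;
    return $text;
}

sub run_mapper {
    my ($value, $tmpl, $mapper_newline) = @_;
    $tmpl =~ s/\\n/\n/g;                                  # literal \n -> newline
    $tmpl .= "\n" if $mapper_newline && $tmpl !~ /\n\z/;  # mapper_newline=1
    return '' unless defined $value;
    return fill_row($value, $tmpl, 0, 1) if ref $value eq 'HASH';
    if (ref $value eq 'ARRAY') {
        my $out = '';
        for my $i (0 .. $#$value) {
            $out .= unwind_element($value->[$i], $tmpl, $i, scalar @$value);
        }
        return $out;
    }
    return can_as_html($value) ? $value->as_html($tmpl) : '';
}

# Only one level is unwound: hashrefs or as_html-capable objects.
sub unwind_element {
    my ($elt, $tmpl, $i, $n) = @_;
    return fill_row($elt, $tmpl, $i, $n) if ref $elt eq 'HASH';
    return can_as_html($elt) ? $elt->as_html($tmpl) : '';
}

sub can_as_html { my ($v) = @_; return blessed($v) && $v->can('as_html') }

# Fill one copy of the format: %(#)% is the offset, %(@)% the count.
sub fill_row {
    my ($row, $tmpl, $i, $n) = @_;
    (my $out = $tmpl) =~ s{%\(([\w\#\@]+)\)%}{
        $1 eq '#' ? $i : $1 eq '@' ? $n : defined $row->{$1} ? $row->{$1} : ''
    }ge;
    return $out;
}
```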

Conditionals

It is quite often the case that one needs a tiny bit of flow control on a web page. WebApp supports non-nested, simple conditional expressions, which must fit on a single line. You can test a condition for truth or falsity, but you cannot nest.

  %?foo?%
  <p>text if variable foo is "true"...
  %.foo?%

This will spew

  <p>text if variable foo is "true"...

only if the variable foo is defined and has a non-false value, by the normal perl standards.

  %:foo?%
  <p>text if variable foo is "false"...
  %.foo?%

This is the opposite test. Each conditional must be closed with its own %.foo?% before the other sense is tested; you cannot switch from one sense to the other in the middle:

  %?foo?%
  <p>blah
  %:foo?%
  <p>WRONG
  %.foo?%

is incorrect. An extra %.foo?% is needed before the %:foo?%.

Note that all conditional expressions must be on a single line, and must start at the beginning of the line (except see "Magic Comments", below, for one useful exception). This is because WebApp uses a simple test to decide whether to go from the (relatively cheap) conditional-matching part of the "s///" procedure to the (more expensive) interpolation, mapper, and file inclusion parts by skipping all lines in a failed conditional. Although it restricts the way that conditional expressions can look in a file of HTML, it does make it simpler to spot them, to understand what they do, and to implement them.
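Here is one way the line-at-a-time handling of the simple forms could be written. It is a sketch under assumptions: apply_conditionals is a hypothetical name, and the &perl_code forms described next are not recognized. It does show why a conditional must occupy a whole line of its own. Each line is either a conditional marker, which is consumed, or ordinary text, which is kept or dropped wholesale.

```perl
use strict;
use warnings;

# Hypothetical sketch of the simple conditional forms, one line at a time:
#   %?name?%  keep the following lines if $vars->{name} is true
#   %:name?%  keep the following lines if $vars->{name} is false
#   %.name?%  end of the conditional
sub apply_conditionals {
    my ($template, $vars) = @_;
    my @out;
    my $skipping;    # name of the failed conditional, if any
    for my $line (split /(?<=\n)/, $template) {    # keeps LF or CRLF endings
        if ($line =~ /^%([?:.])(\w+)\?%\s*$/) {
            my ($op, $name) = ($1, $2);
            if ($op eq '.') {
                undef $skipping if defined $skipping && $skipping eq $name;
            }
            elsif (!defined $skipping) {
                my $want  = $op eq '?' ? 1 : 0;
                my $truth = $vars->{$name} ? 1 : 0;
                $skipping = $name if $want != $truth;
            }
            next;    # marker lines never reach the output
        }
        push @out, $line unless defined $skipping;    # cheap skip
    }
    return join '', @out;
}
```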

Since we cannot nest, we must have an out for more complex conditions. Both the test-positive and test-negative senses of the conditional expression have an alternate form:

  %?foo &some_perl_code?%
  <p>spew if some_perl_code returns "true"
  %.foo?%

The name of the variable is not really relevant in this form, it's more of an identifier to associate with the conditional, to make it simpler to find. Within this perl code, the name $vars is bound to the hashref of variables that are visible at this point, so to test some complex boolean expression of variables:

  %?foo &$vars->{a} && !$vars->{b}?%
  <p>spew if a and not b
  %.foo?%

This is an example of something in HTML boilerplate that a web designer might need a bit of help with. There are some ways of making this better, too, though. A webapp of any size will accumulate a library of common routines, and these routines should be in scope in this context. The designers and the programmers could agree, therefore, on a convention, so that some programmer arranges to have

  sub common_test {
    my $vars = shift(@_);
    return $vars->{a} && !$vars->{b};
  }

defined somewhere, and then the web designers can just write

  %?foo &common_test($vars)?%
  <p>spew if common_test($vars) returned true
  %.foo?%

which is perhaps a little easier to explain to the poor dears. In fact, this is such a common thing, that there is a bit of syntactical sugar available to them. The following is precisely equivalent to the above:

  %?foo &common_test?%
  <p>spew if common_test($vars) returned true
  %.foo?%

This is because it's the standard thing to pass $vars as the first argument to such code, so if you don't explicitly provide an argument list (or an empty set of parentheses), WebApp assumes that's what you meant. It makes it easier to explain, since you don't have to cover what that dollar-sign vars thingie means to the designers, you just have to agree on a set of names for these common tests. Honestly, the designers have enough on their minds, anyway, between whipping up eye-candy, drinking high-powered "energy" drinks, and organizing their mp3 collections, so it's best not to tax them with too many complex boolean expressions.
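The &perl_code form, including the bare-name shorthand, can be sketched with a string eval. parse_conditional_line and eval_condition are hypothetical helpers, and the exact rule for spotting a bare sub name is an assumption. Template text is evaluated as Perl, so a scheme like this is only appropriate for trusted templates.

```perl
use strict;
use warnings;

# Hypothetical helpers for the "&perl_code" form of a conditional.
# Note: this string-evals template text, so templates must be trusted.
sub parse_conditional_line {
    my ($line) = @_;
    return unless $line =~ /^%([?:])(\w+)\s+&(.+)\?%\s*$/;
    return { sense => $1, name => $2, expr => $3 };
}

sub eval_condition {
    my ($expr, $vars) = @_;
    $expr =~ s/^\s+|\s+$//g;
    # Shorthand: a bare sub name with no argument list gets ($vars) appended.
    $expr .= '($vars)' if $expr =~ /^[A-Za-z_]\w*(?:::\w+)*$/;
    # The eval'd code runs in package main and can see the lexical $vars.
    my $result = eval "package main; $expr";
    die "bad conditional expression '$expr': $@" if $@;
    return $result ? 1 : 0;
}
```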

Furthermore, since the name of the "variable" being tested in this form of conditional doesn't really matter, you can improve readability with it in any number of ways, e.g.:

  %?if &we_hate_the_user?%
  <p>we hate you
  %.if?%

almost looks reasonable (cf. "Magic Comments", below, for one more refinement that improves readability even more).

Finally, I admit that this line-at-a-time conditional thing, in addition to being efficient, is also a secret plot on my part to keep webapp developers from spewing their entire HTML document out on one long line. I hate that. I like my HTML to be structured and readable by humans, if only for debugging purposes. Yes, I know, I'm a freak, and a fascist, and something else that starts with "F" to boot. Yabba Dabba Doo.

File Inclusions

File inclusion is like variable interpolation; it can appear anywhere, and multiple times on a single line. It looks like:

  %<file>%

The path to the file is interpreted relative to the htmldir of the webapp. Its contents are stuffed into the output in place of the file inclusion construct. If the incf_comments configuration option is set in the webapp's config file, e.g.

  [HTML]
  incf_comments=1

then the contents of the file will be enclosed in HTML comments that look like

  <!-- file -->
  ...contents of file
  <!-- /file -->

so you can spot them more easily in the HTML output.

Unlike other s/// contexts, simple variable interpolation is available in the filename, e.g.

  %<form_%(lang)%.html>%

will include form_en.html if the lang variable is set to en. This feature is on by default, but can be turned off with the sub_incf configuration option:

  [HTML]
  sub_incf=0

Other forms of "s///" processing are not available inside the filename, e.g. mappers, etc. C'mon, that would just be insane, and is it really necessary?
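Here is a minimal sketch of how a file inclusion might be expanded on a single line. It is not the code in WebApp::Component. The name expand_includes, its argument list, and the way options are passed in are all assumptions made for illustration. It interpolates %(var)% in the filename only when sub_incf is on, reads the file relative to htmldir, and wraps the contents in comments when incf_comments is set.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical sketch (not WebApp's code): expand %<file>% constructs in one
# line. $htmldir, $vars and $opt stand in for the webapp's htmldir, its $vars
# hashref and its [HTML] options (sub_incf, incf_comments).
sub expand_includes {
    my ($line, $htmldir, $vars, $opt) = @_;
    $line =~ s{%<(.+?)>%}{
        my $name = $1;
        # sub_incf: simple %(var)% interpolation, and nothing else, in the filename
        $name =~ s/%\((\w+)\)%/defined $vars->{$1} ? $vars->{$1} : ''/ge
            if $opt->{sub_incf};
        my $path = File::Spec->catfile($htmldir, $name);   # relative to htmldir
        open(my $fh, '<', $path) or die "cannot include $path: $!";
        my $body = do { local $/; <$fh> };                  # slurp whole file
        close($fh);
        $opt->{incf_comments} ? "<!-- $name -->\n$body<!-- /$name -->" : $body;
    }ge;
    return $line;
}
```

With lang set to en and sub_incf on, %<form_%(lang)%.html>% pulls in form_en.html from the htmldir.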

Variable Setters

If you're going to start separating out useful subcomponents of a webapp into smaller files and using the file inclusion operator, it would also be useful to be able to change the value of variables on the fly, so that such componentized HTML templates could be parameterized, too.

You can set the value of the variable named var to val like this:

  %[var:val]%

The most common way of using this is to bring in a file that defines something like an HTML FORM, where the names of the buttons and labels all share some prefix and/or suffix, to make them unique in the webapp:

  %[form_prefix:wacka]%
  %<_form.html>%
  %[form_prefix:wonka]%
  %<_form.html>%

You would then write _form.html like this:

  <form name="%(form_prefix)%_form" method="GET" action="%(ACTION_URL)%">
   <input type="submit" name="%(form_prefix)%_submit" value="hit me!">
   <!-- ... etc ... -->
  </form>

(We'll cover ACTION_URL, and other pre-defined "s///" tokens in "PREDEFINED VARIABLES", below).

The value is interpreted as a scalar, and has leading and trailing whitespace stripped. If the 'eval_varsets' HTML option is turned on for the webapp, and the value's first non-whitespace character is an ampersand, then the rest of the value is evaluated as perl code, and the result used for the value.

  %[some_complex_thing:&new SomeObject(%$vars)]%

sets the variable some_complex_thing to the result of the new call. Perhaps more useful:

  %[buttons:&{left=>"izquierda",right=>"derecha"}]%
  %<internationalized_form.html>%
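Here is a rough sketch of what a variable setter does, assuming setters are handled one line at a time. The name apply_setters is hypothetical, and the real engine surely differs in detail, error handling for one. The eval in the middle shows exactly why eval_varsets is off by default.

```perl
use strict;
use warnings;

# Hypothetical sketch of %[var:val]% handling (not WebApp's code).
# $opt->{eval_varsets} mirrors the [HTML] option of the same name.
sub apply_setters {
    my ($line, $vars, $opt) = @_;
    $line =~ s{%\[(\w+):(.*?)\]%}{
        my ($name, $val) = ($1, $2);
        $val =~ s/^\s+|\s+$//g;                  # strip leading/trailing whitespace
        if ($opt->{eval_varsets} && $val =~ s/^&//) {
            my $result = eval $val;              # DANGEROUS: runs template-supplied code
            warn "varset $name failed: $@" if $@;
            $val = $result;
        }
        $vars->{$name} = $val;
        '';                                      # a setter produces no output itself
    }ge;
    return $line;
}
```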

Magic Comments

In order to make browsing un-processed boilerplate HTML a bit easier, there is a feature, called magic_comments, which allows you to embed all of the above constructs in HTML comments, and have the comment begin and end sequences stripped off in the output.

In order for this to work, the construct must appear by itself on a line, surrounded in an HTML comment, like so:

  <!-- %?foo?% -->
  <!-- %[something:foo_is_true]% -->
  <!-- %<other.html>% -->
  <!-- %.foo?% -->

With magic_comments turned on, this will behave exactly as though all of those lines were not inside of HTML comments. The default is for magic_comments to be turned on; to turn it off, do something like this in your webapp's config file:

  [HTML]
  magic_comments=0
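The stripping rule fits in a single substitution. This sketch assumes "by itself on a line" means only whitespace may surround the comment. The name strip_magic_comment is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: if a line holds nothing but one s/// construct wrapped
# in an HTML comment, drop the comment markers (keeping indentation and any
# trailing newline). Ordinary comments and mixed lines are left alone.
sub strip_magic_comment {
    my ($line) = @_;
    $line =~ s/^([ \t]*)<!--[ \t]*(%\S.*%)[ \t]*-->[ \t]*$/$1$2/;
    return $line;
}
```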

One way of separating the development of HTML and code is to mock up the parts of a web page that will hold dynamic content, e.g. tables full of information from dynamically executed queries. You create filler content with the right shape, and then enclose it in a conditional that will never be true:

  <!-- this table holds the goods -->
  <table>
   <tr><th>date</th><th>descr</th></tr>
  <!-- %?DESIGN?% -->
   <tr><td>2003-11-19</td><td>dummy entry 1</td></tr>
   <tr><td>2003-11-19</td><td>dummy entry 2</td></tr>
   <tr><td>2003-11-20</td><td>dummy entry 3</td></tr>
  <!-- %.DESIGN?% -->
  <!-- %:DESIGN?% -->
  <!-- %{SQL_RESULTS,<tr><td>%(date)%</td><td>%(descr)%</td></tr>}% -->
  <!-- %.DESIGN?% -->
  </table>

As long as nothing sets the DESIGN variable to a true value, the result will look "normal" in a web browser, at least for the purposes of a mockup.

Trips and Reline Trips

The guts of the "s/// engine" make multiple passes over each line of input, and each line might potentially expand into many more lines, e.g. with file inclusion constructs. The resulting lines are pushed into the list of lines waiting to be processed.

The outer loop, therefore, is line-based, and the inner loop is "s///" based. The inner loop ends when a pass over the line turns up no further "s/// constructs"; there is, however, a configurable limit to how many times we'll go through this inner loop, regardless of what happens: max_trips. The default value is 100, which is generally much more than enough. Likewise, there is a maximum number of times that the outer (re-line) loop is executed: max_reline_trips. The default value is 1000, which is, again, usually more than enough.

Tweaking these numbers in your webapp's config file can perhaps improve performance, slightly, but make sure you know what you are doing, since they can have subtle, and potentially very bad effects on the resulting HTML. If anything, you might have to increase them if your output still contains unresolved "s/// constructs".

  [HTML]
  max_trips=100
  max_reline_trips=1000
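Here is a sketch of the two-level loop that shows how the two limits interact. It assumes the caller supplies $expand, a sub that performs one s/// pass over a line, and it treats "a pass changed nothing" as the signal that no constructs remain. It counts re-lines over the whole input rather than per input line. The name run_engine and its arguments are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch of the s/// engine's two loops (not WebApp's code).
# $expand does ONE s/// pass and returns the result, which may now contain
# newlines (e.g. after a file inclusion).
sub run_engine {
    my ($lines, $expand, %opt) = @_;
    my $max_trips        = $opt{max_trips}        // 100;
    my $max_reline_trips = $opt{max_reline_trips} // 1000;
    my @pending = @$lines;                 # lines waiting to be processed
    my @out;
    my $relines = 0;
    while (@pending) {
        my $text = shift @pending;
        # Inner loop: repeat passes until one changes nothing, or max_trips.
        for (1 .. $max_trips) {
            my $next = $expand->($text);
            last if $next eq $text;
            $text = $next;
        }
        # Outer (re-line) loop: a line that grew into several lines is split,
        # and the pieces go back to the front of the queue for more passes.
        if ($text =~ /\n/ && $relines < $max_reline_trips) {
            $relines++;
            unshift @pending, split /\n/, $text;
        } else {
            push @out, $text;
        }
    }
    return @out;
}
```

If the limits are hit, whatever is left is emitted as-is. This is how unresolved "s/// constructs" can end up in the output.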

SESSIONS, COOKIES, USER IDENTITIES, AND AUTHENTICATION

Talk about WebApp::Session, _COOKIE_NAME, WebApp::User and how the standard _init.pl does authentication. Maybe also talk about email-based automated user registration. Yabba Dabba Doo.

INPUT SANITIZATION

Input sanitization is so important it gets its own section.

Yabba Dabba Doo.

Pidgin

Yabba Dabba Doo.

Pidgin is an extremely simplified form of markup suitable for use anywhere that users of your webapp must be allowed to enter some kind of formatted text. The WebApp::Utils::pidgin() routine turns this simple markup into HTML, and the WebApp::Utils::depidgin() call strips pidgin formatting from a string. Furthermore, the input and output sanitization code knows about pidgin, and can be asked to invoke pidgin() on the fly, which can be useful.

Pidgin looks like this:

  [[style]bold]some text[[!style]]
  [[link]href=http://some.where,external]text for link[[!link]]

which turn, respectively, into:

  <b>some text</b>
  <a href="http://some.where" target="_blank">text for link</a>
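For the curious, here is a toy translator that handles just the two pags shown above. It is not WebApp::Utils::pidgin(), which knows about more pags and may treat these ones differently. The names toy_pidgin and %STYLE are made up for illustration, and the text between pags is assumed to have been sanitized already.

```perl
use strict;
use warnings;

# Hypothetical toy: only the "bold" style and the "link" pag are handled.
my %STYLE = (bold => 'b');

sub toy_pidgin {
    my ($s) = @_;
    # [[style]NAME]text[[!style]]  ->  <TAG>text</TAG>
    $s =~ s{\[\[style\](\w+)\](.*?)\[\[!style\]\]}{
        my $tag = $STYLE{$1} or die "unknown style $1";
        "<$tag>$2</$tag>";
    }ge;
    # [[link]href=URL(,external)?]text[[!link]]  ->  <a href="URL" ...>text</a>
    $s =~ s{\[\[link\]href=([^,\]]+)(,external)?\](.*?)\[\[!link\]\]}{
        my $target = $2 ? ' target="_blank"' : '';
        "<a href=\"$1\"$target>$3</a>";
    }ge;
    return $s;
}
```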

Anywhere that you feel the user needs to enter text that contains some formatting information, pidgin is the preferred way of doing this in WebApp. In fact, by default, the input sanitization that will happen before your code even gets a chance to run squishes all HTML-style markup out of any input fields, so you have to actually work pretty hard to write a webapp using WebApp that accepts raw HTML. This is by design. Raw HTML is bad. It is bad design to write a webapp that accepts raw HTML, and that's just the way it is. If you really want to do something like that, you can jack up your webapp's configuration to do it, but you'll have to read the code in WebApp::Component to figure out how, because I'm not going to tell you. True desperados don't need directions to the heist. Yabba Dabba Doo.

The tags that pidgin uses are called pags, short for "pidgin tags."

Yabba Dabba Doo.

OUTPUT SANITIZATION

Although your webapp should never send anything to the user that wasn't the result of computations based on sanitized inputs, it might be the case that you want to sanitize output sent to the client as well, if you are extra paranoid about "XSS" problems, or if some of the data that your webapp manipulates comes from potentially unsanitized sources. Also, it is entirely possible for some random computation that took completely clean inputs to end up producing some nasty HTML, since it's all just little bits of code and fluff.

WebApp's sanitization of output is configurable via several options in the HTML configuration section. The sanitization is done by the defang_string primitive on the results of each stage of the s/// engine, but never on the entire spew, since that is guaranteed to have all kinds of HTML in it that will make defang_string awfully unhappy. That's sort of the whole point, come to think of it.

You can control output sanitization per "s///" construct, if you want. Both simple variable interpolations and mappers support an alternate syntax that indicates no output sanitization should be done on the result for "s///" purposes:

  %(~foo)%      <!-- no defang_string on value of foo -->
  %{~foo,...}%  <!-- ditto, for a mapper -->

Use this with care, typically somewhere that it is just more convenient to stuff a "dirty" value into $vars than it is to templatize things fully... not very often, in other words.
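Here is a sketch of the difference between the two forms, assuming defang_out is on. The helper defang_stub is a hypothetical stand-in that only escapes &, <, > and ". The real defang_string is configurable via the *_out options in the HTML configuration section. The name interpolate is also made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for defang_string: escape HTML-significant characters.
sub defang_stub {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;          # must come first
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical sketch: %(foo)% is defanged, %(~foo)% is passed through raw.
sub interpolate {
    my ($line, $vars) = @_;
    $line =~ s{%\((~?)(\w+)\)%}{
        my ($raw, $name) = ($1, $2);
        my $val = $vars->{$name} // '';
        $raw ? $val : defang_stub($val);
    }ge;
    return $line;
}
```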

BINARY OUTPUT

Sometimes, you have a component that wants to spew out binary data, e.g. an image, mp3, etc. There is only one right way to do this in WebApp: set a key in $vars called _BINARY_CONTENT to a true value, and put the actual bytes you want spewed into the CONTENT key. If you want specific headers sent along as well, e.g. Content-Type, then use the WebApp::Utils::set_header call to set the header in $hdrs, which is normally passed to all component subs.

If the "s/// engine" sees _BINARY_CONTENT set in $vars, it bugs out and stops trying to do anything to the spew. You can use this behavior to send arbitrary spews of any kind to the browser, as well.
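Here is a minimal sketch of a component that spews binary data. The component name thumbnail is hypothetical, and the bytes are just the 8-byte PNG signature standing in for a real image. The helper set_header_stub stands in for WebApp::Utils::set_header, whose exact signature is not given here, so its ($hdrs, $name, $value) argument order is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WebApp::Utils::set_header; argument order assumed.
sub set_header_stub {
    my ($hdrs, $name, $value) = @_;
    $hdrs->{$name} = $value;
}

# Hypothetical component sub using the usual positional arguments.
sub thumbnail {
    my ($params, $vars, $dbh, $query, $hdrs) = @_;
    my $bytes = "\x89PNG\r\n\x1a\n";     # placeholder: just the PNG signature
    $vars->{_BINARY_CONTENT} = 1;        # tell the s/// engine to keep its hands off
    $vars->{CONTENT}         = $bytes;   # the raw bytes to spew
    set_header_stub($hdrs, 'Content-Type', 'image/png');
    return;
}
```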

Yabba Dabba Doo.

CONFIGURATION FILES AND VARIABLES

The WebApp::Config class implements a simple, extensible configuration API based on the Config::IniFiles CPAN module. Each webapp can potentially have its own ini-style configuration file. We have seen snippets of such a file throughout this document, e.g.

  [HTML]
  max_trips=100
  max_reline_trips=1000

The [HTML] says that we are defining variables in a configuration section named HTML. Variables can have simple, scalar values. There are some config sections that WebApp itself will pay attention to, as we have seen, but a webapp's config file can have any number of sections, which the webapp code itself uses for whatever purpose is required. The only restriction is that any config variables mentioned in this document are reserved for WebApp's use, and cannot be used for any purpose other than that described here.
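WebApp::Config itself sits on top of Config::IniFiles. The basic idea, sections of name=value pairs with fallback to the documented defaults, fits in a few lines of core Perl. The names read_ini, cfg_val and %DEFAULTS are hypothetical illustrations, not WebApp::Config's API. The default values are the ones documented in this section.

```perl
use strict;
use warnings;

# Defaults as documented below; only a few are listed for illustration.
my %DEFAULTS = (
    HTML  => { max_trips => 100, max_reline_trips => 1000, magic_comments => 1 },
    Debug => { verbosity => 0 },
);

# Minimal INI reader (hypothetical; WebApp uses Config::IniFiles instead).
sub read_ini {
    my ($text) = @_;
    my (%cfg, $section);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:[#;]|$)/;                # comments, blank lines
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) { $section = $1; next }
        if (defined $section && $line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            $cfg{$section}{$1} = $2;
        }
    }
    return \%cfg;
}

# Look up [section]name, falling back to the documented default.
sub cfg_val {
    my ($cfg, $section, $name) = @_;
    return $cfg->{$section}{$name} // $DEFAULTS{$section}{$name};
}
```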

All webapps on a server theoretically keep their config files in the same directory, and the files are always named webapp.config. The config directory can be set per-webapp, as well, via the webapp Install command. On BSD Unix systems, the normal place for webapp config files is /usr/local/etc/WebApp.

Standard configuration sections and the variables that love them:

App

The [App] config section holds general configuration fu.

name (no default)

The name of the webapp; should not have spaces.

url (no default)

The canonical URL for this webapp. Available to template HTML as the %(CANON_URL)% variable.

use_cookies (default: 1)

If non-zero, this webapp uses cookies to associate a session ID with a specific browser session, using the standard Set-Cookie HTTP header. If you don't want cookies, turn this bad boy off.

cookie_name (default: webapp_nameCookie)

The name of the cookie to set.

cookie_lifetime (default: 0)

The lifetime of the cookie. The default (zero) is for the cookie to persist forever. We like cookies. They taste dandy.

Registration

The [Registration] config section governs the built-in, autonomous, email-based user registration code. This is a mechanism by which your webapp can automatically allow new users to sign up. Not all webapps want this sort of thing, e.g. if you want access to your webapp to be by invitation only.

The bulk of the code to support this functionality is in WebApp::User; see WebApp::User for more information.

open (default: 0)

If non-zero, the user registration system is turned on, otherwise it is not.

admin_email (no default)

The email address of the administrator responsible for this webapp, or whatever other official email address there is for the webapp.

smtp_server (no default)

The SMTP server (and, possibly, port number, separated by a colon) used to send registration-related email.

Debug

The [Debug] section has various variables related to debugging and performance tuning.

verbosity (default: 0)

Verbosity level. The higher the number, the more verbose the webapp's logs will be; by default, a webapp logs to stderr, as per standard Apache convention. There is no maximum value, but WebApp itself does not use verbosity levels over 9. Setting this to a value over 3 generally produces a large amount of debugging output, and can slow things down substantially. For webapps that are in production, this should be set to zero.

Also, see the documentation on WebApp::Utils::web_log in WebApp::Utils for information on the WebApp logging call.

cache (default: 0)

If non-zero, turn on debugging output for the WebApp file cache. Normally, this is not needed, but if you are suspicious that something is being cached incorrectly, this is the thing to tweak.

includes (default: 0)

If non-zero, turn on debugging output related to the %<file>% s/// construct.

timings (default: 0)

If non-zero, print various timing statistics on each page load in the Apache error log.

SQL

The [SQL] section governs the way that the webapp will deal with DBI databases.

no_database (default: 0)

If non-zero, this webapp does not use a DBI database of any kind, and no WebApp code should attempt to do anything DBI-related. This automatically turns off certain other features, such as cookies and authentication.

dsn (default: dbi:Pg:dbname=webapp_name)

The DSN for the DBI database used by this webapp.

user (no default)

The user to give to DBI to connect to this webapp's DSN.

password (no default)

The password to give to DBI to connect to this webapp's DSN.

always_count (default: 0)

Used by the WebApp::Utils::selector() primitive to control whether or not it attempts to get a total count of rows in an SQL SELECT statement's result set each time it is issued. See the documentation in WebApp::Utils on the selector primitive for more details.

try_counter (default: 0)

Yabba Dabba Doo.

limit_select (default: 0)

Yabba Dabba Doo.

HTML

magic_comments (default: 1)

If set, the "Magic Comments" feature is turned on. Turn this off if you get odd-looking "s///" output.

max_trips (default: 100)

Maximum number of trips through the inner loop of the "s/// engine" per "line."

max_reline_trips (default: 1000)

Maximum number of trips through the re-line-ifying (outer) loop in the "s/// engine" per "line" of input.

legacy_parsub (default: 1)

If turned on, %foo% will be recognized as a valid "s///" variable interpolation inside of mappers; this was the old behavior, and I should probably just make it go away, but I am a lazy pig.

eval_varsets (default: 0)

If turned on, the perl-code-in-a-varset feature is enabled, allowing you to say

  %[var:&some_perl_code]%

to set the variable named var to the result of evaluating some_perl_code in HTML templates. This can easily be a security issue, so it is off by default, and you really don't need it unless you are incredibly lazy, or addicted to dangerous drugs.

no_varerrs (default: 0)

This setting is only relevant if the eval_varsets option is also true. In that case, if you have a varset that contains perl code, and the perl code throws an error, the error message is used as the value for the variable. Useful mainly for debugging, but just jacking [Debug]verbosity up to 3 or greater will give you more information in Apache's error log, anyway.

sub_incf (default: 1)

If true, file inclusion "s/// constructs" will have variables in them interpolated before attempting to open the file. This is on by default, but if you are really paranoid, turn it off.

mapper_newline (default: 1)

If on, mapper output always has a newline appended to it after each iteration (if the variable being "s///"'ed is an arrayref). On by default, because purty HTML is happy HTML.

defang_out (default: 1)

Invoke defang_string on the result of all "s///" operations. The options to defang_string are controlled by four options: pidgin_out, angles_out, nonprinting_out and entities_out.

pidgin_out (default: 0)

Yabba Dabba Doo.

angles_out (default: 0)

Yabba Dabba Doo.

nonprinting_out (default: 0)

Yabba Dabba Doo.

entities_out (default: 0)

Yabba Dabba Doo.

nonprinting_in (default: 0)

Yabba Dabba Doo.

entities_in (default: 0)

Yabba Dabba Doo.

breaks_in (default: 0)

Yabba Dabba Doo.

angles_in (default: 0)

Yabba Dabba Doo.

Auth

enabled (default: 1)

Yabba Dabba Doo.

user_var (default: User)

Name of the form input used for the user name, during authentication. An all-uppercase form of this is tried in the $vars hash, if there is no user input with that name.

password_var (default: Password)

Just like user_var, but for the password.
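Here is a sketch of the lookup order described for user_var and password_var. The helper auth_input is hypothetical. It assumes the form input always wins when both it and the uppercase $vars entry are present.

```perl
use strict;
use warnings;

# Hypothetical helper: try the form input named $name first, then the
# all-uppercase form of that name in $vars (e.g. User -> USER).
sub auth_input {
    my ($params, $vars, $name) = @_;
    return $params->{$name} if defined $params->{$name};
    return $vars->{ uc $name };
}
```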

CASCADING STYLE SHEETS

Describe our Perly CSS fu. YDD.

PRIMITIVES FOR PERL PROGRAMMERS

Describe the raw materials in WebApp::Utils, especially selector and friends.

Describe our database indirection fu, e.g. parse_db_ts, read_blob, etc.

Yabba Dabba Doo.

PREDEFINED VARIABLES

There are some keys in $vars that are defined by WebApp before your code gets a chance to run. You can override them, of course, except for the ones that start with an underscore, which are read-only. I have no way of enforcing this, but WebApp will probably stomp its little feet and cuss you out if you do anything to the underscored ones.

_CONFIG

The WebApp::Config object for this webapp.

_SESSION

The WebApp::Session object for this session.

ACTION_URL

The base URL for HTML form action attributes to use for this component.

APP_NAME

The name of this webapp, as you told it to the webapp Install command.

APP_TITLE

The title of this webapp, as you told it to the webapp Install command. Might be blank.

PAGE_TITLE

The title for this component's HTML output, e.g. what goes in the TITLE element in the HEAD of the HTML output. WebApp puts something together by default based on APP_TITLE and APP_NAME if you don't do something yourself.

TITLE_DESCR

More verbose title, which gets defaulted, again.

VERSION

The version of this webapp, either from your configuration, or from your code's setting of $main::VERSION.

HOME_URL

The URL for this webapp's project home-page.

MESSAGES

An arrayref that contains collateral messages and other non-essential output produced by the webapp's code as it ran. Often displayed in a side-bar, or what have you, but that's up to the designers.

ERRORS

Like MESSAGES, but for errors.

COMPONENT_NAME

The name of this component.

WEBAPP_URL

The URL of the WebApp project home page:

  http://www.stalphonsos.com/~attila/WebApp

WEBAPP_VERSION

The version of WebApp in use by this webapp.

PERFORMANCE TIMING AND DEBUGGING

Talk about timing keys. Yabba Dabba Doo.

STRUCTURE OF THE FRAMEWORK

This is all ancient and crufty, and must be rewritten. Yabba Dabba Doo.

There are two major pieces of WebApp: component.cgi and the supporting WebApp::* modules. Component.cgi is the code that actually drives the web app. It takes care of substituting symbolic values in template HTML files, finding files full of perl code to generate values for those variables, finding the template HTML files (or producing boilerplate HTML if it cannot), and other general infrastructure. To do this, it uses two other kinds of code:

code in WebApp::* modules

code in user-supplied .pl files

The following modules come with the system, and are intended for use inside of web application code:

WebApp::Utils

General-purpose utility routines that are used everywhere; this really consists of the core of the code in WebApp.

WebApp::User

Persistent user identities for web applications that can save passwords and other attributes (such as persistent preferences, session timeouts, and so on).

WebApp::Session

Persistent sessions, which can have variables in them, are referred to via a single key (stored in a cookie in the user's browser), and so on.

WebApp::Component

The "s/// engine", CGI driver, and supporting code.

WebApp::User and WebApp::Session use a DBI database (by default PostgreSQL) to store their data.

In addition, there is a module called WebApp that can be used from the command line to manage the web applications on the machine where they live. This includes setting up skeletal web app code directories, initializing the RDBMS tables needed by WebApp::User and WebApp::Session, adding and deleting users for a given web application, and so on. The 'webapp' utility program can be used as a front end to this module from the command line; the shell command

  $ webapp help

will give more information.

The other source of code is user-supplied .pl files, which follow a certain convention. Suppose that a given web app has a few major components:

  • an index, or default page
  • a page where a directory of items may be viewed
  • a page where an individual item can be edited
  • a page where user preferences can be set

Let us call these components index, directory, edit and prefs, and let us further call this app 'foo' for ease of reference. If a web app were installed under /var/www/cgi-bin/foo, then component.cgi would appear there (after suitable variable substitutions) under the name _component.cgi. The leading underscore will prevent it from being viewed or invoked incorrectly. All normal user components of foo would then appear in /var/www/cgi-bin/foo/ as symbolic links, e.g. index.cgi, directory.cgi, edit.cgi and prefs.cgi would all be symbolic links to _component.cgi (in fact, the webapp Install utility creates the index.cgi symlink by default).

The code in _component.cgi figures out the base name of the actual component being accessed (e.g. "directory" for http://server/cgi-bin/foo/directory.cgi), and looks for a file called directory.pl in the application code directory; let's say that we have set the code directory to /var/www/code/foo (it should be outside of the normal DocumentRoot, so that casual browsing cannot reveal it). The code in _component.cgi that executes a component would first do something equivalent to:

  do "/var/www/code/foo/directory.pl";

to pull in the code for the component, and then call

  directory(...args...);

assuming that directory.pl defines a routine called directory. These component subs always take the same arguments, by position:

  $params    a hashref of sanitized CGI parameters
  $vars      a hashref of variables to be "s///"'ed in the HTML
  $dbh       a DBI handle
  $query     the raw CGI query object from the CGI module
  $hdrs      a hashref of HTTP headers to send the browser

The component routine (in this case directory()) is free to modify any or all of these except for $dbh, which can be used to do whatever component-specific database fu needs to be done. The directory.pl file should also define any other subs needed by the main sub, all of which are loaded into the main package by default. It can assume that certain globals have been set, and that the WebApp::* modules are available for use. Component routines are supposed to communicate their results by setting or modifying values in $vars and possibly $hdrs; the former will be substituted into the HTML after all code has run, and the latter will be sent to the browser as HTTP headers in the response.
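Put together, the dispatch step in _component.cgi amounts to something like the following sketch. The name run_component is hypothetical. The component-name check is an added precaution, not necessarily something the real driver does. The real driver also does a good deal more, such as sanitizing $params, setting up $vars and connecting $dbh.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical sketch of the dispatch step. @args are the five positional
# component arguments: ($params, $vars, $dbh, $query, $hdrs).
sub run_component {
    my ($script, $codedir, @args) = @_;
    my $name = basename($script, '.cgi');      # .../foo/directory.cgi -> directory
    die "bad component name '$name'\n" unless $name =~ /\A\w+\z/;
    my $file = "$codedir/$name.pl";
    my $rc = do $file;                         # pull in the component's code
    die "couldn't parse $file: $@" if $@;
    die "couldn't read $file: $!" unless defined $rc;
    my $sub = main->can($name)                 # component subs live in main::
        or die "$file does not define sub $name\n";
    return $sub->(@args);
}

# A directory.pl following the convention might contain:
#   sub directory {
#       my ($params, $vars, $dbh, $query, $hdrs) = @_;
#       $vars->{PAGE_TITLE} = 'Directory';
#       push @{ $vars->{MESSAGES} }, 'listing loaded';
#   }
#   1;   # so that "do" returns a true value
```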

The template HTML for e.g. the directory component in our foo application would normally be found in /var/www/cgi-bin/foo/_directory.html; as with _component.cgi, the leading underscore will keep the file from being visible outside of the web application.

In practice this is all quite simple. To create our foo webapp, we might've done something like:

  # createdb foo
  # webapp Install app=foo dir=/var/www cgi_dir=/var/www/cgi-bin/foo

To add our components to it, we would've done:

  # cd /var/www/cgi-bin/foo
  # ln -s _component.cgi directory.cgi
  # cp <directory_template_html> _directory.html
  # ... and so on for all components
  # cd /var/www/code/foo
  # cp <directory_perl_code> directory.pl
  # ... and so on for all components

To add a new user:

  # webapp NewUser

CONTACTING THE AUTHOR

I am known as "attila" on the {ARPA,Inter}net, and have been for years. I use this identity for my own personal, non-work-related hacking, and other personal creativity, e.g. writing, and so on.

My web pages are at http://www.stalphonsos.com/~attila

StAlphonsos.com (St.Alphonsos, pronounced "Saint Alphonsos", a slight variation from Zappa's spelling), is a hacker's collective, comprised of myself and a few trusty friends. We have your standard decaying urban area self-hosting situation: a closet in an old office building with a bunch of machines in it and a T1 thrown over the wall from a similarly-inclined friend.

My email address is mailto:attila@stalphonsos.com. My GnuPG key ID is 4FFCBB9C, and that key's fingerprint is A9A1 FD2A 2EFC 70B6 1036 F966 AFCF 222D 4FFC BB9C. It is on all the standard keyservers, or should be. You can always get it at

  http://www.stalphonsos.com/~attila/gpgkey.txt

If you send me email, and I've never heard of you before, and your email is not at least PGP-signed (ASCII ARMOR UBER ALLES), then there's a strong chance it will wind up in my spamtrap. To avoid this, stick "[webapp]" near the front of your email's Subject header. It's a simple trick, but it seems to be working, and not causing me any headaches thus far. I try to respond to email as quickly as I can, but you know how it goes.

TODO/BUGS

This thing has to be fleshed out, I just started with a copy of the README.

We here at the agency have no sense of humor that we are aware of, Ma'am.

YDD

You might think me cheesy to argue, on the one hand, for "Yabba Dabba Doo" and contra "XXX," only to turn around and throw "YDD" at you, but you surely must see that they are two completely different animals.

"XXX" is a sign without direct meaning, which only gains meaning from its context, and which adds nothing to it, except to mark a particular place in a text. It is a digital inkblot, and means nothing more or less than a real one. "YDD," on the other hand, is merely a contraction of a sequence of words that are full of life and vigor. Why, I'll bet you even smile a little when you see "YDD" as you think "Yabba Dabba Doo" to yourself, don't you? I know I do. Clearly, then, "YDD" has some advantage over "XXX," if only in the production of this hypothetical smile. One could continue the thought, and conclude that if "YDD" is a smile, then "XXX" must be a frown of some sort, and I would not discourage such an effect. Inkblots are generally messes made by accident, cursed by their authors, and rarely convey anything positive; they scowl at the text that surrounds them, as if to say "Feh." An inkblot that did not scowl, which managed to smile, I'd call that more of a painting au naturel. A primitive sketch.

My XXX-infested existence certainly could use a bit more YDD. So, let's end this document on a positive note:

Yabba.

Dabba.

Doo.

GLOSSARY

codedir

The root of the directory tree where a webapp's code files live. Should be outside of Apache's DocumentRoot, but accessible by whatever user id httpd runs under.

htmldir

The root of the content directory tree for a webapp. Must be under DocumentRoot somewhere. Under normal WebApp circumstances, files in this directory tree whose names start with an underscore are not directly visible to clients, and are intended to hold content for the "s/// engine".

mapper

A kind of s/// token in WebApp's HTML template language that will iterate (map) a chunk of templatized HTML over some set of data items, concatenating the resulting set of s/// results. Used to e.g. present rows from an SQL SELECT query in an HTML document. See also: s///

pag

A Pidgin Tag.

pags

The set of Pidgin Tags supported by WebApp.

s///

Perlian for "substitution", or any variant thereof as per context, e.g.: in "The code will automatically s/// variables for you," the "s///" means "substitute". I pronounce s/// "suss," which, coincidentally, is also the sound of one hand clapping.

Oh, darn, I've given the answer away. Now the Zen police will get me for sure.

s/// construct

One of the "s///" constructs.

s/// constructs

One of the percent-sign enclosed constructs that the "s///" engine recognizes and deals with. The current set is:

  • Variable Interpolation
  • Mappers
  • Conditionals
  • File Inclusions
  • Variable Setters
  • Magic Comments

s/// engine

The piece of code in the WebApp framework that is responsible for s///'ing templatized HTML into something you can serve to a web browser. It does this by using the values that the webapp's code stored in the $vars hashref.

spew

The stuff that WebApp eventually sends back to the browser.

WebApp

The proper name of the web application framework described in this document. The art(ist) formerly known as "WebApp".

webapp

Contraction of "web application," used to describe, generically, any random such program, e.g.: This webapp provides a combination of webmail and web search functions.

XSS

Cross-site scripting, a class of security problem common in webapps. Cross-site scripting issues generally result from a webapp not sanitizing some input properly, and allowing inputs that contain arbitrary HTML code in them to be sent back, uncleansed, to some browser.

Suppose you have a web form with a text input field. If some nasty little user were to type the following text into it:

  <script>alert("you lose")</script>

and the text in that field were to be sent back to some browser (not necessarily the one from which the bogus input originally came, obviously), and that (target) browser were stupidly configured, the user who received the resulting output would get a little treat on their screen: a javascript alert that said "you lose" (hint: think "guestbook").

Obviously, instead of a simple alert, the code could've done anything, including all sorts of malicious stuff. Browser security with regards to these issues has been notoriously bad, especially for a certain product of a certain company whose name rhymes with "my pro's snot."

The proper way to deal with these issues is to sanitize all user inputs, so that they cannot contain arbitrary HTML code. This is why the $params hashref handed to all webapp component code has had all of the raw inputs from the browser run through the WebApp::Utils::defang() routine, which attempts to take care of this nonsense. It does this by applying strict rules to what is allowed, which may bonk your form inputs in ways you don't expect, but which are guaranteed to never allow this kind of nonsense through the door. Your webapp's code will always be able to get at the raw, unhygienic inputs from the original CGI query object via the $query parameter, but you should be very careful in using them.

However, all of this cleverness notwithstanding, there is only really one correct way to do input sanitization for a given application: every input should have its range of legal values explicitly specified, and only inputs which are members of the set of legal values should be allowed. Clearly, this isn't always possible in a precise, mathematical way, but there it is. You can't have rabbit stew without a rabbit, can you?
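In Perl, an explicit whitelist can be as simple as a table that maps each input name to a test of its legal values. The %LEGAL table and validate_inputs below are a hypothetical sketch, not part of WebApp. Anything not in the table, or failing its test, is rejected rather than "cleaned up".

```perl
use strict;
use warnings;

# Hypothetical whitelist: each allowed input name maps to a test of its value.
my %LEGAL = (
    lang => sub { $_[0] =~ /\A(?:en|es|fr)\z/ },      # one of a fixed set
    page => sub { $_[0] =~ /\A[1-9][0-9]{0,3}\z/ },   # an integer from 1 to 9999
);

# Returns (\%accepted, \@errors); unknown inputs and illegal values are rejected.
sub validate_inputs {
    my ($params) = @_;
    my (%ok, @errors);
    for my $name (sort keys %$params) {
        my $val   = $params->{$name};
        my $check = $LEGAL{$name};
        if ($check && defined $val && $check->($val)) {
            $ok{$name} = $val;
        } else {
            push @errors, "illegal value for input '$name'";
        }
    }
    return (\%ok, \@errors);
}
```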

In any event, there's no way that WebApp, or any such framework can automagically zen out the mystical meaning of your form inputs and determine what is, and is not valid, so even though they are defang'ed by default, you really should always check your inputs. Seriously. Just because you don't have XSS bugs doesn't mean you don't have other kinds of bugs, and the web is a nasty, nasty little place.

Yabba Dabba Doo.

XXX

Token often used by hackers to indicate places in code that should be looked at, fixed, changed, deleted, or otherwise inspected for something fishy. In meatspace, this token also often indicates the presence of pornography, a coincidence I find both amusing, and unlikely.

AUTHOR

Sean Levy <mailto:snl@cluefactory.com>

COPYRIGHT AND LICENSE

(C) 2002-2006 by Sean Levy <mailto:snl@cluefactory.com>. all rights reserved.

This code is released under a BSD license. Please see the LICENSE file that came with the source distribution or visit http://cluefactory.com/oss/WebApp/license.html