The Rationale Behind WebApp

NAME

WebApp::Documentation::Rationale - The Rationale Behind WebApp

VERSION

  Time-stamp: <2006-07-10 10:51:00 mailto:attila@stalphonsos.com>
  $Id: Rationale.pm,v 1.7 2006/07/13 19:13:40 attila Exp $

SYNOPSIS

  $ perldoc WebApp::Documentation::Rationale

DESCRIPTION

The rationale behind WebApp, in the form of several brief arguments for our point of view.

OVERVIEW AND GUIDING PRINCIPLES

WebApp is a simple, lightweight Perl web application framework based on a few simple ideas, which are outlined in the other sections of this document.

The framework started out as a personal project, after I became frustrated with the complexity and fragility of (an admittedly old version of) HTML::Mason. Since then, I have used WebApp to build several real-world systems. It covers all of what I need from such a framework, is easy to modify and understand, and may even be useful to someone who is not me.

WebApp is also easy to install and administer, perhaps a little unlike other frameworks. You need not make sure that Apache is compiled a certain way, or that a specific version of Perl is installed, or invoke dark powers and sacrifice small, innocent animals to unnamed, malevolent forces.

I mean, unless that sort of thing turns you on, in which case, hey, I'm broad-minded. Just keep it off of my lawn.

I wrote WebApp because I had come to several conclusions in the course of using existing systems for building complex web applications, including HTML::Mason, PHP, Embperl and so on. These conclusions became my guiding principles. I think they are at least interesting enough to warrant a read by anyone who writes web applications. Even if you disagree, it's good to read things that you disagree with once in a while - it angries up the blood.

SEPARATE PRESENTATION AND IMPLEMENTATION

It is a bad idea to mix presentation and implementation. Separating the design and logic behind the code from the way it presents results is simply good design. It encourages modularity, it forces decisions to be made based on criteria appropriate to the task at hand, and it makes it easier for different kinds of people to focus on their various tasks. For instance, a web designer and a hacker do not do the same job (even though the same person might do both).

HTML and code are two great tastes that do not go together. Unfortunately, most web application frameworks and environments encourage the mixing of presentation (HTML) and code (Perl in our case, but whatever) to the point of utter confusion. It is common to see strings of HTML being pieced together by bits of PHP, Perl or C code on the fly. This miserable practice produces a morass that is hard to read, debug, modify, and audit.

WebApp strongly encourages the separation of code and HTML, based on a model I call "load and show". Application code is invoked by the WebApp framework in a context where many things have already been taken care of; its job is simply to load data structures into a scratch-pad by giving them names. These data structures can be arbitrarily complex, but they should not be strings of HTML (although, of course, I can't stop you from doing so if you want to).

The template language that WebApp provides has one and only one purpose: to specify the way in which these data structures should be turned into HTML. Except for the extremely limited conditional syntax, no control-flow or other decision-making is done in these HTML templates. All of the thinking is done in the code. All of the presentation is done in the HTML. The template engine (which I call the "suss engine" in other documents) turns data structures into HTML by means of the declarative specifications presented in the templates. This part of the system is completely declarative: there is no programming language embedded in the HTML.

I generally boil this down into the two-part mantra:

  • No code in HTML
  • No HTML in code

EXAMPLE

A WebApp-based program that presents data from an SQL query in a table might look like this (sans error checking and other bits not essential to the example):

  ## results.pl

  sub results {
    my($xaction) = @_; # WebApp::Xaction object

    ## We expect an input called "name" and search a table
    ## for rows whose name column matches it

    $xaction->set_vars(
      ROWS =>
        $xaction->dbh->selectall_arrayref(
          q{select id,name,description from some_table where name=?},
          { Slice => {} },
          $xaction->get_param('name'),
        ),
      NAME => $xaction->get_param('name'),
    );
    return $xaction->finish();
  }

Ignoring the details of precisely how this code is invoked, it should be clear enough that a lot of heavy lifting has been done for you. The application code itself only needs to issue the appropriate query to the database, and then make the results available to the suss engine.

The second half of the example is the HTML side of the house:

  <!-- This might live in a file called _results.html in the docroot --> 
  <html>
   <head><title>search results for: %(NAME)%</title></head>
   <body>
    <table>
     <tr>
      <th>id</th>
      <th>name</th>
      <th>description</th>
     </tr>
  %{ROWS,<tr><td> %(id)% </td><td> %(name)% </td><td> %(description)% </td></tr>}%
    </table>
   </body>
  </html>

The %{}% syntax is called a "mapper", and is one of the main s/// (pronounced "suss") constructs in the WebApp template language. It instructs the suss engine to treat the variable named ROWS as some kind of reference, and to interpret the stuff after the comma appropriately depending on the context. In our case, ROWS is going to be an array reference, since it is the result of calling the DBI selectall_arrayref method. Each element in the array will be a hashref, with attributes named id, name and description as per the SQL select statement we used. These attributes will be interpolated into each successive iteration through the array reference, producing one row in the HTML table per row in the database table.
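To make the mapper idea concrete, here is a toy expansion routine in Perl. It is a minimal sketch and not the suss engine's code: the function names (expand_template, expand_mapper, escape_html) are hypothetical, the HTML-escaping of interpolated values is an assumption, and only the %(NAME)% and %{NAME,body}% forms over arrays of hashrefs are handled.

```perl
use strict;
use warnings;

# Toy illustration only: NOT the real suss engine. All names are hypothetical.

# Assumption: interpolated values are HTML-escaped before output.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Repeat the mapper body once per element of the named array reference,
# resolving names inside the body against each element (a hashref).
sub expand_mapper {
    my ($vars, $name, $body) = @_;
    my $rows = $vars->{$name};
    return '' unless ref($rows) eq 'ARRAY';
    return join '', map { expand_template($body, $_) } @$rows;
}

# Single pass over the template: %{NAME,body}% is a mapper,
# %(NAME)% is a plain scalar interpolation.
sub expand_template {
    my ($template, $vars) = @_;
    $template =~ s/%\{(\w+),(.*?)\}%|%\((\w+)\)%/
        defined $1 ? expand_mapper($vars, $1, $2) : escape_html($vars->{$3})
    /gse;
    return $template;
}

my $html = expand_template(
    '<p>%(NAME)%</p><ul>%{ROWS,<li>%(id)%: %(name)%</li>}%</ul>',
    {
        NAME => 'widget',
        ROWS => [
            { id => 1, name => 'widget' },
            { id => 2, name => 'widget <deluxe>' },
        ],
    },
);
print "$html\n";
```

The template side stays purely declarative: the only "loop" is implied by the shape of the data, and any markup that sneaks into the data is escaped rather than interpreted.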

The code in this example contains no HTML: it processes inputs and produces outputs, which it provides to the suss engine by means of the set_vars method on the web transaction object.

The HTML in this example contains no code: it consists mainly of boilerplate, with a declarative description of how to turn the outputs the Perl code produced into HTML.

No code in HTML. No HTML in code. Simple.

TMTOWTDI CONSIDERED HARMFUL

Speaking of mantras, there is a common Perl mantra that I have come to take issue with: There's More Than One Way To Do It (TMTOWTDI).

This philosophy can certainly be appealing, and quite often for good reason. It speaks to the generality and elegance of Perl as a programming language, and to the richness which its suite of constructs affords even the mildly clever programmer.

However, there is a time for elegance and generality, and there is a time for pragmatism. Although it might be possible to do the same thing in many ways, it might not always be desirable. Quite often, especially when considering mundane tasks, there is really just one good way to do something and a lot of not-so-good ways. Many web applications are nothing but the same thing over and over again, both at the macro and micro levels (as indeed are most kinds of applications). I speak not only of web forms, style-sheets and tables of data pulled from SQL queries, but also common constructs like session objects, user identities with persistent preferences, and database-backed classes that represent complex objects.

WebApp is an attempt at producing a framework which solves many common problems cleanly, not simply a toolkit to let you solve them yourself. By this I mean many specific things, but one thing in general: if there is some facet of web applications that is so common as to be considered nearly universal, then WebApp provides some built-in facility or function to support it. This built-in facility may not be suitable for 100% of the cases you would encounter, but it will be suitable for some very large percentage of the cases.

For instance, any WebApp-based application will have the following kinds of things in it by default:

  • a DBI-style database handle (maintained persistently across connections if run under SpeedyCGI)
  • sessions (the WebApp::Session class), backed by the database, consisting of arbitrary attribute=value pairs, and maintained by means of either HTTP cookies or a variable in the URLs presented to the application
  • a notion of user identity (the WebApp::User class), which Session objects rely on to identify ownership of sessions, and which can be used to store some subset of persistent session data across logins
  • authenticated and unauthenticated modes of use, with a flag present in the session object that can be used to indicate whether or not a transaction came from an authenticated user
  • email-based registration functionality for new users of the application
  • database-backed application objects, the code for which can be automatically generated from nothing more than the DDL (sans any additional behaviour, for which there is a standard mechanism the programmer can use)
  • a configuration file, which can be used to control a great many of the above facilities, and which can also be used by application code to specify arbitrary application-level configuration settings
  • debug log maintenance, including primitives to allow for logging at various levels of verbosity, etc.

These facilities are not intended to be as elegant and completely general-purpose as is the case with the base facilities provided by other frameworks. In fact, much of what comes with the template application in WebApp is fairly far up the abstraction ladder, so to speak. The point is not to be all things to all people: there is generally one way to do things, but it is often a good way, if not the best way.

SECURITY BY DEFAULT IS BETTER THAN THE ALTERNATIVE

If you don't want your databases to be filled with trash and your web site to have cross-site scripting vulnerabilities and other such issues, you have to be careful. This means checking every input you get as rigorously as you can, being careful to encode things properly before spitting them at a browser or processing them, and generally using good common sense.

Sometimes, however, we need dirty little jobs done, and they are often done by dirty little people who don't think so clearly, or who just aren't very careful, for whatever reason. In such cases, it's a good idea to have a safety net. WebApp provides such a thing in the form of Input Sanitization By Default. When your code is invoked, it is handed all of the inputs from the outside world pre-sanitized, according to some fairly strict rules. This works just how you'd want it to for the vast majority of real-world cases. For the few cases where this does not work, there are standard primitives for getting at the unsanitized inputs, and for spewing out unsanitized bits of text.

By forcing a different idiom to be used to get at unsanitized inputs, we at least make all of the sites in your program where such things occur visible; a simple invocation of grep will find all the places where the reget_param method is used instead of get_param. At the very least this makes it easier to hunt down the place that allowed some nasty bit of e.g. JavaScript through the front door.
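The two-accessor idiom can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not WebApp's implementation: the Sketch::Params class is hypothetical, and HTML-entity encoding stands in for WebApp's actual sanitization rules, which this document does not spell out. Only the method names get_param and reget_param come from the text above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the transaction object. The class name and the
# sanitization rule (HTML-entity encoding) are assumptions for illustration.
package Sketch::Params;

sub new {
    my ($class, %raw) = @_;
    return bless { raw => {%raw} }, $class;
}

# Default accessor: always hands back a sanitized copy of the input.
sub get_param {
    my ($self, $name) = @_;
    my $value = $self->{raw}{$name};
    return unless defined $value;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&#39;');
    $value =~ s/([&<>"'])/$entity{$1}/g;
    return $value;
}

# Deliberately distinct name for raw access, so that grep finds every use.
sub reget_param {
    my ($self, $name) = @_;
    return $self->{raw}{$name};
}

package main;

my $params = Sketch::Params->new(comment => '<script>alert(1)</script>');
print $params->get_param('comment'),   "\n";   # sanitized
print $params->reget_param('comment'), "\n";   # raw, and easy to audit
```

The design choice is that the safe path is the short, obvious one, while the unsafe path has a name that stands out in a code review or a search.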

However, it should be noted that there is really one and only one way to correctly sanitize inputs: check every input against the most restrictive possible set of allowable inputs and throw away or complain about any inputs that do not pass the test.

In order to do this properly, you need some notion of types. The Web, of course, is essentially typeless, so that means that any notion of type in this sense must be supplied by the application or application framework on the receiving end. The initial idea behind WebApp was to do just this by encoding this information in a declarative fashion and providing generic type-validation machinery in WebApp. Although a laudable goal, this proved too cumbersome to implement and use, so I backed off and took a more pragmatic approach. Given my experience using WebApp so far over the last 3 years, it turns out that this was probably a good idea: the rest of the system is solid, and work on such a new mechanism could now take place without worrying about a lot of annoying little details that I have since worked out. A future release will provide some form of type-based input sanitization.

I make this point to give myself a convenient place to make the following statement: your application code should always validate its inputs using the strictest possible test, regardless of what WebApp (or any other framework) may or may not do for you. Security by default is not meant to free the programmer from their responsibilities in this regard, only to mitigate the risks involved when they fail to live up to them.
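As a sketch of what "the strictest possible test" can look like in application code, the following Perl checks each input against a whole-string pattern and rejects anything unexpected. The field names, patterns and the validate_inputs function are all hypothetical; none of this is part of WebApp.

```perl
use strict;
use warnings;

# Hypothetical per-field rules: each input must match its pattern in full.
my %RULES = (
    id    => qr/\A[1-9][0-9]{0,8}\z/,
    name  => qr/\A[A-Za-z][A-Za-z0-9_ -]{0,63}\z/,
    order => qr/\A(?:asc|desc)\z/,
);

# Returns (\%clean, \@errors). Unknown fields are rejected, not ignored.
sub validate_inputs {
    my (%input) = @_;
    my (%clean, @errors);
    for my $field (sort keys %input) {
        my $rule  = $RULES{$field};
        my $value = $input{$field};
        if (!$rule) {
            push @errors, "unexpected field: $field";
        }
        elsif (!defined $value || $value !~ $rule) {
            push @errors, "invalid value for: $field";
        }
        else {
            $clean{$field} = $value;
        }
    }
    return (\%clean, \@errors);
}

my ($clean, $errors) = validate_inputs(
    id    => '42',
    name  => 'x<script>',
    debug => 1,
);
print "ok: $_\n"    for sort keys %$clean;
print "error: $_\n" for @$errors;
```

Only id survives here: name fails its pattern and debug is not a known field. The application then works exclusively from the clean hash.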

OPERATIONAL ISSUES CAN KILL YOU

Operational issues also cause avoidable headaches. For instance, a new version of HTML::Mason comes out. Can you use it in your current Apache/mod_perl environment? Need to re-build without Expat? Any CPAN deps or version skew that will screw you? Or how about that PHP upgrade: did it hang your ancient SquirrelMail setup out to dry, forcing an all-nighter on your operations staff?

This kind of stuff is not sexy. It's not intellectually interesting. It's not even mildly exciting, unless you find the idea of your users getting pissed off because their "mail is broken" to be heart-pumping good fun.

However, it is real. This is the crap that you have to deal with after all the "hard" problems are supposedly "solved." Personally, this stuff drives me nuts, but it's too important to ignore. These are the reasons why systems don't get upgraded, why so many corporate intranets have crufty old machines on them that only run one ancient app. Bad news, all around.

I cannot solve the above problems for you easily (though I'll consider it for a fee), but I can at least not make the problem worse in my own little world.

First off, every WebApp-based web application keeps all of its code in a directory outside the docroot (or should), which is maintained by the webapp Install command (via the makefiles that come with the template application). Not only does the application-specific code get installed there, but a copy of WebApp itself is installed alongside it when you install your own code. This is the copy of WebApp that is used by your application, not the one installed in the standard place, i.e. /usr/local.

This means that new WebApp releases can come and go. You can install them to your heart's content without torching any existing apps. You can patch a specific bug in an old, installed application without bothering any other applications, even if the bug was in an old version of WebApp itself (naturally, since the current version of WebApp never has any bugs).
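One way a per-application copy of a library can take precedence in Perl is the standard lib pragma. The sketch below illustrates that general mechanism only; it is an assumption, not a description of what the WebApp makefiles actually generate, and the path is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical install location outside the docroot; the real path is
# whatever your makefiles use.
use lib '/var/www/apps/myapp/lib';

# @INC is searched in order, so a module found in the application's private
# directory wins over any system-wide copy under e.g. /usr/local.
print "module search order:\n";
print "  $_\n" for @INC;
```

Because each application names its own directory, upgrading the system-wide copy leaves already-installed applications untouched.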

Although this does not get rid of dependency bugs, it at least means that you have some kind of leverage when dealing with WebApp-based web applications. A server that hosts a dozen WebApp-based apps can be upgraded an app at a time; old, crufty apps that have issues can be left alone, but you can still take advantage of new WebApp features with new apps, and we try hard to give you clear migration paths and make these issues explicitly known with every release.

Other details about the way WebApp works are also designed to ease operational concerns. There are built-in features, like timings for various phases of a page load, that can be turned on with a single line in your app's config file. Verbosity levels are explicitly supported by the logging code, which is often a life-saver when trying to find a bug quickly in a system that's already gone into production. In general, WebApp tries to be pragmatic, not only from the programmer's point of view, but also from that of the poor slob who's stuck making this stuff go.
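The general shape of level-based logging and switchable phase timings can be sketched as follows. Everything here is hypothetical (the %CONFIG keys, log_at and timed_phase); WebApp's own logging primitives and config syntax are documented elsewhere and may look quite different.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical settings, standing in for values read from the app's config file.
my %CONFIG = (log_level => 2, timings => 1);

# Emit a message only if its level is within the configured verbosity.
# Returns 1 if the message was logged, 0 if it was suppressed.
sub log_at {
    my ($level, $message) = @_;
    return 0 if $level > $CONFIG{log_level};
    print STDERR "[$level] $message\n";
    return 1;
}

# Run one phase of a request and, when timings are enabled, log its duration.
sub timed_phase {
    my ($name, $code) = @_;
    my $start  = Time::HiRes::time();
    my @result = $code->();
    if ($CONFIG{timings}) {
        log_at(1, sprintf('%s took %.3fs', $name, Time::HiRes::time() - $start));
    }
    return @result;
}

my ($rows) = timed_phase(load => sub { [ 1 .. 3 ] });
log_at(3, 'too verbose for level 2, so this is suppressed');
log_at(2, 'loaded ' . scalar(@$rows) . ' rows');
```

Raising log_level in one place makes a production system talk more without touching application code, which is the operational point being made above.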

CONCLUSION

There is nothing really Earth-shaking about anything I've said in this document. Nonetheless, it is surprising how often one finds web applications exposed to the Internet that exhibit common problems and undoubtedly cause headaches for their maintainers, all of which could be avoided by taking these things to heart.

Some newer frameworks and web application methodologies do take some of these simple principles into account. To a certain extent, WebApp suffers from the reluctance of its author to Just Put It Out There. I'm probably too late to have much of an effect on the world, but perhaps my implementation of these ideas will be useful to others anyway.

AUTHOR

Sean Levy <mailto:snl@cluefactory.com>

COPYRIGHT AND LICENSE

(C) 2002-2006 by Sean Levy <mailto:snl@cluefactory.com>. All rights reserved.

This code is released under a BSD license. Please see the LICENSE file that came with the source distribution or visit http://cluefactory.com/oss/WebApp/license.html