Miscellaneous WebApp Utilities

NAME

WebApp::Utils - Miscellaneous WebApp Utilities

VERSION

  Time-stamp: <2007-06-14 13:58:46 mailto:snl@cluefactory.com>
  $Id: Utils.pm,v 1.55 2006/04/19 04:00:47 attila Exp $

SYNOPSIS

 use WebApp::Utils;
 # a collection of utility routines - see the docs, below

DESCRIPTION

A collection of utility routines useful for web apps, and used by most (all?) of the other WebApp modules and code.

OVERVIEW

This module has several kinds of code in it, none of which really belongs anywhere else.

We export the following tags, which you can use like this

  use WebApp::Utils qw/:tag/

We are a well-behaved, hygienic module, which means that by default we do not export anything into your namespace. You have to ask for it. So, go ahead, ask for it.

A good way of organizing the discussion of the routines in this module is by export tag, since I have gone to the trouble of categorizing things. Please refer to the documentation for specific subs for more information; this is just an overview to give you an idea of what is here.

  • * :db
  • Code that has to do with DBI databases. There are three main routines in this category: read_blob, write_blob and dbcroak. Read_blob and write_blob are just what you think they are, namely, routines to retrieve and store binary large objects (BLOBs) from and to a DBI database. The specifics of accomplishing this vary between underlying DBD drivers, so I have tried to abstract it a little bit. Currently, only PostgreSQL and Oracle are supported.

    Dbcroak is a wrapper around Carp::croak that is meant to be used when you are preparing or executing an SQL statement via DBI. It produces a reasonably informative message and a backtrace. Dbcroak uses dberr, which you can use in situations where croaking is not indicated.

  • * :hex
  • Routines having to do with hex-encoding and hex-decoding, namely, hex_encode and hex_decode. These are a little bit smarter than the ones in CGI. YMMV.

  • * :varfu
  • There are many places in WebApp where anonymous hashrefs are used to pass around data and perform transformations on strings. The routines exported by :varfu are load_vars, parsub and expand_range, all of which have to do with the handling of these hashrefs in one way or another.

  • * :web
  • This category overlaps with :hex - in addition to hex_encode and hex_decode, it also exports web_log, defang, defang_string and xtag, all of which are basic primitives used elsewhere in WebApp.

  • * :dates
  • What bag of tricks would be complete without date-manipulation fu? The few routines I have here are conveniences, some of which can probably be found in other CPAN modules, but I would like to limit my dependence on CPAN to modules that really are commonly available everywhere, not obscure Time/Date things. The routines exported are is_leapyear, month_days, get_monday, yyyyify, and mmify.

  • * :misc
  • Yes, a module full of miscellany is bound to have miscellaneous stuff. The routines exported by this tag are ts, exclude and copy_thing.

  • * :validate
  • Some validation functions, all of which return zero or undef if their input (always a scalar) passes muster as some specific kind of data, or the number of problems that are found with the input otherwise. The functions in this class are: validate_email, validate_int, validate_uint, validate_date, validate_intrange, validate_ip4address, validate_money and validate_float.

Of course, there is also the :all tag, which brings in all of the above. Finally, note that this module is constantly having things added to it. The documentation changes frequently.
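
The tag mechanism described above is the standard Exporter one. The following is a minimal sketch of how such tags behave, using a hypothetical package My::Utils::Sketch with two toy routines; it is not the real WebApp::Utils, and the real tag contents are the ones listed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tag-exporting module; not the real WebApp::Utils.
package My::Utils::Sketch;
use Exporter 'import';

our @EXPORT      = ();                        # hygienic: export nothing by default
our @EXPORT_OK   = qw(is_leapyear exclude);   # everything must be asked for
our %EXPORT_TAGS = (
    dates => [qw(is_leapyear)],
    misc  => [qw(exclude)],
    all   => [@EXPORT_OK],
);

sub is_leapyear {
    my ($y) = @_;
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    return $leap ? 1 : 0;
}

sub exclude {
    my ($thing, @list) = @_;
    return grep { $_ ne $thing } @list;
}

package main;

# Same effect as "use My::Utils::Sketch qw/:dates/" if the package lived in its own file.
My::Utils::Sketch->import(':dates');

print "is_leapyear imported\n"  if main->can('is_leapyear');
print "exclude not imported\n"  unless main->can('exclude');
```

Only the names behind the requested tag land in the caller's namespace, which is why asking for :all is the only way to get everything at once.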

DETAILED DOCUMENTATION

There are a couple undocumented features and/or args for some of these subs, which are that way because I reserve the right to screw with them. Only use the documented API, please, but feel free to peruse the code and complain about what you see.

  • web_log $level,$msg...
  • Log to stderr in a format designed to be consonant with whatever format apache is using for error_log; this is especially useful in CGI programs.

    Our first argument should be a number, which indicates the verbosity level of this message. We pay attention to the setting of $main::VERBOSITY, which, by convention, should be set to the maximum verbosity level desired in our logging output (the rest of WebApp conspires to get this variable set to something sensible by default). If $level is above our verbosity setting, the message is not produced. I wish there were a smarter compile-time way of making unused web_log calls boil away, but I do not know of one. Some places, I use the idiom

      $main::VERBOSITY && web_log(3,"something...")
    
    

    so that the call boils away if $main::VERBOSITY is zero, at least. Since verbosity should probably be zero when an app is in production, and non-zero when it is being debugged, I think this is sensible, but it is still ugly.

    To support the SpeedyCGI environment, where STDERR no longer ends up in Apache's error log, web_log will now notice if it is being run from a SpeedyCGI application (see CGI::SpeedyCGI for more details on SpeedyCGI). If it is, then the application may have a [SpeedyCGI] stanza in its configuration file that sets the following web_log-related parameters:

    • log_file
    • The name of the file to write our log output into; defaults to /tmp/app-name.log where app-name will be s///'ed with the name of the specific webapp (as given to the webapp Install command).

    • log_append
    • If true, we append to our log instead of overwriting it. This defaults to true. It might be useful to set this to false in conjunction with the log_per_pid option.

    • log_per_pid
    • If true, we append the process ID to the log file name we use; under SpeedyCGI, this will have the effect of creating one log file per SpeedyCGI back-end process.

    • log_autoflush
    • If true, we set our log handle to automatically flush all output. This can slow things down in a busy application that logs a lot of stuff, but can be essential for debugging. Defaults to true.
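
    The verbosity gate described for web_log can be sketched roughly as follows. This is a simplified illustration under assumptions: web_log_sketch is a hypothetical name, the bracketed prefix only imitates an Apache-style line, and none of the SpeedyCGI log-file handling is reproduced.

```perl
use strict;
use warnings;

# The maximum verbosity we want to see; the real module reads this same variable.
$main::VERBOSITY = 2;

# Hypothetical sketch: print the message to STDERR only if $level is not
# above $main::VERBOSITY. Returns 1 if the message was logged, 0 otherwise.
sub web_log_sketch {
    my ($level, @msg) = @_;
    return 0 if $level > ($main::VERBOSITY || 0);
    my $stamp = scalar localtime;              # e.g. "Thu Jun 14 13:58:46 2007"
    print STDERR "[$stamp] [webapp] ", join('', @msg), "\n";
    return 1;
}

web_log_sketch(1, "this message is shown");
web_log_sketch(3, "this message is suppressed");

# The short-circuit idiom from the text: the call is skipped entirely at zero.
$main::VERBOSITY && web_log_sketch(2, "only evaluated when verbosity is non-zero");
```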

  • pidgin $string
  • Do pidgin HTML substitutions. Turns things like

            [[style]bold]some text[[!style]]
            [[link]href=http://some.where,external]text for link[[!link]]
    
    

    into <b>some text</b> <a href="http://some.where" target="_blank">text for link</a>

    To be used anywhere you want pidgin-enabled text, e.g. for places where you want users to be able to enter formatted text, but not real HTML (which should be everywhere, since you should never allow raw HTML input from users). The full set of tags is documented under WebApp::Documentation somewhere...

  • depidgin $string
  • Remove all pidgin tags from a string.

  • entmap $char
  • Front-end to the %HTML::Entities::entity2char hash that always returns a string, if only an empty one.

  • defang_string $string[,options...]
  • Sanitize a string to various levels of paranoia/cleanliness. If the direction is not given, it is assumed to be 'in', which does the more draconian stuff. Use dir => 'in' for handling input from the external world (e.g. web browsers), and dir => 'out' for producing e.g. HTML for consumption by the external world.

    The options that we take are:

    • dir => 'in'|'out'
    • Specify the direction, in for input from the external world, out for output to a browser. The default is in. The set of sanitization steps, and their order, is different depending on this parameter.

    • NONPRINTING => 0|1
    • If defined, controls the treatment of non-printing characters. If zero, non-printing characters are simply deleted, in both directions (in the out direction, newline and carriage returns are left undisturbed). The default is 0.

    • ENTITIES => 0|1
    • If defined, controls the treatment of HTML entities, e.g. the &foo; syntax. If zero, entities are deleted, otherwise they are passed through. In the out direction, if zero we turn all ampersands into the &amp; character entity, which will effectively literalize all character entities in the string. The default is 0.

    • HTML => 0|1
    • If defined, controls the treatment of HTML start and end tags. If zero, HTML tags are deleted, otherwise they are passed through, including their attributes, but not including any non-HTML text they might contain. The default is 0.

    • BREAKS => 0|1
    • In the out direction, turns newlines into <BR> tags, to preserve whatever layout was intended. In the in direction, preserves carriage returns and line feeds in the input (they are squished out by default).

      The default is 1 for out and 0 for in.

    • ANGLES => 0|1
    • In the in direction: if zero, squish out all angle brackets completely (default); if non-zero, turn angle brackets into their corresponding HTML character entities.

      In the out direction: if zero, turn angle brackets into their corresponding HTML character entities, otherwise pass them through unharmed.

    • DANGEROUS => 0|1
    • In the in direction: if zero, squish out backticks and vertical bars completely (default). No effect for the out direction.
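
    To make the defaults concrete, here is a rough sketch of what the in direction does when every option above is left at zero. It is an assumption-laden approximation: defang_in_sketch is a hypothetical name, the regular expressions and their order are guesses rather than the module's own, and it treats anything outside printable ASCII as non-printing.

```perl
use strict;
use warnings;

# Hypothetical approximation of defang_string($s, dir => 'in') with all
# options at their zero defaults. Not the module's actual code.
sub defang_in_sketch {
    my ($s) = @_;
    $s =~ s/<[^>]*>//g;        # HTML => 0: delete start and end tags
    $s =~ s/&#?\w+;//g;        # ENTITIES => 0: delete &foo; and &#123; entities
    $s =~ s/[<>]//g;           # ANGLES => 0: squish out leftover angle brackets
    $s =~ s/[`|]//g;           # DANGEROUS => 0: squish out backticks and vertical bars
    $s =~ s/[^\x20-\x7e]//g;   # NONPRINTING => 0, BREAKS => 0: drop control chars, CR, LF
    return $s;
}

print defang_in_sketch("an <b>input</b> string"), "\n";   # prints: an input string
```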

  • defang [args...]
  • Defang is a wrapper around defang_string() that understands various kinds of arguments and just does the right thing. It understands the following arguments, as well as all of the optional arguments that defang_string takes:

    • input
    • If this is a scalar, it is defanged via defang_string. If this is an arrayref, an array is returned whose contents are the result of mapping defang_string across all elements of the input array. This behavior may be modified by the presence of the cgi argument.

    • cgi
    • If present, this should be a CGI object, or something that responds to the param method the way that a CGI object does. In this case, if the input parameter is not specified, then we return a hashref containing keys for all input parameters, whose values are the result of calling defang on their values; in other words, we defang the entire CGI input parameter list. If the input parameter is specified with a cgi parameter, it should be a scalar that names a particular param of the CGI; in this case, just that one parameter is defanged.

    • VECTORS
    • If non-zero, and we are invoked with the cgi parameter, then we always return vectors for the values of keys, even if the corresponding param only has a single value.

    Thus, defang can return a scalar, an array or a hashref, whichever makes the most sense. A couple examples might make it clearer:

      $string = defang(input => "an <b>input</b> string");
      # returns "an input string"
    
      @vec = defang(input => [ "string1", "string<script>two" ]);
      # returns an array: ("string1","stringtwo")
    
      @array = defang(cgi => $some_cgi_object, input => "names");
      # returns an array of defanged values for the "names" parameter
    
      $hashref = defang(cgi => $some_cgi_object);
      # returns a hashref with every input in the CGI object defanged;
      # some keys may have vector values and some may have scalars,
      # depending on how many values there were for each parameter
    
      $hashref = defang(cgi => $some_cgi_object, VECTORS => 1);
      # just like the previous example, except that all keys have
      # arrayrefs as values, regardless of how many values there were
    
    
  • ts [$fmt]
  • Return a timestamp string suitable for framing or using in logging output. Our optional argument can be a strftime-style format specifier; if none is given, the value of $main::TSTAMP_FMT is used, or the value of $WebApp::Utils::TSTAMP_FMT if there is none.

  • ordinal $number
  • Given an integer, returns a proper English ordinal number, e.g. 1 becomes "1st", 2 becomes "2nd", 3 becomes "3rd", ...
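
    The usual English rule is sketched below: the suffix follows the last digit, except that numbers ending in 11, 12 or 13 take "th". The name ordinal_sketch is hypothetical and this is not necessarily how the module implements it.

```perl
use strict;
use warnings;

# Hypothetical sketch of an English ordinal formatter.
sub ordinal_sketch {
    my ($n) = @_;
    my $abs    = abs($n);
    my $suffix = 'th';
    # 11, 12 and 13 (and 111, 112, ...) take "th" despite their last digit.
    unless ($abs % 100 >= 11 && $abs % 100 <= 13) {
        my %by_digit = (1 => 'st', 2 => 'nd', 3 => 'rd');
        $suffix = $by_digit{ $abs % 10 } || 'th';
    }
    return $n . $suffix;
}

print join(' ', map { ordinal_sketch($_) } 1, 2, 3, 4, 11, 22, 113), "\n";
# prints: 1st 2nd 3rd 4th 11th 22nd 113th
```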

  • plural $number,$unit
  • Return a pluralized count, e.g.

      my $bananas = 1;
      print plural($bananas,"banana"),"\n";
      ## prints: 1 banana
      ++$bananas;
      print plural($bananas,"banana"),"\n";
      ## prints: 2 bananas
    
    
  • elapsed_string $seconds
  • Turn $seconds into a nice, readable string like

       10 hours 14 minutes 20 seconds
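
    One plausible way to produce such a string is sketched below. The name elapsed_string_sketch is hypothetical, and skipping zero-valued units, the singular forms, and the "0 seconds" fallback are assumptions; the source does not say how the real routine treats those cases.

```perl
use strict;
use warnings;

# Hypothetical sketch: break a number of seconds into hours, minutes, seconds.
sub elapsed_string_sketch {
    my ($seconds) = @_;
    my @units = ([hour => 3600], [minute => 60], [second => 1]);
    my @parts;
    for my $u (@units) {
        my ($name, $size) = @$u;
        my $count = int($seconds / $size);
        $seconds -= $count * $size;
        next unless $count;                          # assumption: omit zero-valued units
        push @parts, "$count $name" . ($count == 1 ? '' : 's');
    }
    return @parts ? join(' ', @parts) : '0 seconds';
}

print elapsed_string_sketch(36860), "\n";   # prints: 10 hours 14 minutes 20 seconds
```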
    
    
  • dberr $dbh,$stmt,$sql[,$params]
  • Return an error string describing the last database error.

  • dbcroak $stmt,$what,$sql,$args...
  • Wrapper around Carp::croak for reporting DBI errors. Our first argument, $stmt, can be undef, and probably should be if we are reporting an error from DBI::prepare. Otherwise, it should be a statement handle. Our second argument is just a descriptive string, but my convention is that it is either 'PREP' or 'EXEC', depending on whether the error occurred during preparation or execution of an SQL statement. Our third argument should be the SQL itself as a string, if you have it handy. Any other arguments are collected up and passed along to Carp::croak. Typically, you use it like this:

      my $sql = q{insert into table(f1,f2,f3) values(?,?,?)};
      my $stmt = $dbh->prepare($sql)
        or dbcroak(undef,'PREP',$sql,$f1,$f2,$f3);
      $stmt->execute($f1,$f2,$f3)
        or dbcroak($stmt,'EXEC',$sql,$f1,$f2,$f3);
    
    

    These idioms occur frequently throughout WebApp.

  • hex_encode $string [, lazy => 1 ]
  • Hex-encode a string using HTML character entities. If we are told to be lazy, we leave whitespace characters (tab, space, carriage return, newline) alone. Otherwise, we encode all non-printing ASCII characters.

  • hex_decode $strings,[opts=>vals]
  • Takes the concatenation of its arguments and decodes any HTML character entities back into the characters they represent. Leaves illegal entities alone. Processes both decimal and hexadecimal entities.
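
    A minimal sketch of the encode/decode pair follows, restricted to numeric character references. The names hex_encode_sketch and hex_decode_sketch are hypothetical, the exact set of characters the real hex_encode touches is an assumption (here: everything outside printable ASCII), and named entities such as &amp; are not handled.

```perl
use strict;
use warnings;

# Hypothetical sketch: encode characters outside printable ASCII as &#xHH;.
# With lazy => 1, tab, carriage return and newline are left alone as well.
sub hex_encode_sketch {
    my ($s, %opt) = @_;
    my $class = $opt{lazy} ? qr/[^\x20-\x7e\t\r\n]/ : qr/[^\x20-\x7e]/;
    $s =~ s/($class)/sprintf('&#x%02X;', ord($1))/ge;
    return $s;
}

# Hypothetical sketch: decode decimal (&#65;) and hexadecimal (&#x41;)
# references in a single pass; anything malformed is left untouched.
sub hex_decode_sketch {
    my $s = join '', @_;
    $s =~ s/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/chr(defined($1) ? hex($1) : $2)/ge;
    return $s;
}

print hex_encode_sketch("a\tb"), "\n";            # prints: a&#x09;b
print hex_decode_sketch('&#x41;', '&#66;'), "\n"; # prints: AB
```

    Decoding in one pass matters: running the hexadecimal and decimal substitutions separately could decode the output of the first pass a second time.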

  • load_vars $vars,$hashref
  • Both of our arguments are hashrefs. The contents of the second argument are merged into the contents of the first. This primitive should probably be called something like merge_hashrefs, but it's not.
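
    The merge can be pictured with the sketch below. It assumes a shallow merge in which keys from the second hashref overwrite those in the first, and that the first hashref is modified in place; load_vars_sketch is a hypothetical name and the real routine may differ on both points.

```perl
use strict;
use warnings;

# Hypothetical sketch: shallow-merge $more into $vars, in place.
sub load_vars_sketch {
    my ($vars, $more) = @_;
    @{$vars}{ keys %$more } = values %$more;   # hash slice assignment
    return $vars;
}

my $vars = { title => 'Home', user => 'guest' };
load_vars_sketch($vars, { user => 'attila', lang => 'en' });
print join(',', map { "$_=$vars->{$_}" } sort keys %$vars), "\n";
# prints: lang=en,title=Home,user=attila
```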

  • maplook $name[,$map]
  • expand_range_str $expanded_str
  • Given a string expanded by expand_range, turn it into a sequence of integers. If called in a scalar context, returns the number of elements in the sequence. For instance, doing

        $str = join(' and ', map { number_to_english($_) }
                    expand_range_str('1,3,5,7,9'));
    
    

    maps the hypothetical function number_to_english over the first 5 odd positive integers. Note that expand_range will automatically invoke expand_range_str if called in an array context, so the above could also have been written:

        $str = join(' and ', map { number_to_english($_) }
                    (expand_range('1-9/2')));
    
    
    
  • expand_range $range_expr
  • Given a string that describes a range of integers, expand it into an explicit list of all integers in the range. Range strings can have commas and dashes in them. For instance,

      1-5,7,9,11-13
    
    

    expands into

      1,2,3,4,5,7,9,11,12,13
    
    

    and

      5-20/5,50
    
    

    expands into

      5,10,15,20,50
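
    The range syntax shown above can be expanded along these lines. This is a sketch under assumptions: expand_range_sketch is a hypothetical name, only non-negative integers are accepted, and malformed components raise an error, which may not match what the module does.

```perl
use strict;
use warnings;

# Hypothetical sketch: expand "1-5,7,9,11-13" or "5-20/5,50" into integers.
# Returns a list in list context and a comma-joined string in scalar context.
sub expand_range_sketch {
    my ($expr) = @_;
    my @out;
    for my $part (split /,/, $expr) {
        if ($part =~ m{^(\d+)-(\d+)(?:/(\d+))?\z}) {
            my ($lo, $hi, $step) = ($1, $2, $3 || 1);   # a missing or zero step means 1
            for (my $i = $lo; $i <= $hi; $i += $step) {
                push @out, $i;
            }
        }
        elsif ($part =~ /^(\d+)\z/) {
            push @out, $1;
        }
        else {
            die "bad range component: '$part'\n";
        }
    }
    return wantarray ? @out : join(',', @out);
}

print scalar(expand_range_sketch('1-5,7,9,11-13')), "\n";   # prints: 1,2,3,4,5,7,9,11,12,13
print scalar(expand_range_sketch('5-20/5,50')), "\n";       # prints: 5,10,15,20,50
```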
    
    
  • parsub2 $params,$content[,$no_defang[,$sub_map,[defang_arg1=>val,...]]
  • Suss things that look like

      %(foo)%
    
    

    into their values according to the $params hashref. If we cannot find the key foo in $params, but there is a __ key (two underscores), we treat it as a scoping link and call ourselves recursively with the value of __ for $params (so it had better be a hashref).

    Also, we understand how to invoke defang_string on all of the values we s/// into the result, and do so unless told not to by our third argument. If our third argument is zero, then we expect our 4th argument on to make sense as a hash, which we pass as arguments to defang_string along with each value in the hash that we s/// into $content.
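
    The lookup-with-scoping behaviour can be sketched as follows. The names parsub_sketch and lookup_sketch are hypothetical; the sketch walks the __ chain with a loop instead of recursing, skips the defang_string step entirely, and replaces unknown keys with an empty string, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: find $key in $params, following the "__" scoping
# link outward when the key is missing at the current level.
sub lookup_sketch {
    my ($params, $key) = @_;
    while (ref($params) eq 'HASH') {
        return $params->{$key} if exists $params->{$key};
        $params = $params->{__};
    }
    return;
}

# Hypothetical sketch: substitute every %(name)% in $content.
sub parsub_sketch {
    my ($params, $content) = @_;
    $content =~ s{%\((\w+)\)%}{
        my $v = lookup_sketch($params, $1);
        defined($v) ? $v : ''               # assumption: unknown keys vanish
    }ge;
    return $content;
}

my $outer = { site => 'example' };
my $inner = { user => 'attila', __ => $outer };
print parsub_sketch($inner, 'Hello %(user)% at %(site)%'), "\n";
# prints: Hello attila at example
```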

  • pushvar $varhash,$name,$val,...
  • Given a vars-style hashref and a variable name, ensure that the variable is arrayref-valued, and push any additional arguments onto the arrayref.

  • read_blob $dbh,$blobid
  • Read a BLOB from a DBI database, however one does that, which depends on the specific driver. Knows about Oracle and PostgreSQL.

  • write_blob $dbh,$blob
  • Write a BLOB to a database, returning whatever that particular database uses as an identifier for BLOBs (which can be the BLOB itself, e.g. in Oracle).

  • get_last_oid $dbh
  • Portable wrapper to get the last OID (or whatever) from an INSERT or UPDATE

  • parse_db_ts $ts[,$dbh]
  • Given a timestamp string from some DBI database, parse it into a Unix time_t, obeying whatever conventions the particular DBD follows.

  • gen_db_ts $t[,$dbh]
  • Generate a timestamp string suitable for handing to SQL statements in a particular database. It is a shame DBI/DBD does not abstract that a bit better, because it can be a royal pain in the ass.

  • is_leapyear $year
  • Is the given year a leap year? We call yyyyify on our argument, so either 2-digit or 4-digit years are fine.

  • yyyyify $yy
  • Given a year, return a 4-digit year. If the year is already in four-digit form, we just return it untouched.

  • mmify $month_name
  • Given the name of a month, return its number. If we are given a month number, we return it untouched.

  • month_days $month
  • Given a month number or three-letter abbreviated name, and a year, return the number of days in that month in that year, adjusting for leap years.
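
    The leap-year rule and the month-length lookup can be sketched like this. The names are hypothetical, the sketch accepts only 4-digit years and month numbers 1 through 12, and it does not reproduce the yyyyify or mmify conversions the real routines apply to their arguments.

```perl
use strict;
use warnings;

# Hypothetical sketch of the Gregorian leap-year rule (4-digit years only).
sub is_leapyear_sketch {
    my ($year) = @_;
    my $leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    return $leap ? 1 : 0;
}

# Hypothetical sketch: days in month $month (1..12) of $year.
sub month_days_sketch {
    my ($month, $year) = @_;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    return 29 if $month == 2 && is_leapyear_sketch($year);
    return $days[$month - 1];
}

print month_days_sketch(2, 2024), "\n";   # prints: 29
print month_days_sketch(2, 1900), "\n";   # prints: 28
```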

  • get_monday $t
  • Given a time_t, return the Monday of that week. Should really generalize this for any week day some time, but in practice I have only ever needed Monday.

  • exclude $thing,...
  • Given a thing and an array, return a new array excluding the thing. We use eq for equality testing. This is not LISP. Get over it.
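
    In spirit this is a one-line grep. The sketch below uses the hypothetical name exclude_sketch and assumes the thing comes first and the list follows, as the signature above suggests.

```perl
use strict;
use warnings;

# Hypothetical sketch: return the list without any element eq to $thing.
sub exclude_sketch {
    my ($thing, @list) = @_;
    return grep { $_ ne $thing } @list;
}

my @left = exclude_sketch('b', qw(a b c b d));
print "@left\n";   # prints: a c d
```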

  • url2file $url[,$ext[,$dir]]
  • Given a URL (either as a string, or as a URI object), return an absolute pathname to a file in the local filesystem that corresponds to it. If given a second argument, it is used as the extension for the file. All of our exploration happens relative to our third argument, which should be the absolute path to a directory; if it is not supplied, CODE_DIR is used instead.

  • register_filter $ext,$sub[,args...]
  • Associate a piece of code (and possibly some data, in the form of arguments), with a file extension. Used by load_maybe when refreshing file cache contents, to transform files at load-time.

  • load_maybe $filename[,ext=>$ext,subdir=>$subdir,force=>1|0,nocache=>1|0]
  • Load a file, possibly paying attention to a global cache of mtimes and skipping a load if the file has not been modified. If force is non-zero, the file is loaded regardless; if nocache is non-zero, the cache is updated.

  • open_cached $filename
  • Return an object that implements an IO::Handle interface for the given filename. Uses load_maybe to get the file into core, and returns an IO::String object based on its contents, under normal circumstances.

  • dump_file_cache [$fh]
  • Dump human-readable file cache statistics to $fh, which defaults to STDERR

  • url_params $paramhash
  • pchomp $string
  • Psychochomp: really remove ALL leading and trailing whitespace.
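
    Unlike chomp, which only removes a trailing record separator, this amounts to a full trim. A sketch, with the hypothetical name pchomp_sketch:

```perl
use strict;
use warnings;

# Hypothetical sketch: strip all leading and trailing whitespace.
sub pchomp_sketch {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

print '[', pchomp_sketch("  \t hello world \r\n\n"), "]\n";   # prints: [hello world]
```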

  • xstring $foo
  • Try reasonably hard to turn $foo into a printable string that will make sense to a human.

  • load_test_data $class|$file
  • Used from tests in t/*.t - load a self-contained test data set of some sort.

  • query_app_name $cgi_query
  • Given a CGI query object, return a reasonable default app name. If we cannot figure it out, we return 'randomApp'.

  • set_header $hdrs_hash,$hdr_name[,$hdr_val[,$reset]]
  • Set a header in a hashref destined for a browser (HTTP headers).

  • cleanup_url $url[,$addon]
  • read_file $fh[,$tout]
  • Read everything from an open handle, but constrain ourselves to a certain time limit. XXX mod_perl issue with ALRM? I think so.

  • cidr_mask_addr $addr,$mask_bits[,$swab]
  • Given an IP address and the number of bits to mask, returns the address masked that way. If the address is a string, then we do whatever we need to do to try to make it into an IP address by invoking the inet_aton function from the Socket package. This may end up invoking gethostbyname, which can be expensive, so make sure you know what you are calling us with if you care about such things.

  • cidr_matches $addr1,$addr2,$mask_bits
  • Returns true if the two IPv4 addresses in our first two arguments match each other when masked off by as many bits as specified in $mask_bits. If $mask_bits is 32, then this is the same as comparing the two IP addresses for equality.
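
    The masking and comparison can be sketched with Socket's inet_aton and unpack. The names cidr_mask_sketch and cidr_matches_sketch are hypothetical, the masked address is returned as a plain integer (the real return format is not documented here), and the optional $swab argument is ignored.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical sketch: keep the top $bits bits of an IPv4 address.
sub cidr_mask_sketch {
    my ($addr, $bits) = @_;
    my $packed = inet_aton($addr);     # may do a hostname lookup for non-numeric input
    die "cannot resolve '$addr'\n" unless defined $packed;
    my $mask = $bits ? (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF : 0;
    return unpack('N', $packed) & $mask;
}

# Hypothetical sketch: do the two addresses agree in their top $bits bits?
sub cidr_matches_sketch {
    my ($addr1, $addr2, $bits) = @_;
    return cidr_mask_sketch($addr1, $bits) == cidr_mask_sketch($addr2, $bits) ? 1 : 0;
}

print cidr_matches_sketch('192.168.1.10', '192.168.1.200', 24), "\n";   # prints: 1
print cidr_matches_sketch('192.168.1.10', '192.168.1.200', 32), "\n";   # prints: 0
```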

  • INPUT VALIDATION ROUTINES
  • All of the validate_XXX routines return the number of problems with their input string, or undef if there were no problems.

  • validate_email $email
  • Return undef if the email address passed looks good, otherwise return a string with a description of the problem(s) it has.

  • validate_password $string
  • validate_emails $string
  • validate_int $string
  • Validate our argument as having a valid integer (possibly signed)

  • validate_money $string
  • validate_float $string
  • validate_uint $string
  • Validate our argument as having a valid unsigned integer

  • validate_date $string
  • Validate our argument as having a valid date

  • validate_intrange $string
  • Make sure that $string is a valid integer range (see expand_range())

  • validate_ip4address $string
  • Make sure that $string is a valid IPv4 address.
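
    The return convention shared by most of these routines (undef when the input is fine, a problem count otherwise) can be sketched as below. The names are hypothetical and the accepted syntax is an assumption: an optional sign followed by digits, with no surrounding whitespace.

```perl
use strict;
use warnings;

# Hypothetical sketch: undef if $s is a (possibly signed) integer,
# otherwise the number of problems found.
sub validate_int_sketch {
    my ($s) = @_;
    my $problems = 0;
    $problems++ unless defined($s) && $s =~ /^[+-]?\d+\z/;
    return $problems || undef;
}

# Hypothetical sketch: same convention, but no sign allowed.
sub validate_uint_sketch {
    my ($s) = @_;
    my $problems = 0;
    $problems++ unless defined($s) && $s =~ /^\d+\z/;
    return $problems || undef;
}

for my $input ('-42', '4x2') {
    my $bad = validate_int_sketch($input);
    print "$input: ", (defined $bad ? "$bad problem(s)" : "ok"), "\n";
}
```

    Because success is the false value, callers read naturally as "if (my $problems = validate_int($x)) { complain }".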

TODO/BUGS

  • * The arguments to gen_form are horrific and odd
  • * The 'p' nonsense should be explained better, and cleaned up
  • * Come up with some consistent terminology for the $vars hashref
  • * Make validate_email() smarter or use some other thing

Other than that, a few things could be cleaned up. Would it kill you to pick up around here? Well, would it?

AUTHOR

  attila <mailto:attila@stalphonsos.com>

COPYRIGHT AND LICENSE