Thursday, June 28, 2012

WebSpec - security

I recently installed a new version of WebSpec which fixes security holes and adds some new features.

Since the security fixes might be helpful for others supporting websites I will make a few comments about them. WebSpec operates mainly using Perl CGI scripts which accept information from the user entered in various boxes, process it to create an XSPEC script, run the script, and process the output to generate HTML for the next page. Any instance where information is accepted from the user then either passed to a program or echo'ed back in HTML is a potential serious security risk. The safest course is to check that all input contains only the characters which are expected.

Perl has a paranoid TAINT mode invoked using -T. In this mode Perl will generate an error if any user input is passed to a potentially dangerous command (eg system) without having first been run through a regex which limits the acceptable characters. To illustrate how this works lets look at some WebSpec code. A typical WebSpec Perl script reads from STDIN or the environment variable QUERY_STRING looking for units of form "name = value". Thus


if ($ENV{'REQUEST_METHOD'} eq "POST") {
read (STDIN, $_, $ENV{'CONTENT_LENGTH'});
} elsif ($ENV{'REQUEST_METHOD'} eq "GET") {
$_ = $ENV{'QUERY_STRING'};
}


s/^\s*//g; # remove leading whitespace
foreach $_ (split (/&/)) { # & delimits name/value pairs
s/\+/ /g; # .. which have '=' inside.
s/\%(..)/sprintf("%c",hex($1)-'0')/geo;
($name, $value) = split (/=/, $_, 2);
$entries{$name} .= $value;
}


This somewhat obscure piece of code puts all the information into an associate array called entries.

Now consider the variable "backsys" which is the systematic error on the background normalization. This must be a positive decimal number so we process it as follows:

($backsys = $entries{"backsys"}) =~ /([0-9\.]+)/;
$backsys = $1;

The term on the right hand side of the first line is a regex expression which says extract a string containing only the characters 0,1,2,..9 and "." and place the result in $1. The square brackets contain the list of allowed characters and the "+" indicates that multiple instances of each character are allowed.

A slightly more complicated case is that for a filename:

($rsp = $entries{"rsp"}) =~ /(\w{1}[\w\_\.]+)/;
$rsp = $1;

Here "\w" means any alphanumeric character so I allow a filename to consist of alphanumeric characters as well as "_" and ".". I also require the first character in the name to be alphanumeric. The importance of these restrictions is to eliminate the possibility of the user giving a filename starting with multiple "../" which means they could get anywhere in the directory tree.

Every member of the entries array is treated in a similar fashion. This is relatively little inconvenience and is worth doing even if you don't think any particular entry is a security threat. It may save you down the line when the script is changed to use the entry in a different fashion.

I'm not aware that other programming languages have anything exactly like the Perl TAINT mode however they can have features which help restrict what might be dangerous code eg Python has RExec.