MRW

Parsing of QUERY_STRING in Bash CGI Scripts

Security Issue

Always quote all accesses to variables. Variables containing values such as *, ?, ../* could be expanded to file names relative to the script's position on the server. Always test your scripts with special values, such as *, ?, ../*, \, ", '.
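The effect is easy to reproduce outside of a web server. The following is a minimal sketch, assuming a throwaway directory created with mktemp and two hypothetical files, a.txt and b.txt; it compares an unquoted and a quoted expansion of a variable that holds *:

```bash
#!/bin/bash
# Scratch directory with two hypothetical files, standing in for the
# directory the CGI script runs in.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"

value='*'   # imagine this arrived in a form field

# Unquoted: the shell replaces * with the matching file names.
unquoted=$(cd "$dir" && echo $value)
# Quoted: the value is passed through unchanged.
quoted=$(cd "$dir" && echo "$value")

echo "unquoted: $unquoted"   # unquoted: a.txt b.txt
echo "quoted:   $quoted"     # quoted:   *
rm -rf "$dir"
```

Only the quoted form keeps the submitted text intact, which is why every access in a CGI script should look like "${var}".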

Introduction: Basics

You can use Bash for CGI scripts. POST values can be read from /dev/stdin and GET values from ${QUERY_STRING}. The difficulty is parsing the POST and GET variables. A simple Bash CGI script might look like this:

#! /bin/bash
 
/bin/cat <<EOF
Content-type: text/html
 
 
<html>
<head><title>Test</title></head>
<body>
<h1>Test for Bash CGI-Scripting</h1>
 
<h2>POST values from Standard-In</h2>
 
<pre>
$(</dev/stdin)
</pre>
 
<h2>Arguments</h2>
 
<p>Number: $#</p>
 
<pre>$*</pre>
 
<h2>Environment</h2>
 
<pre>$(env)</pre>
 
<h2>A Form that POSTs values</h2>
 
<p>
<form action="" method="post">
<input type="text" name="hallo" value="${hallo//\"/&quot;}" />
<button type="submit" value="Klick!">Guguseli</button>
</form>
</p>
 
</body>
</html>
EOF
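A script like this can be tried without a web server by supplying by hand what the server would provide. The sketch below assumes a tiny stand-in script written to a temporary file (it is not the script above) and shows only the mechanism: GET data travels in the QUERY_STRING environment variable, POST data arrives on standard input.

```bash
#!/bin/bash
# Write a tiny stand-in CGI script to a temporary file (hypothetical example).
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
printf 'Content-type: text/plain\n\n'
printf 'GET: %s\n' "$QUERY_STRING"
printf 'POST: %s\n' "$(cat)"
EOF

# Simulate the web server: GET data in the environment, POST data on stdin.
output=$(printf 'hallo=from+post' |
  REQUEST_METHOD=POST QUERY_STRING='hallo=from+get' bash "$script")
rm -f "$script"

printf '%s\n' "$output"
```

The same invocation pattern should work for any of the scripts on this page once they are saved to a file.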

How to Parse Form Queries

Now the big question is how to evaluate the ${QUERY_STRING} variable or the $(</dev/stdin) POST data in the simplest way. I found a very nice approach at http://oinkzwurgl.org/bash_cgi. Philippe's main idea was to parse the query variables and set corresponding shell variables, e.g. if the query string is test=hello+world, then we want a shell variable ${test} with the value hello world.

There's a big danger: if your script sets e.g. SHELL=/bin/bash and later runs ${SHELL} file.sh, an attacker could send a query containing SHELL=/bin/rm, which would then remove your script file.sh. That's why I don't give the variables the same name as the key but prepend CGI_, so that the key test (example above) results not in a shell variable ${test} but in ${CGI_test}, a namespace used only for CGI form query GET and POST variables.

Important: Don't forget to quote your variables, e.g. always call htmlEnc "${CGI_myvar}" with the quotes, never htmlEnc ${CGI_myvar}.

From this starting point, I redesigned Philippe's ideas, which results in the following small set of functions:

# Decodes an URL-string
# an URL encoding has "+" instead of spaces
# and no special characters but "%HEX"
function urlDec() {
  local value=${*//+/%20}                   # replace +-spaces by %20 (hex)
  for part in ${value//%/ \\x}; do          # split at % prepend \x for printf
    printf "%b%s" "${part:0:4}" "${part:4}" # output decoded char
  done
}
 
# For all given query strings
# parse them and set shell variables
function setQueryVars() {
  local vars=${*//\*/%2A}                      # escape * as %2A
  for var in ${vars//&/ }; do                  # split at &
    local value=$(urlDec "${var#*=}")          # decode value after =
    value=${value//\\/\\\\}                    # change \ to \\ for later
    eval "CGI_${var%=*}=\"${value//\"/\\\"}\"" # evaluate assignment
  done
}
 
# Execute the evaluation
# set all variables for both, POST and GET data
setQueryVars "$QUERY_STRING" "$(</dev/stdin)"
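One caveat with the eval-based assignment: the value is placed inside double quotes, so a decoded value such as $(id) would be executed by eval, and the key itself is not checked. The following is a sketch of a more defensive variant, not the original author's code; the names urlDecSafe and setQueryVarsSafe are hypothetical, and it assumes a bash with printf -v (3.1 or later). It skips keys that are not plain identifiers and assigns with printf -v instead of eval.

```bash
#!/bin/bash
# Hypothetical hardened variant; urlDecSafe and setQueryVarsSafe are
# illustrative names, not part of the function set above.

# Decode one URL-encoded value.
urlDecSafe() {
  local data=${1//+/ }            # "+" encodes a space
  data=${data//\\/\\\\}           # keep literal backslashes away from %b
  printf '%b' "${data//%/\\x}"    # %41 becomes \x41, which %b decodes
}

# Set CGI_<key> for every key=value pair in the given query strings.
setQueryVarsSafe() {
  local query pair key pairs
  local keyPattern='^[A-Za-z0-9_]+$'
  for query in "$@"; do
    IFS='&' read -r -a pairs <<< "$query"        # split at & without globbing
    for pair in "${pairs[@]}"; do
      [[ $pair == *=* ]] || continue             # ignore parts without "="
      key=${pair%%=*}
      [[ $key =~ $keyPattern ]] || continue      # ignore unsafe key names
      # printf -v assigns the decoded text literally; nothing is evaluated
      printf -v "CGI_${key}" '%s' "$(urlDecSafe "${pair#*=}")"
    done
  done
}

setQueryVarsSafe 'test=hello+world&evil=%24%28id%29&bad;key=1'
echo "CGI_test=$CGI_test"   # CGI_test=hello world
echo "CGI_evil=$CGI_evil"   # CGI_evil=$(id)
```

In a CGI script the call would be setQueryVarsSafe "$QUERY_STRING" "$(</dev/stdin)". The value $(id) ends up as plain text in CGI_evil, and the key bad;key is dropped.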

Plus the following optional auxiliary converters may be useful:

# Encodes an URL-string
# does the most simple encoding: everything is encoded
function urlEnc() {
  # printf with a quoted "$*" keeps spaces and wildcards intact
  for char in $(printf '%s' "$*" | xxd -c 1 -p -u); do
    printf "%%%s" "$char"
  done
}
 
# HTML Encoding of a text
# quotes "&" to "&amp;", "<" to "&lt;", etc.
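function htmlEnc() {
  # "\&" keeps the ampersand literal; since bash 5.2 a bare & in the
  # replacement stands for the matched text
  local tmp=${*//&/\&amp;}
  tmp=${tmp//</\&lt;}
  tmp=${tmp//>/\&gt;}
  tmp=${tmp//\"/\&quot;}
  echo "$tmp"
}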
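urlEnc depends on xxd, which may not be installed on every server. As an alternative sketch (the name urlEncPure is hypothetical and not part of the original function set), the encoding can be done in Bash alone. Unlike urlEnc it leaves the unreserved characters of RFC 3986 (letters, digits, "-", ".", "_", "~") unencoded, which is an assumption about what the receiver should see.

```bash
#!/bin/bash
# Hypothetical xxd-free encoder; urlEncPure is an illustrative name.
urlEncPure() {
  local LC_ALL=C              # iterate over bytes, not multibyte characters
  local text="$*" i char
  for (( i = 0; i < ${#text}; i++ )); do
    char=${text:i:1}
    case $char in
      [a-zA-Z0-9._~-]) printf '%s' "$char" ;;   # unreserved: keep as is
      *) printf '%%%02X' "'$char" ;;            # everything else: %HH
    esac
  done
}

urlEncPure 'hello world&x=1'   # hello%20world%26x%3D1
echo
```

The "'$char" argument makes printf print the numeric code of the character, which %02X then formats as two hex digits.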

Complete Example

To complete the example above, the text from the form stored in ${CGI_hallo} is evaluated and printed:

#! /bin/bash
# Decodes an URL-string
# an URL encoding has "+" instead of spaces
# and no special characters but "%HEX"
function urlDec() {
  local value=${*//+/%20}                   # replace +-spaces by %20 (hex)
  for part in ${value//%/ \\x}; do          # split at % prepend \x for printf
    printf "%b%s" "${part:0:4}" "${part:4}" # output decoded char
  done
}
 
# For all given query strings
# parse them and set shell variables
function setQueryVars() {
  local vars=${*//\*/%2A}                      # escape * as %2A
  for var in ${vars//&/ }; do                  # split at &
    local value=$(urlDec "${var#*=}")          # decode value after =
    value=${value//\\/\\\\}                    # change \ to \\ for later
    eval "CGI_${var%=*}=\"${value//\"/\\\"}\"" # evaluate assignment
  done
}
 
# HTML Encoding of a text
# quotes "&" to "&amp;", "<" to "&lt;", etc.
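function htmlEnc() {
  # "\&" keeps the ampersand literal; since bash 5.2 a bare & in the
  # replacement stands for the matched text
  local tmp=${*//&/\&amp;}
  tmp=${tmp//</\&lt;}
  tmp=${tmp//>/\&gt;}
  tmp=${tmp//\"/\&quot;}
  echo "$tmp"
}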
 
# Execute the evaluation
# set all variables for both, POST and GET data
setQueryVars "$QUERY_STRING" "$(</dev/stdin)"
 
/bin/cat <<EOF
Content-type: text/html
 
 
<html>
<head><title>Test</title></head>
<body>
<h1>Test for Bash CGI-Scripting</h1>
 
<h2>Result</h2>
 
<p>The variable \$CGI_hallo contains the value:</p>
 
<pre>$(htmlEnc "$CGI_hallo")</pre>
 
<pre>XXX=$(htmlEnc "${XXX}")</pre>
 
<h2>A Form that POSTs values</h2>
 
<p>
<form action="" method="post">
<input type="text" name="hallo" value="$(htmlEnc "$CGI_hallo")" />
<button type="submit" value="Klick!">Guguseli</button>
</form>
</p>
 
</body>
</html>
EOF

Use It as a Library

Create a file urlcoder.sh:

#! /bin/bash -E
 
# (internal) routine to store POST data
function cgi_get_POST_vars() {
    # check content type
    # FIXME: not sure if we could handle uploads with this..
    if [ "${CONTENT_TYPE}" != "application/x-www-form-urlencoded" ]; then
        return
    fi
    # save POST variables (only first time this is called)
    [ -z "$QUERY_STRING_POST" \
      -a "$REQUEST_METHOD" = "POST" -a ! -z "$CONTENT_LENGTH" ] && \
        read -n $CONTENT_LENGTH QUERY_STRING_POST
    return
}
 
# (internal) routine to decode urlencoded strings
function cgi_decodevar() {
    [ $# -ne 1 ] && return
    local v t h
    # replace all + with whitespace and append %%
    t="${1//+/ }%%"
    while [ ${#t} -gt 0 -a "${t}" != "%" ]; do
        v="${v}${t%%\%*}" # digest up to the first %
        t="${t#*%}"       # remove digested part
        # decode if there is anything to decode and if not at end of string
        if [ ${#t} -gt 0 -a "${t}" != "%" ]; then
            h=${t:0:2} # save first two chars
            t="${t:2}" # remove these
            v="${v}"`echo -e \\\\x${h}` # convert hex to special char
        fi
    done
    # return decoded string
    echo "${v}"
    return
}
 
# routine to get variables from http requests
# usage: cgi_getvars method varname1 [.. varnameN]
# method is either GET or POST or BOTH
# the magic variable name ALL gets everything
function cgi_getvars() {
    [ $# -lt 2 ] && return
    local q="" p k v s
    # get query
    case $1 in
        GET)
            [ ! -z "${QUERY_STRING}" ] && q="${QUERY_STRING}&"
            ;;
        POST)
            cgi_get_POST_vars
            [ ! -z "${QUERY_STRING_POST}" ] && q="${QUERY_STRING_POST}&"
            ;;
        BOTH)
            [ ! -z "${QUERY_STRING}" ] && q="${QUERY_STRING}&"
            cgi_get_POST_vars
            [ ! -z "${QUERY_STRING_POST}" ] && q="${q}${QUERY_STRING_POST}&"
            ;;
    esac
    shift
    s=" $* "
    # parse the query data
    while [ ! -z "$q" ]; do
        p="${q%%&*}"  # get first part of query string
        k="${p%%=*}"  # get the key (variable name) from it
        v="${p#*=}"   # get the value from it
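        q="${q#*&}"   # strip first part from query string
        # skip keys that are not valid shell identifiers
        case "$k" in ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;; esac
        # decode and assign var if requested; printf -v stores the value
        # literally, so code embedded in it is never evaluated
        [ "$1" = "ALL" -o "${s/ $k /}" != "$s" ] && \
            printf -v "$k" '%s' "$(cgi_decodevar "$v")"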
    done
    return
}

Then include it in your script:

. urlcoder.sh
cgi_getvars BOTH ALL

After this, you can access all parameters through shell variables, i.e. if you request http://my.domain.tld/script.sh?name=value, you can then access $name, which contains the text value, in your script.

For your security: Don't forget to validate each variable's content before using it for anything dangerous!
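What such a filter looks like depends on how the value is used. As one possible sketch, assuming the value is meant to be a simple file name, a whitelist check could look as follows; the helper name isSafeName and the set of allowed characters are hypothetical choices, not part of the library above.

```bash
#!/bin/bash
# Hypothetical validator; isSafeName is an illustrative helper.
# Accepts only non-empty names made of letters, digits, "_", "." and "-"
# that do not start with a dot.
isSafeName() {
  case $1 in
    ''|*[!A-Za-z0-9_.-]*) return 1 ;;  # empty, or a character outside the whitelist
    .*) return 1 ;;                    # no hidden files, no "." or ".."
    *) return 0 ;;
  esac
}

for candidate in 'report.txt' '../etc/passwd' '*'; do
  if isSafeName "$candidate"; then
    echo "accepted: $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

A script would then guard every use of the value, e.g. run a command on "$name" only after isSafeName "$name" has succeeded.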