Bash URI parser using SED
Filed under: Command line tools, Networking
Warning! This version is now obsolete!
Check out the new and improved version, "URI parsing using Bash built-in features", which uses only Bash built-ins.
Here is a command-line (Bash) script that uses sed to split a URI into its components and expose them as usable variables. It also validates the given URI: a malformed string produces the text "ERROR", which the caller can handle accordingly:
# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=\`cat /etc/passwd\`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"
# Parse URI
op=$(echo "$uri" | sed -nrf "uri.sed")
# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }
# Execute assignments
eval "$op"
# ...work with URI components...
Notice the "uri.sed" file passed to sed?
It does the actual parsing. Its regular-expression rules turn the given URI into Bash code which, once executed with eval, creates the final variables to play with. The rules rely on GNU sed features such as the -r option and the T command:
# initialize
s/[\r\n]+//g; s/`/%60/g; s/"/%22/g; T begin; :begin
# scheme, address, path, query, fragment
s/^(([a-z]+):\/\/)?(([^:\/]+(:[^@\/]*)?@)?[^:\/?]+(:[0-9]+)?)(\/[^?]*)?(\?[^#]*)?(#.*)?$/\
uri_scheme="\2"; uri_address="\3"; uri_path="\7"; uri_query="\8"; uri_fragment="\9"/i
T error
# user, pass, host, port
s/uri_address="(([a-z0-9_.+=-]+)(:([^@]*))?@)?([a-z0-9.-]*)(:([0-9]*))?"/\0; \
uri_user="\2"; uri_pass="\4"; uri_host="\5"; uri_port="\7"/i; T error
# path parts
h; s/.*uri_path="([^"]+)".*/uri_parts=(); \1/
s/\/+([^/]+)/uri_parts[$[${#uri_parts[*]}]]="\1"; /ig; x; G
# query args
h; s/.*uri_query="([^"]+)".*/uri_args=(); \1/
s/[?&]+([^= ]+)(=([^&]*))?/uri_args[$[${#uri_args[*]}]]="\1"; uri_arg_\1="\3"; /ig
x; G
# print
s/\n\ +//g; s/\n//g; p; q
# failure
:error; c ERROR
After eval successfully runs the code generated for the sample URI, the following variables exist in the running environment. Because the first rule in uri.sed percent-encodes backticks as %60 (and double quotes as %22), the injected command arrives as inert text and is never executed:
uri_scheme="http"
uri_address="user:pass@www.example.com:19741"
uri_user="user"
uri_pass="pass"
uri_host="www.example.com"
uri_port="19741"
uri_path="/dir1/dir2/file.php"
uri_parts[0]="dir1"
uri_parts[1]="dir2"
uri_parts[2]="file.php"
uri_query="?param=some_value&array[0]=123&param2=\%60cat /etc/passwd\%60"
uri_args[0]="param"
uri_args[1]="array[0]"
uri_args[2]="param2"
uri_arg_param="some_value"
uri_arg_array[0]="123"
uri_arg_param2="\%60cat /etc/passwd\%60"
uri_fragment="#bottom-left"
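As a rough illustration of the "work with URI components" step, here is a small sketch that consumes variables named like the ones the script emits (uri_scheme, uri_host, uri_port, uri_parts, uri_args, uri_arg_*). The values are hard-coded stand-ins so the snippet runs on its own, and the "page" argument with its value is purely hypothetical. The helper names origin, last_part and summary are made up for this example as well.

```shell
#!/usr/bin/env bash
# Stand-in values; in real use these would come from: eval "$op"
uri_scheme='http'
uri_host='www.example.com'
uri_port='19741'
uri_parts=(dir1 dir2 file.php)
uri_args=(param page)          # "page" is a hypothetical extra argument
uri_arg_param='some_value'
uri_arg_page='2'               # hypothetical value

# Rebuild scheme://host[:port]; the port is appended only when it is non-empty.
origin="$uri_scheme://$uri_host${uri_port:+:$uri_port}"

# Last path segment (the file name in the sample URI).
last_part="${uri_parts[${#uri_parts[@]}-1]}"

# Resolve each argument's value through indirect expansion of uri_arg_<name>.
summary=''
for name in "${uri_args[@]}"; do
  var="uri_arg_$name"
  summary+="$name=${!var} "
done

echo "$origin"
echo "$last_part"
echo "$summary"
```

The loop assumes argument names that are plain identifiers. A name such as array[0] ends up as an array element rather than a scalar, so it would probably need separate handling.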
You could play around with it a bit and tell me if you find any problems. Right now it is only a first effort but it could be improved. Cheers!
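If you want to experiment with the general idea without the full rule set, the following is a deliberately tiny sketch of the same pattern: sed emits Bash assignments (or ERROR), and the shell evals them. It is not the uri.sed script. The function name gen_assignments, the variables demo_host and demo_port, and the reduced "host[:port]" grammar are all assumptions made for this example. It also encodes $ and backslash in addition to the backtick and double quote handled above, as an extra precaution for double-quoted assignments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "host[:port]" into Bash assignments using sed.
gen_assignments() {
  printf '%s\n' "$1" | sed -E \
    -e 's/`/%60/g; s/"/%22/g; s/\$/%24/g; s/\\/%5C/g' \
    -e 's/^([a-z0-9.-]+)(:([0-9]+))?$/demo_host="\1"; demo_port="\3"/' \
    -e '/^demo_host="/!s/.*/ERROR/'
}

code=$(gen_assignments 'www.example.com:8080')
if [[ $code == ERROR ]]; then
  echo "Invalid input!" >&2
else
  eval "$code"                 # defines demo_host and demo_port
  echo "host=$demo_host port=$demo_port"
fi
```

The first expression neutralizes characters that could break out of the quoted assignment, the second builds the assignments, and the third replaces anything that did not match with ERROR. Feeding it something like 'bad host`id`' should yield ERROR instead of code.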
Posted on November 16th, 2009 by Valeriu Paloş