URI parsing using Bash built-in features

Filed under: Bash scripting, Command line tools

A bit of background

A while ago I posted an article describing how one could parse complete URIs in Bash using the sed program. Since then, I have realized that there is a better way to do it, a much better way: via Bash built-in pattern matching
Here are some benefits of this improvement:

  • It no longer executes external programs (i.e. sed) for pattern matching. This translates to higher speed and lower memory and CPU usages, which means that you could use this parser for much more intense URI crunching.
  • The new regular expressions are drastically simplified thanks to the ${BASH_REMATCH[*]} array that is able to hold more than 9 matched sub-expressions, unlike sed that can only work with single-digit escapes: \1-\9 (yuck!).
  • The parsing algorithm is contained in a single Bash function, so no external file is needed to hold the regular expressions. This also means, obviously, that the pattern file is no longer loaded from disk on every execution (so HDD is saved as well).
  • The generated variables are named identically to the first version, so you should be able to upgrade your scripts to this version with absolutely minimal effort.
  • [Edit]
    No eval instruction is needed (unlike in the first version), further improving performance.

Read the rest of this entry »

Posted on February 3rd, 2010 by Valeriu Paloş

4 Comments »

Bash URI parser using SED

Filed under: Command line tools, Networking

Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!

Here is a command-line (bash) script that uses sed to split the segments of an URI into usable variables. It also validates the given URI since malformed strings produce the text “ERROR” which can be handled accordingly:

# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=\`cat /etc/passwd\`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"

# Parse URI
op=`echo "$uri" | sed -nrf "uri.sed"`

# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }

# Execute assignments
eval "$op"

# ...work with URI components...

Notice the "uri.sed" file given to sed?
Read the rest of this entry »

Posted on November 16th, 2009 by Valeriu Paloş

8 Comments »

Copyright ©2012 Valeriu Paloş. All rights reserved. Design based on a template from Hive Designs ported by Theme Lab.