URI parsing using Bash built-in features

A bit of background

A while ago I posted an article describing how one could parse complete URIs in Bash using the sed program. Since then, I have realized that there is a better way to do it, a much better way: via Bash built-in pattern matching! 
Here are some benefits of this improvement:

  • It no longer executes external programs (i.e. sed) for pattern matching. This translates to higher speed and lower memory and CPU usages, which means that you could use this parser for much more intense URI crunching.
  • The new regular expressions are drastically simplified thanks to the ${BASH_REMATCH[*]} array that is able to hold more than 9 matched sub-expressions, unlike sed that can only work with single-digit escapes: \1-\9 (yuck!).
  • The parsing algorithm is contained in a single Bash function, so no external file is needed to hold the regular expressions. This also means, obviously, that the pattern file is no longer loaded from disk on every execution (so HDD is saved as well).
  • The generated variables are named identically to the first version, so you should be able to upgrade your scripts to this version with absolutely minimal effort.
  • [Edit]
    No eval instruction is needed (unlike in the first version), further improving performance.

Read more…

Bash URI parser using SED

Warning! This version is now obsolete!
Check out the new and improved version (using only Bash built-ins) here!

Here is a command-line (bash) script that uses sed to split the segments of an URI into usable variables. It also validates the given URI since malformed strings produce the text “ERROR” which can be handled accordingly:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Assembling a sample URI (including an injection attack)
uri_1='http://user:pass@www.example.com:19741/dir1/dir2/file.php'
uri_2='?param=some_value&array[0]=123&param2=\`cat /etc/passwd\`'
uri_3='#bottom-left'
uri="$uri_1$uri_2$uri_3"

# Parse URI
op=`echo "$uri" | sed -nrf "uri.sed"`

# Handle invalid URI
[[ $op == 'ERROR' ]] && { echo "Invalid URI!"; exit 1; }

# Execute assignments
eval "$op"

# ...work with URI components...

Notice the "uri.sed" file given to sed?
Read more…