URI parsing using Bash built-in features

Filed under: Bash scripting, Command line tools

A bit of background

A while ago I posted an article describing how one could parse complete URIs in Bash using the sed program. Since then, I have realized that there is a better way to do it, a much better way: via Bash built-in pattern matching
Here are some benefits of this improvement:

  • It no longer executes external programs (i.e. sed) for pattern matching. This translates to higher speed and lower memory and CPU usages, which means that you could use this parser for much more intense URI crunching.
  • The new regular expressions are drastically simplified thanks to the ${BASH_REMATCH[*]} array that is able to hold more than 9 matched sub-expressions, unlike sed that can only work with single-digit escapes: \1-\9 (yuck!).
  • The parsing algorithm is contained in a single Bash function, so no external file is needed to hold the regular expressions. This also means, obviously, that the pattern file is no longer loaded from disk on every execution (so HDD is saved as well).
  • The generated variables are named identically to the first version, so you should be able to upgrade your scripts to this version with absolutely minimal effort.
  • [Edit]
    No eval instruction is needed (unlike in the first version), further improving performance.

Read the rest of this entry »

Posted on February 3rd, 2010 by Valeriu Paloş

4 Comments »

Copyright ©2012 Valeriu Paloş. All rights reserved. Design based on a template from Hive Designs ported by Theme Lab.