Case study: fixed number of iterations with LPeg

Words fail me when it comes to describing just how awesome LPeg is. A pattern-matching library for Lua based on Parsing Expression Grammars (PEGs), it is a true programming gem! Please, if you don’t know what it is, take some time to familiarize yourself with it! It’s not the easiest thing to grasp, but you will *not* regret it! It is certainly one of the most worthwhile learning efforts you can make as a programmer.

One great feature of LPeg is that it’s binary-safe: unlike many regular-expression engines, it treats every byte (NUL included) as ordinary input, so it can safely be used to parse binary data! This makes it an excellent tool for parsing binary protocols, especially network communication protocols such as the Action Message Format (AMF), which Adobe Flash uses for remote calls and even inside FLV movie files. I’ll leave it to you to explore the possibilities…
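As a small illustration of that binary safety, here is a sketch that decodes a two-byte big-endian length field. The field layout is a hypothetical format chosen for illustration, not AMF’s actual encoding. It relies on two real LPeg features: `lpeg.P(n)` with a number matches exactly n bytes of any value, and literal patterns may contain embedded zeros.

```lua
local lpeg = require("lpeg")

-- Hypothetical field (illustration only): a 16-bit big-endian length.
-- lpeg.P(2) matches exactly two bytes, whatever their values, so NUL
-- (0x00) and high bytes are consumed like any other character.
local u16be = lpeg.P(2) / function(bytes)
  local hi, lo = bytes:byte(1, 2)
  return hi * 256 + lo
end

-- Literal patterns may contain embedded zeros as well.
local magic = lpeg.P("\0\255")

print(lpeg.match(u16be, "\0\5hello"))   -- 5
print(lpeg.match(u16be, "\1\0"))        -- 256
print(lpeg.match(u16be, "\0"))          -- nil (only one byte available)
print(lpeg.match(magic, "\0\255rest"))  -- 3 (first position after the match)
```

Because `u16be` produces a capture, `lpeg.match` returns the decoded number; `magic` has no captures, so it returns the position right after the match.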

From here on, I assume that you know your way around Lua and LPeg and how they work.

The problem

That being said, this article is really about an unusual roadblock I hit while using LPeg to build a Lua-based AMF parser, and the various solutions I found or came up with to overcome it (you didn’t think I mentioned AMF earlier by accident, did you?).

The issue lies in LPeg’s repetition operator: it can match (or capture) a minimum or a maximum number of occurrences of a pattern, but not an exact number. That is perfectly adequate for stream-oriented parsing (such as parsing programming languages), but it falls short for binary data, where a field often states exactly how many items follow.

To clarify, here is how the typical PCRE repetition constructs map onto LPeg patterns (in each case we’re trying to match the string ‘cloth’):

Nr.  Matching occurrences of ‘cloth’  PCRE pattern    LPeg pattern
1    0 or more (at least 0)           /(cloth)*/      lpeg.P'cloth'^0
2    1 or more (at least 1)           /(cloth)+/      lpeg.P'cloth'^1
3    X or more (at least X)           /(cloth){X,}/   lpeg.P'cloth'^X
4    1 or fewer (at most 1)           /(cloth)?/      lpeg.P'cloth'^-1
5    X or fewer (at most X)           /(cloth){0,X}/  lpeg.P'cloth'^-X
6    precisely X (no more, no less)   /(cloth){X,X}/  -- not implemented --
7    anywhere between X and Y         /(cloth){X,Y}/  -- not implemented --
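To make the gap concrete, here is a small sketch that uses only the operators from the table, with X = 2. It simply shows what `lpeg.match` returns (the position right after the match, or nil on failure) for the “at least” and “at most” forms; it is not a solution to case 6.

```lua
local lpeg = require("lpeg")

local cloth = lpeg.P("cloth")
local three = "clothclothcloth"  -- three occurrences, 15 bytes

-- Case 3, "at least 2": greedily consumes all three occurrences.
print(lpeg.match(cloth^2, three))    -- 16

-- Case 5, "at most 2": stops after two occurrences but still succeeds,
-- leaving the third one unconsumed.
print(lpeg.match(cloth^-2, three))   -- 11

-- "At least 2" does reject input that is too short...
print(lpeg.match(cloth^2, "cloth"))  -- nil

-- ...while "at most 2" also accepts zero occurrences.
print(lpeg.match(cloth^-2, ""))      -- 1
```

Neither operator fails on three occurrences, so neither one, on its own, expresses “exactly two”.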

For cases 6 and 7, LPeg offers no simple construct, so we have to build a more complex one. Let’s put case 7 aside for a while and tackle case 6 first; then we’ll see…

Lua2C, an updated version

I know I have been “missing in action” lately, but I have been working furiously and have had too little time for my blog (very sad face). Still, just for a breath of fresh air, I thought I’d share something with the world.

Enter lua2c.lua

Lately I have become quite interested in Lua (very interested, actually). It is very fast, interfaces superbly with C, and has features and libraries that just make my day (e.g. coroutines, LPeg, lua-ev). Since I needed to embed some Lua scripts, in their entirety, in a C project I’m currently working on, I ended up adapting Mike Edgar’s “bin2c.lua” script (which takes a Lua script and turns it into a C header file) to suit my needs.

Basic functionality

Specifically, instead of putting the code straight into the top-level scope of the generated file, this adaptation generates a function that takes a Lua state (a lua_State *) as its only argument, runs the embedded Lua code in that state, and returns the resulting status. This makes the embedded code easier to invoke from C, and lets you run the same code in multiple Lua states (e.g. one per thread).
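For illustration, here is a minimal, hypothetical sketch of such a generator, written from the description above rather than taken from lua2c.lua or bin2c.lua. The `lua2c` function name and the generated layout (a `<name>_code` byte array plus an `int <name>(lua_State *L)` function) are my own assumptions. The generated C loads the chunk with `luaL_loadbuffer`, runs it with `lua_pcall`, and returns their status code (0 on success).

```lua
-- Hypothetical lua2c-style generator (a sketch, not the original script).
-- Returns C source defining `int <name>(lua_State *L)`, which loads the
-- embedded chunk into L, runs it, and returns the status (0 on success).
local function lua2c(source, name)
  assert(name:match("^[%a_][%w_]*$"), "name must be a valid C identifier")
  assert(#source > 0, "C does not allow an empty array initializer")

  -- Write every byte as a decimal value, 16 per row, so embedded NULs,
  -- quotes and precompiled bytecode all survive unchanged.
  local rows, row = {}, {}
  for i = 1, #source do
    row[#row + 1] = source:byte(i)
    if #row == 16 or i == #source then
      rows[#rows + 1] = "  " .. table.concat(row, ", ")
      row = {}
    end
  end

  local out = {
    "#include <lua.h>",
    "#include <lauxlib.h>",
    "",
    ("static const unsigned char %s_code[] = {"):format(name),
    table.concat(rows, ",\n"),
    "};",
    "",
    ("int %s(lua_State *L) {"):format(name),
    ("  int status = luaL_loadbuffer(L, (const char *)%s_code,"):format(name),
    ('                               sizeof(%s_code), "%s");'):format(name, name),
    "  if (status == 0)  /* compiled fine: run it with no args, no results */",
    "    status = lua_pcall(L, 0, 0, 0);",
    "  return status;",
    "}",
  }
  return table.concat(out, "\n") .. "\n"
end

-- Usage: print the generated C code for a one-line script.
io.write(lua2c('print("hello from embedded Lua")', "run_hello"))
```

Emitting numeric byte values instead of a C string literal sidesteps escaping issues, and since `luaL_loadbuffer` accepts both source text and precompiled chunks, output from luac could be embedded the same way.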

Check the end of the post for a usage sample.