Friday, June 23

Geek

Irregular Expressions

I hate regular expressions.

There's this thing I call information density*. Regular expressions are extremely information-dense. So are (for example) Forth and APL. With any of these, you can express a very complex algorithm in a very short sequence of symbols, but that comes with a cost.

People are used to dealing with information density within a certain range. For the most part, the information we receive has massive amounts of redundancy; you can often miss half the message and still understand it. Not so with regular expressions - every single bit matters. There are no (or almost no) cues to what is going on; you have to inspect each symbol one at a time, parse them into groups, interpret the groups, work out the relationships between the groups... And do it all correctly.

Computers are good at that. Humans not so much.

Well, computers are supposed to be good at it, anyway.

The subject arose because I needed a string-formatting language for the templating system in Minx. Python has fairly good formatting for numbers, dates, and times, but it has no equivalent formatting library for strings. It thinks it has several, but it doesn't. What it has are libraries that format things into strings, but nothing to format the strings themselves.

Except regexps.

So I used those. And the first example - not particularly complicated - sent the template engine into what appeared to be an infinite loop. Worked fine in the examples I tested. Worked fine for the first three items on the page. Raised an exception for the next item (quite validly). And then tried to process the item after that and was never heard from again.

I'm sure I could fix it, but the fact remains that it happened to me, and I wrote the blasted thing. If it can happen to me, then a week after launching the software I'm going to find the server with a load average of 700, doing nothing but processing regular expressions for ever and ever.
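
For illustration only: one common way a regular expression can appear to hang is catastrophic backtracking, where a failed match forces the engine to try exponentially many ways of splitting the input. The pattern below is a hypothetical stand-in, not the one from the Minx template, and the function name is made up for this sketch. It may or may not be what happened here, but it shows how an innocent-looking expression can work on short test strings and then fall over on slightly longer input.

```python
import re
import time

# Hypothetical pattern (an assumption, not the one from the post).
# The nested quantifiers let the engine split a run of "a" in
# exponentially many ways before it can conclude there is no match.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Match n 'a' characters followed by a 'b' that forces failure.

    Returns (match_result, elapsed_seconds).
    """
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Each extra character roughly doubles the work on a backtracking engine.
    for n in (8, 12, 16, 20):
        result, seconds = time_failed_match(n)
        print(n, result, f"{seconds:.4f}s")
```

Keep n small when trying this; a few dozen characters is enough to tie up a CPU for a very long time.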

So I wrote a little text-formatting library instead.

With plugin support.

Sixty lines of code, does almost everything I need.
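
The library itself isn't shown here, so what follows is only a guess at the general shape such a thing could take: a registry of named string formatters that plugins can add to, applied left to right. Every name in it (FORMATTERS, formatter, format_string, and the "upper|truncate:5" spec syntax) is an assumption made up for this sketch, not the actual Minx code.

```python
# Hypothetical sketch of a tiny string formatter with plugin support.
# None of these names come from Minx; they are assumptions for illustration.

FORMATTERS = {}


def formatter(name):
    """Register a function as a named formatter (the plugin hook)."""
    def register(func):
        FORMATTERS[name] = func
        return func
    return register


@formatter("upper")
def _upper(text):
    return text.upper()


@formatter("truncate")
def _truncate(text, length="20"):
    # Arguments arrive as strings because they are parsed out of the spec.
    n = int(length)
    return text if len(text) <= n else text[:n] + "..."


@formatter("pad")
def _pad(text, width="10"):
    return text.ljust(int(width))


def format_string(text, spec):
    """Apply formatters left to right; spec looks like 'upper|truncate:5'."""
    for step in spec.split("|"):
        name, _, arg = step.partition(":")
        func = FORMATTERS[name.strip()]  # unknown names raise KeyError
        args = arg.split(",") if arg else []
        text = func(text, *args)
    return text


if __name__ == "__main__":
    print(format_string("hello world", "upper|truncate:5"))
```

Each step does a bounded amount of work on the string, so there is nothing here that can wander off and backtrack forever.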

* There's a specific term for it, but I can't recall it at the moment. But it relates to randomness and entropy.

Posted by: Pixy Misa at 08:03 AM