She went into the issue that if you don't optimize your regular expressions, the run time can increase exponentially with the size of the input string.
The reason for this is that the regular expression is compiled to an NFA (Non-Deterministic Finite Automaton), and a backtracking engine can end up exploring exponentially many paths through it.
Her example:
given the string: helloworldhelloworld
using regex: ([a-z]+)*= versus [a-z]* (the nested quantifier in the first form backtracks exponentially when the required = is never found)
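A small Python sketch of the first example. I'm assuming the trailing = is what makes the match fail: the nested quantifier then has to try every way of splitting the letters before giving up, so each extra character roughly doubles the work (the all-'a' test string is a stand-in for the example above):

```python
import re
import time

# The nested quantifier in ([a-z]+)* lets the engine split a run of
# letters in exponentially many ways; the trailing '=' never matches,
# so every split is tried before the overall match fails.
catastrophic = re.compile(r'([a-z]+)*=')
linear = re.compile(r'[a-z]*=')  # same language, no nested quantifier

for n in (14, 16, 18):
    s = 'a' * n
    start = time.perf_counter()
    assert catastrophic.match(s) is None
    print(f'{n} chars: {time.perf_counter() - start:.4f}s')  # roughly 4x per +2 chars

assert linear.match('a' * 18) is None  # fails in a single pass
```

The linear form fails immediately because there is only one way for [a-z]* to consume the letters.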
given the string: abcde
Regexp: (abe|abc)(dg|dc|de) versus ab(e|c)d(g|c|e)
The long version
The order of execution here is roughly:
a matches a in abe
b matches b in abe
c doesn't match e in abe, backtrack and try the next alternative
c matches c in abc - done with first part
d matches d in dg
e doesn't match g in dg, backtrack and try the next alternative
e doesn't match c in dc, backtrack and try the next alternative
e matches e in de - done with the regular expression
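The two forms match exactly the same six strings (ab followed by e or c, then d followed by g, c, or e); the factored one just gives the engine fewer dead ends to backtrack out of. A quick check of the equivalence (the sample strings are my own, chosen to cover matching and non-matching cases):

```python
import re

long_form = re.compile(r'(abe|abc)(dg|dc|de)')
factored = re.compile(r'ab(e|c)d(g|c|e)')

# Both describe the same six strings: ab{e,c} followed by d{g,c,e}.
for s in ('abcde', 'abedg', 'abcdc', 'abede', 'abxde', 'abcdx'):
    assert bool(long_form.fullmatch(s)) == bool(factored.fullmatch(s))
print('equivalent on all samples')
```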
She recommends avoiding/decreasing:
- nested quantifiers (*, +, {m,n})
- backtracks
- ambiguous matches (i.e. the number of different ways part of the string can be matched, all of which a backtracking engine may try on failure)
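To make the "ambiguous matches" point concrete, here is an illustrative sketch (splits is a hypothetical helper, not a library function) that enumerates every way ([a-z]+)* could carve up a string, which is the ambiguity a backtracking engine may have to exhaust when the overall match fails:

```python
from itertools import combinations

def splits(s):
    """Every way ([a-z]+)* can carve s into non-empty runs of letters."""
    n = len(s)
    out = []
    for cuts in range(n):
        for points in combinations(range(1, n), cuts):
            bounds = (0, *points, n)
            out.append([s[bounds[i]:bounds[i + 1]] for i in range(cuts + 1)])
    return out

# 2**(len(s) - 1) candidate splits: doubling with each extra character.
print(len(splits('hello')))       # 16
print(len(splits('helloworld')))  # 512
```

Removing the nesting (plain [a-z]+) collapses all of these candidates into a single unambiguous parse.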