She went into the issue that if you don't optimize your regular expressions, the run time can increase exponentially with the size of the input string.
The reason for this is that the regular expression is compiled to an NFA (Non-Deterministic Finite Automaton), and a backtracking engine can end up exploring exponentially many paths through it.
Her example:
given the string: helloworldhelloworld
using regex: ([a-z]+)*= versus [a-z]* (the nested quantifier in the first form backtracks exponentially when the required = is never found)
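A small Python sketch of the first example. I'm assuming the trailing = is what makes the match fail: the nested quantifier then has to try every way of splitting the letters before giving up, so each extra character roughly doubles the work (the all-'a' test string is a stand-in for the example above):

```python
import re
import time

# The nested quantifier in ([a-z]+)* lets the engine split a run of
# letters in exponentially many ways; the trailing '=' never matches,
# so every split is tried before the overall match fails.
catastrophic = re.compile(r'([a-z]+)*=')
linear = re.compile(r'[a-z]*=')  # same language, no nested quantifier

for n in (14, 16, 18):
    s = 'a' * n
    start = time.perf_counter()
    assert catastrophic.match(s) is None
    print(f'{n} chars: {time.perf_counter() - start:.4f}s')  # roughly 4x per +2 chars

assert linear.match('a' * 18) is None  # fails in a single pass
```

The linear form fails immediately because there is only one way for [a-z]* to consume the letters.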
given the string: abcde
Regexp: (abe|abc)(dg|dc|de) versus ab(e|c)d(g|c|e)
The long version
The order of execution here is roughly:
a matches a in abe
b matches b in abe
c doesn't match e in abe, backtrack and try the next alternative
c matches c in abc - done with first part
d matches d in dg
e doesn't match g in dg, backtrack and try the next alternative
e doesn't match c in dc, backtrack and try the next alternative
e matches e in de - done with the regular expression
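The two forms match exactly the same six strings (ab followed by e or c, then d followed by g, c, or e); the factored one just gives the engine fewer dead ends to backtrack out of. A quick check of the equivalence (the sample strings are my own, chosen to cover matching and non-matching cases):

```python
import re

long_form = re.compile(r'(abe|abc)(dg|dc|de)')
factored = re.compile(r'ab(e|c)d(g|c|e)')

# Both describe the same six strings: ab{e,c} followed by d{g,c,e}.
for s in ('abcde', 'abedg', 'abcdc', 'abede', 'abxde', 'abcdx'):
    assert bool(long_form.fullmatch(s)) == bool(factored.fullmatch(s))
print('equivalent on all samples')
```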
She recommends avoiding/decreasing:
- nested quantifiers (*, +, {m,n})
- backtracks
- ambiguous matches (i.e. the number of different ways part of the string can be matched, all of which a backtracking engine may try on failure)
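To make the "ambiguous matches" point concrete, here is an illustrative sketch (splits is a hypothetical helper, not a library function) that enumerates every way ([a-z]+)* could carve up a string, which is the ambiguity a backtracking engine may have to exhaust when the overall match fails:

```python
from itertools import combinations

def splits(s):
    """Every way ([a-z]+)* can carve s into non-empty runs of letters."""
    n = len(s)
    out = []
    for cuts in range(n):
        for points in combinations(range(1, n), cuts):
            bounds = (0, *points, n)
            out.append([s[bounds[i]:bounds[i + 1]] for i in range(cuts + 1)])
    return out

# 2**(len(s) - 1) candidate splits: doubling with each extra character.
print(len(splits('hello')))       # 16
print(len(splits('helloworld')))  # 512
```

Removing the nesting (plain [a-z]+) collapses all of these candidates into a single unambiguous parse.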