Oddbean new post about | logout

Notes by novalis | export

 @Julia Evans I definitely do not think of it as a NFA -- I think of it in terms of backtracking.  That happens to be how Python implements it (last time I checked).  I know that Rust and Golang use NFAs, but even there, it doesn't usually hurt to think about backtracking.
In practice, the NP thing is rarely an issue. I think I hit exponential behavior (not NP) once in the last 25 years, and I really like regexps.  Of course you have to worry about it for untrusted expressions, but that's a pretty rare case.
So, when I see something like "(ab)*abraca" matching ababababraca, I think, "so, we'll match some 'ab's, and then the last one we'll have to backtrack so we can get it to part of 'abraca'. (Actually, I think "let me rewrite that as (ab)+raca", but it's just an example).
 
 nostr:npub14gs36ydu3rkt8v902lef7wcm0t2cd9ge02xwvy34y3k8fg3ns3psle9ask not yet! i haven't really f... 
 @Julia Evans I have a suspicion: people see the symbol soup, and panic.  I remember you saying that you enjoyed my pcmpistri talk because it explained how it broke down into p cmp i str i and what those bits meant. I think a similar approach might work for regex.
I think people don't understand that a regular expression is a just something like a short program in a very weak programming language, and that they can understand it the same way the read other code: by breaking it down into parts and understanding each part.  The "be the computer" approach.  (It doesn't help that regex authors often don't break down their regex into parts, so the chunking is hard).
But I am someone who learned regular expressions in a theory of computation class, so I have a skewed view about what is easy and what is hard.