Oddbean

@Julia Evans I have a suspicion: people see the symbol soup, and panic. I remember you saying that you enjoyed my pcmpistri talk because it explained how it broke down into p cmp i str i and what those bits meant. I think a similar approach might work for regex. I think people don't understand that a regular expression is a just something like a short program in a very weak programming language, and that they can understand it the same way the read other code: by breaking it down into parts and understanding each part. The "be the computer" approach. (It doesn't help that regex authors often don't break down their regex into parts, so the chunking is hard). But I am someone who learned regular expressions in a theory of computation class, so I have a skewed view about what is easy and what is hard.

@novalis yeah I have some of the same thoughts since my thoughts about regexes were also formed in theory of computation class some things I find weird: - "a regex defines a regular language" doesn't exactly map to regexes in practice (the '?' feels a little slippery to me, and in fact I get a little confused about ?s in my regexes) - the thing where matching regexes with backreferences is NP-hard (I never use backreferences because they confuse me, but is it bad to use backreferences?)

@novalis also I have a pretty clear model of how to "be the computer" for a python program, but for a regex it's harder because actually to match an NFA you have to maintain multiple possibilities in your head at the same time or something? Maintaining 3 parallel possibilities for where in the state machine we could be actually doesn't feel that easy. running a DFA computer in your head is pretty easy but I don't think anyone is really converting a regex to a DFA in their head

@novalis (when I say "maintaining many parallel possibilites at a time", I'm thinking of this "Implementation: Simulating the NFA" section from https://swtch.com/~rsc/regexp/regexp1.html where he simulates an NFA using an array of states)

@Julia Evans I definitely do not think of it as a NFA -- I think of it in terms of backtracking. That happens to be how Python implements it (last time I checked). I know that Rust and Golang use NFAs, but even there, it doesn't usually hurt to think about backtracking. In practice, the NP thing is rarely an issue. I think I hit exponential behavior (not NP) once in the last 25 years, and I really like regexps. Of course you have to worry about it for untrusted expressions, but that's a pretty rare case. So, when I see something like "(ab)*abraca" matching ababababraca, I think, "so, we'll match some 'ab's, and then the last one we'll have to backtrack so we can get it to part of 'abraca'. (Actually, I think "let me rewrite that as (ab)+raca", but it's just an example).