I don’t think it requires the cache to be global, although that’s the approach primal went with. I’m imagining a second process running near your relay that’s responsible for projecting the current state, e.g. the latest revision of a note. this second layer could expose the same api as a standard nostr relay so clients just work. you could also fold this second-layer logic into your relay implementation depending on your scale/reliability/ops requirements.
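roughly what I have in mind for the projection, as a sketch only (names like `Projection` are made up, and I’m assuming ordinary nostr event fields: pubkey, kind, `d` tag, created_at):

```go
// Sketch: keep only the newest revision per (pubkey, kind, d-tag) coordinate,
// so queries against the second layer see the current state of a note rather
// than its whole history. All names here are illustrative.
package main

import "fmt"

type Event struct {
	ID        string
	Pubkey    string
	Kind      int
	DTag      string
	CreatedAt int64
	Content   string
}

type Projection struct {
	latest map[string]Event // coordinate -> newest revision
}

func NewProjection() *Projection {
	return &Projection{latest: make(map[string]Event)}
}

// Apply replaces the stored revision only when the incoming event is newer.
func (p *Projection) Apply(ev Event) {
	key := fmt.Sprintf("%s|%d|%s", ev.Pubkey, ev.Kind, ev.DTag)
	if cur, ok := p.latest[key]; !ok || ev.CreatedAt > cur.CreatedAt {
		p.latest[key] = ev
	}
}

// Current is what a relay-compatible query surface over the projection would serve.
func (p *Projection) Current(pubkey string, kind int, dtag string) (Event, bool) {
	ev, ok := p.latest[fmt.Sprintf("%s|%d|%s", pubkey, kind, dtag)]
	return ev, ok
}

func main() {
	p := NewProjection()
	p.Apply(Event{ID: "v1", Pubkey: "alice", Kind: 30023, DTag: "my-note", CreatedAt: 100})
	p.Apply(Event{ID: "v2", Pubkey: "alice", Kind: 30023, DTag: "my-note", CreatedAt: 200})
	if cur, ok := p.Current("alice", 30023, "my-note"); ok {
		fmt.Println(cur.ID) // prints "v2": only the latest revision is visible
	}
}
```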
in a large-scale distributed-systems analogy, new events would hit a queue. one consumer group would write the raw events to storage; another consumer group would update the projected state in the second layer when applicable.
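an in-process sketch of that fan-out (in a real deployment the channels would be a broker like Kafka or NATS and each goroutine its own consumer group; the kind check is just a stand-in for “when applicable”):

```go
// Sketch: one event stream delivered to two independent consumers, one that
// appends every raw event to storage and one that updates the projection.
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	ID   string
	Kind int
}

func main() {
	rawQueue := make(chan Event, 16)  // stand-in for consumer group 1's view of the topic
	projQueue := make(chan Event, 16) // stand-in for consumer group 2's view of the topic

	var wg sync.WaitGroup
	wg.Add(2)

	// consumer group 1: write the raw events to storage unchanged
	go func() {
		defer wg.Done()
		for ev := range rawQueue {
			fmt.Printf("store raw event %s\n", ev.ID) // stand-in for a DB append
		}
	}()

	// consumer group 2: update the projected state when applicable,
	// e.g. only for parameterized-replaceable kinds
	go func() {
		defer wg.Done()
		for ev := range projQueue {
			if ev.Kind >= 30000 && ev.Kind < 40000 {
				fmt.Printf("update projection from event %s\n", ev.ID)
			}
		}
	}()

	// publisher: every new event goes to both groups, like one topic
	// read by two consumer groups
	for _, ev := range []Event{{ID: "a", Kind: 30023}, {ID: "b", Kind: 1}} {
		rawQueue <- ev
		projQueue <- ev
	}
	close(rawQueue)
	close(projQueue)
	wg.Wait()
}
```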
Yeah, I think I agree; the query interface just needs to be carefully designed so that multiple caches can be queried simultaneously and their results reconciled. See the discussion around COUNT.
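For illustration, the reconciliation could look something like this (`Cache` is a made-up interface, not an existing API; event results can be deduplicated by id, but raw COUNT responses from independent caches can't be, so the largest count is only a lower bound):

```go
// Sketch: fan a filter out to several caches concurrently, merge event
// results by id, and reconcile COUNT responses approximately.
package main

import (
	"fmt"
	"sync"
)

type Event struct{ ID string }

// Cache is a hypothetical handle to one second-layer cache.
type Cache interface {
	Query(filter string) []Event
	Count(filter string) int
}

// QueryAll queries every cache at the same time and deduplicates by event id.
func QueryAll(caches []Cache, filter string) []Event {
	var mu sync.Mutex
	var wg sync.WaitGroup
	seen := make(map[string]Event)
	for _, c := range caches {
		wg.Add(1)
		go func(c Cache) {
			defer wg.Done()
			for _, ev := range c.Query(filter) {
				mu.Lock()
				seen[ev.ID] = ev
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	out := make([]Event, 0, len(seen))
	for _, ev := range seen {
		out = append(out, ev)
	}
	return out
}

// CountAll has no ids to deduplicate, so it returns the largest count seen
// as a lower bound on the true total.
func CountAll(caches []Cache, filter string) int {
	best := 0
	for _, c := range caches {
		if n := c.Count(filter); n > best {
			best = n
		}
	}
	return best
}

type memCache struct{ events []Event }

func (m memCache) Query(string) []Event { return m.events }
func (m memCache) Count(string) int     { return len(m.events) }

func main() {
	caches := []Cache{
		memCache{events: []Event{{ID: "x"}, {ID: "y"}}},
		memCache{events: []Event{{ID: "y"}, {ID: "z"}}},
	}
	fmt.Println(len(QueryAll(caches, "")), CountAll(caches, "")) // 3 unique events, COUNT lower bound 2
}
```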