A big challenge for client performance is that you cannot really optimize filters. There are a few objective things you can do, but without knowing the relay's query planner you don't know whether it's better to make one request or two, for example. How you structure the filter can result in completely different performance on different relays, and none of them are perfectly optimized for all filter shapes. Extending filters to do more things might exacerbate this issue rather than help it. Regardless, this is an inherent limitation of doing JIT querying of relays. You are basically deciding that you prefer the freedom and purity of the protocol over UX. And that's fine.
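To make the "one request or two" point concrete, here is a minimal sketch of the same query expressed in two shapes. It only builds the NIP-01 `REQ` messages and does not talk to any relay. The pubkey, the subscription ids, and the `req` helper are made up for illustration. Which shape a given relay answers faster is exactly the thing the client cannot know in advance.

```python
import json

def req(sub_id, *filters):
    """Serialize a NIP-01 REQ message: ["REQ", <sub_id>, <filter>, ...]."""
    return json.dumps(["REQ", sub_id, *filters], separators=(",", ":"))

# Placeholder 32-byte hex pubkey (hypothetical, for illustration only).
AUTHOR = "a" * 64

# Two things the client wants: recent notes and the author's profile.
notes_filter = {"kinds": [1], "authors": [AUTHOR], "limit": 20}
profile_filter = {"kinds": [0], "authors": [AUTHOR], "limit": 1}

# Shape A: one subscription carrying both filters.
one_request = [req("feed", notes_filter, profile_filter)]

# Shape B: two subscriptions, one filter each.
two_requests = [req("notes", notes_filter), req("profile", profile_filter)]

for message in one_request + two_requests:
    print(message)
```

Both shapes ask for the same events. A third option, merging everything into a single filter with `"kinds": [0, 1]`, is not strictly equivalent, because `limit` would then apply to the combined result rather than to each kind separately.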
Yep... And this is one of the reasons people tend to use a large number of relays. Reply times across whatever set of relays they have are quite "random". Adding more relays just makes things work by luck.
I think I saw an attempt to optimize filters in the NIP PRs, but I don't know how far it went.