the nostr protocol inherently gives away npub ownership, at minimum through how frequently an npub shows up in requests from the same origin - fixing this would require caches and proxies to distribute and obfuscate request origins
you only need to hold a short window of time in the cache to do this matching, just enough records to perform a reliable intersection, and then you can throw the records away... so, to do it in practice just requires a database with access times stamped in and a garbage collector
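
a rough sketch of that timestamped store plus garbage collector, in python. everything here is made up for illustration (the `RequestWindow` name, the `origin` string standing in for an ip or connection id, the window length), and it assumes timestamps arrive in non-decreasing order. the "matching" is reduced to counting which npub a given origin asks about most inside the window, which is only one of several ways you could do the intersection

```python
from collections import Counter, deque


class RequestWindow:
    """Hypothetical store: keeps (timestamp, origin, npub) records for a short window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.records = deque()   # oldest record on the left
        self.counts = Counter()  # (origin, npub) -> hits still inside the window

    def record(self, ts, origin, npub):
        """Stamp a request with its access time, then expire anything too old."""
        self.records.append((ts, origin, npub))
        self.counts[(origin, npub)] += 1
        self.collect_garbage(ts)

    def collect_garbage(self, now):
        """Throw away records older than the window and decrement their counts."""
        cutoff = now - self.window_seconds
        while self.records and self.records[0][0] < cutoff:
            _, origin, npub = self.records.popleft()
            key = (origin, npub)
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]

    def most_requested(self, origin):
        """Return (npub, hits) for the npub this origin requested most, or None."""
        per_npub = Counter(
            {npub: hits for (o, npub), hits in self.counts.items() if o == origin}
        )
        top = per_npub.most_common(1)
        return top[0] if top else None
```

usage would be something like calling `record(ts, origin, npub)` on every incoming request and `most_requested(origin)` whenever you want the current best guess. memory stays bounded by the request rate times the window length, since nothing older than the window survives a collection pass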
and yeah, the data size... you'd want to devise a compact storage and working-memory architecture for this, but it takes far fewer resources than you'd think, since you can almost certainly get away with a 32-bit serial for each npub, and you can use a caching structure that prunes out extraneous data below the median threshold or so, which would keep only the high-confidence data there
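
to make the 32-bit serial and median pruning ideas concrete, here is a minimal python sketch. the names (`NpubInterner`, `prune_below_median`) and the idea of a per-serial "score" are assumptions for illustration, not anything defined by nostr, and "below the median" is read here as "strictly less than the median of the current scores"

```python
from statistics import median

MAX_SERIAL = 0xFFFFFFFF  # largest value that fits in an unsigned 32-bit serial


class NpubInterner:
    """Hypothetical table mapping each npub to a compact 32-bit serial."""

    def __init__(self):
        self.serial_of = {}  # npub -> serial
        self.npub_of = []    # serial -> npub (list index is the serial)

    def intern(self, npub):
        """Return the existing serial for npub, or assign the next free one."""
        serial = self.serial_of.get(npub)
        if serial is None:
            serial = len(self.npub_of)
            if serial > MAX_SERIAL:
                raise OverflowError("ran out of 32-bit serials")
            self.serial_of[npub] = serial
            self.npub_of.append(npub)
        return serial


def prune_below_median(scores):
    """Keep only entries whose score is at or above the median score."""
    if not scores:
        return {}
    threshold = median(scores.values())
    return {serial: s for serial, s in scores.items() if s >= threshold}
```

the working set then only ever holds small integers instead of full keys, and each pruning pass drops roughly the weaker half of the entries, so what is left is the higher-confidence data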