I don't know how Nostr's architecture works, but maybe data retention could be made a shared burden by using tiered relays? The smallest, most local relays would retain only the most local information and purge everything else after a short time. Medium-area relays would retain information from the various small local relays in their area and purge the rest after a certain amount of time. The biggest relays would retain all the information.
Is that how it already works? Would it make sense for it to work this way if it doesn't?
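
To make the idea a bit more concrete, here's a rough Python sketch of the purge rule each tier might apply. This is only an illustration of the proposal above, not a description of how Nostr relays actually behave. The `region` tag, the `RelayPolicy` class, and the TTL values are all made up for the example. As far as I can tell, the only part that matches real Nostr events is the `created_at` Unix timestamp.

```python
from dataclasses import dataclass
from typing import List, Optional

DAY = 86_400  # seconds


@dataclass(frozen=True)
class Event:
    id: str
    region: str      # hypothetical location tag, e.g. "eu/berlin" (not a real Nostr field)
    created_at: int  # Unix timestamp in seconds


@dataclass(frozen=True)
class RelayPolicy:
    scope: str                  # region prefix this relay treats as "local"
    foreign_ttl: Optional[int]  # seconds to keep out-of-scope events; None = never purge

    def in_scope(self, event: Event) -> bool:
        # An event is local if its region is the scope itself or nested under it.
        return event.region == self.scope or event.region.startswith(self.scope + "/")

    def keeps(self, event: Event, now: int) -> bool:
        if self.foreign_ttl is None:   # biggest tier: retain everything
            return True
        if self.in_scope(event):       # local information is always retained
            return True
        # Out-of-scope information survives only until its TTL runs out.
        return now - event.created_at <= self.foreign_ttl


def purge(events: List[Event], policy: RelayPolicy, now: int) -> List[Event]:
    """Return the events this relay would still hold at time `now`."""
    return [e for e in events if policy.keeps(e, now)]


# Assumed tiers; the scopes and TTLs are arbitrary illustrative values.
SMALL = RelayPolicy(scope="eu/berlin", foreign_ttl=3_600)
MEDIUM = RelayPolicy(scope="eu", foreign_ttl=7 * DAY)
LARGE = RelayPolicy(scope="", foreign_ttl=None)

events = [
    Event("a", "eu/berlin", created_at=0),
    Event("b", "eu/paris", created_at=0),
    Event("c", "us/nyc", created_at=0),
]
now = 30 * DAY  # look at the relays 30 days after the events were posted

small_ids = [e.id for e in purge(events, SMALL, now)]    # ['a']
medium_ids = [e.id for e in purge(events, MEDIUM, now)]  # ['a', 'b']
large_ids = [e.id for e in purge(events, LARGE, now)]    # ['a', 'b', 'c']
```

In this sketch the "shared burden" comes from each tier holding a different slice of the data: after 30 days the small relay only has the Berlin event, the medium one has both European events, and the large one still has everything.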