Relay A has some set of notes, relay B has another set of notes. How do you sync these relays efficiently? Also, from a client perspective: how do I query and download the notes I don’t already have efficiently? No longer do we need to download tons of redundant data.
You wouldn't even bother. Randomly sample a smaller set and average for a quick answer that's good enough for the accuracy you want at the moment. If you want a more exact answer, take a larger sample, sized to how much time and resources you have. No?
Not sure what you mean. Usually you don’t want to make nostr queries probabilistic.
Why not? That’s all they will ever be anyway. Everything is an estimate except the information itself, which you can never have all in one place simultaneously in this case anyway, given how rapidly decentralization is growing. Think on it, or whatever. Just my 2 sats.
Woot. It's like rsync for Nostr 🤣
That's what HORNET Storage Scionic Merkle DAGs are for. I'm building the first relay that will use it btw, we are aiming to complete it in 4-5 months (that's our milestone target and funding limit).
Have you read the negentropy v1 writeup? It’s elegant and already built into strfry.
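For anyone following along, here is a minimal sketch of the general idea behind range-based set reconciliation, which is the approach negentropy is built on. It is not the negentropy wire protocol or strfry's implementation: the names `fingerprint` and `reconcile`, the XOR-of-SHA-256 fingerprint, and the `leaf` threshold are all assumptions made for illustration, and both ID lists are held locally here, whereas in a real sync only the fingerprints and the small leftover ranges would cross the network.

```python
import hashlib
from bisect import bisect_left


def fingerprint(ids):
    """Order-independent summary of a list of IDs (illustrative, not negentropy's)."""
    acc = 0
    for note_id in ids:
        acc ^= int.from_bytes(hashlib.sha256(note_id.encode()).digest(), "big")
    return (acc, len(ids))


def reconcile(a, b, leaf=4):
    """Compare two sorted lists of unique IDs; return (only_in_a, only_in_b).

    Ranges whose fingerprints match are skipped entirely. Ranges that differ
    are split in two and compared again, until one side is small enough
    (at most `leaf` IDs, with leaf >= 1) to compare directly.
    """
    if fingerprint(a) == fingerprint(b):
        return [], []
    if len(a) <= leaf or len(b) <= leaf:
        set_a, set_b = set(a), set(b)
        return sorted(set_a - set_b), sorted(set_b - set_a)
    pivot = a[len(a) // 2]
    ia, ib = bisect_left(a, pivot), bisect_left(b, pivot)
    left = reconcile(a[:ia], b[:ib], leaf)
    right = reconcile(a[ia:], b[ib:], leaf)
    return left[0] + right[0], left[1] + right[1]


# Hypothetical note IDs: the two relays share most notes but not all of them.
relay_a = sorted(f"{i:04x}" for i in range(0, 100))
relay_b = sorted(f"{i:04x}" for i in range(5, 105))
missing_on_b, missing_on_a = reconcile(relay_a, relay_b)
```

The point of the design is that the work scales with the size of the difference rather than the size of the sets, since matching ranges cost a single fingerprint comparison.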
It's not capable of guaranteeing complete synchronisation, though. In my spare time I'm even working on a filter plugin that quickly recognises whether a reply post references a whitelisted identity, by building filters out of the posts made by that user. The purpose is to cache first-level replies to those posts so the relay has them immediately. I'm going to use bloom filters to build the recogniser. Not sure yet how the state will be stored, but probably as replaceable private events, with the filter itself as a virtual user.
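A rough sketch of what such a bloom-filter recogniser could look like, assuming the filter is fed the IDs of a whitelisted user's posts and a reply is checked through its `e` tags. The class `BloomFilter`, the helper `references_known_post`, and the sizing parameters are hypothetical, not the plugin described above. A bloom filter can report false positives but never false negatives, so a hit would still need confirming against the real event store.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter; size and hash count are illustrative defaults."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def references_known_post(reply_tags, known_posts):
    """True if any "e" tag of the reply points at an ID the filter may contain."""
    return any(
        len(tag) >= 2 and tag[0] == "e" and tag[1] in known_posts
        for tag in reply_tags
    )


# Hypothetical usage: index one post by a whitelisted user, then test a reply.
known = BloomFilter()
known.add("post-id-1")
is_candidate = references_known_post([["e", "post-id-1"], ["p", "some-pubkey"]], known)
```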
Just forgot my second point: compact, addressable Merkle trees let you guarantee full set membership discovery, assuming the user didn't post them to a relay unknown to the one trying to find them. But it can know the ID, and it can know the set for a given batch of them that has been bundled into a tree.
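To make the batch idea concrete, here is a minimal sketch of a plain binary Merkle tree over a batch of note IDs, with an inclusion proof for one ID. This is an assumption-laden illustration, not the HORNET Storage Scionic Merkle DAG format: the function names, the leaf and node prefixes, and the duplicate-last-node padding are all choices made for this example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(ids):
    """Return every level of the tree, leaves first, root last."""
    if not ids:
        raise ValueError("need at least one ID")
    # Sort and dedupe so the same batch always yields the same root.
    level = [_h(b"\x00" + i.encode()) for i in sorted(set(ids))]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last node
            levels[-1] = level
        level = [_h(b"\x01" + level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling hashes from leaf `index` up to the root."""
    path = []
    for level in levels[:-1]:
        path.append((level[index ^ 1], index % 2 == 1))  # (hash, sibling_is_left)
        index //= 2
    return path


def verify(note_id, path, root):
    """Recompute the root from one ID and its sibling path."""
    node = _h(b"\x00" + note_id.encode())
    for sibling, sibling_is_left in path:
        node = _h(b"\x01" + (sibling + node if sibling_is_left else node + sibling))
    return node == root


# Hypothetical batch of five note IDs bundled into one tree.
batch = sorted(["a", "b", "c", "d", "e"])
levels = build_levels(batch)
root = levels[-1][0]
proof_for_e = prove(levels, batch.index("e"))
```

Anyone holding just `root` can check that a given ID belongs to the batch using `verify`, and two parties with the same batch will compute the same root, which is what makes the batch addressable.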