Seems relatively straightforward. Just need to make sure the number of relays queried doesn’t get too big. Will need to prune relays if other relays cover all the pubkeys in each relay’s timeline query 🤔

Is this how your implementations work, @PABLOF7z @hodlbod @Mike Dilger? https://i.nostr.build/auNu78hGBpwKIXKi.jpg
In this example, we wouldn’t bother querying R4 since it’s covered by R3, and we wouldn’t bother querying R2 since it’s covered by R1? So you would only need 2 relay queries here
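
In code, that dominance check might look something like the following minimal TypeScript sketch — the Coverage shape and function name are illustrative, not taken from any of the clients mentioned:

```typescript
// Hypothetical shape: relay URL -> set of followed pubkeys whose
// outbox relay lists include that relay.
type Coverage = Map<string, Set<string>>;

// Drop any relay whose pubkey set is contained in another relay's set,
// e.g. R4 ⊆ R3 and R2 ⊆ R1 in the picture above.
function pruneDominatedRelays(coverage: Coverage): Coverage {
  const kept: Coverage = new Map();
  const entries = [...coverage.entries()];
  outer: for (const [url, pubkeys] of entries) {
    for (const [otherUrl, otherPubkeys] of entries) {
      if (otherUrl === url) continue;
      const isSubset = [...pubkeys].every((pk) => otherPubkeys.has(pk));
      // Strict dominance, with a URL tie-break so two relays with
      // identical coverage don't eliminate each other.
      if (isSubset && (otherPubkeys.size > pubkeys.size || otherUrl < url)) {
        continue outer; // url is redundant, skip it
      }
    }
    kept.set(url, pubkeys);
  }
  return kept;
}
```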
Basically, but doing only this means you end up connecting only to hubs whenever you have a cap on simultaneous connections. So I randomize to sometimes get fringe relays to rank higher. If you keep good stats you start to find out which relays are dead or don't have the data you want, which greatly reduces the number of possible relays you have to choose from. I haven't implemented the data tracking part of that though.
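
A rough sketch of that randomized ranking — the jitter weight and scoring are made up for illustration, not anyone's actual numbers:

```typescript
// Jitter each relay's coverage score so fringe relays occasionally
// outrank the big hubs when picking under a connection cap.
function rankRelays(
  coverage: Map<string, Set<string>>,
  jitter = 0.5,
): string[] {
  return [...coverage.entries()]
    .map(([url, pubkeys]) => ({
      url,
      score: pubkeys.size * (1 + jitter * Math.random()),
    }))
    .sort((a, b) => b.score - a.score)
    .map(({ url }) => url);
}
```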
 Pretty much.

I start by saying we need N relays per person (e.g. N=2 for me, I don't like lots of traffic, but it used to default to 3).

Then I iterate through the relays, count how many people each one covers, and take the relay that covers the most, decrementing by 1 the number of relays each of those people still needs (and remembering the assignment so we ask each relay only about the people we assigned to it). Then I do it again until some condition halts the process (ran out of relays, or made no further progress).

If a relay connection fails or drops, then I go back and pick more relays to cover the people who now need 1 more. That might mean several new relays are required as replacements.
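
A self-contained TypeScript sketch of that greedy cover plus the failure-recovery step — all names and shapes are invented for illustration, not anyone's actual implementation:

```typescript
type Pubkey = string;
type RelayUrl = string;
// Remaining number of relays each person still needs (starts at N).
type Needs = Map<Pubkey, number>;
// Which pubkeys we'll actually ask each chosen relay about.
type Assignments = Map<RelayUrl, Set<Pubkey>>;

// Greedy cover along those lines: repeatedly take the relay covering the
// most still-needy people, record the assignment, and decrement those
// people's remaining need; stop when nothing makes further progress.
function pickRelays(
  relayLists: Map<Pubkey, Set<RelayUrl>>,
  needs: Needs,
): Assignments {
  // Invert pubkey -> relays into relay -> pubkeys that list it.
  const candidates = new Map<RelayUrl, Set<Pubkey>>();
  for (const [pk, relays] of relayLists) {
    for (const url of relays) {
      if (!candidates.has(url)) candidates.set(url, new Set());
      candidates.get(url)!.add(pk);
    }
  }
  const assignments: Assignments = new Map();
  for (;;) {
    let best: RelayUrl | null = null;
    let bestCovered: Pubkey[] = [];
    for (const [url, people] of candidates) {
      if (assignments.has(url)) continue; // each relay chosen at most once
      const covered = [...people].filter((pk) => (needs.get(pk) ?? 0) > 0);
      if (covered.length > bestCovered.length) {
        best = url;
        bestCovered = covered;
      }
    }
    if (best === null) break; // ran out of relays, or no further progress
    assignments.set(best, new Set(bestCovered));
    for (const pk of bestCovered) needs.set(pk, needs.get(pk)! - 1);
  }
  return assignments;
}

// If a chosen relay fails or drops: give its people their need back,
// take it out of the pool, and run the picker again. (Simplified: this
// doesn't guard against re-assigning someone to a relay they already have.)
function handleRelayFailure(
  failed: RelayUrl,
  relayLists: Map<Pubkey, Set<RelayUrl>>,
  needs: Needs,
  assignments: Assignments,
): void {
  const lost = assignments.get(failed);
  if (!lost) return;
  assignments.delete(failed);
  for (const pk of lost) needs.set(pk, (needs.get(pk) ?? 0) + 1);
  for (const relays of relayLists.values()) relays.delete(failed);
  for (const [url, pks] of pickRelays(relayLists, needs)) {
    const merged = assignments.get(url) ?? new Set<Pubkey>();
    for (const pk of pks) merged.add(pk);
    assignments.set(url, merged);
  }
}
```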

I expect that with negentropy I will change this algorithm to connect to all of a person's relays that support negentropy, but only up to N=2 of the ones that don't.
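
That split might look roughly like the following, with supportsNegentropy() as a stand-in for however support detection would actually happen:

```typescript
// Partition one person's relays: sync with every negentropy-capable one,
// and feed the rest to the greedy picker above, capped at N = 2.
function splitByNegentropy(
  relays: Set<string>,
  supportsNegentropy: (url: string) => boolean,
): { syncWithAll: string[]; greedyCandidates: string[] } {
  const syncWithAll: string[] = [];
  const greedyCandidates: string[] = [];
  for (const url of relays) {
    (supportsNegentropy(url) ? syncWithAll : greedyCandidates).push(url);
  }
  return { syncWithAll, greedyCandidates };
}
```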