Clients can make the mistake of repeatedly trying to connect to a broken or down relay in a fairly tight loop. Clients can also make the opposite mistake: avoiding down relays for a significant time so they don't get into a tight loop, and then, when a network glitch makes all relays appear to be down, avoiding ALL relays for a significant time, which hamstrings the client.

Gossip has suffered from both of these fates in the past. But I think the next release has it sussed, which means users don't have to care whether these relays are down for a short time or a long time; the client will do a reasonable thing. And that reasonable thing may become even more reasonable over time (e.g. we may add a capped exponential backoff).

BTW: I have no plans to bring back wss://nostr.mikedilger.com/. The replacement is at wss://chorus.mikedilger.com:444/ (the 444 is critical).

nostr:nevent1qqsqf6nnphlgnarrj9qtrrfupfjvwj3lgh8h5frpdtkfdddj53u3y3qpz9mhxue69uhkummnw3ezuamfdejj74plq2y
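Here is a minimal sketch of a capped exponential backoff, plus one way to avoid the "all relays look down" trap. It is not Gossip's code. The class name `RelayBackoff`, the default delays, and the rule "if every known relay is failing, never wait longer than `base_delay`" are all hypothetical choices made for illustration.

```python
class RelayBackoff:
    """Hypothetical per-relay retry tracker (illustrative, not Gossip's implementation)."""

    def __init__(self, base_delay: float = 15.0, max_delay: float = 3600.0):
        self.base_delay = base_delay              # seconds to wait after the first failure (assumed value)
        self.max_delay = max_delay                # cap on the exponential growth (assumed value)
        self.failures: dict[str, int] = {}        # relay URL -> consecutive failure count
        self.last_failure: dict[str, float] = {}  # relay URL -> time of the most recent failure

    def record_failure(self, relay: str, now: float) -> None:
        self.failures[relay] = self.failures.get(relay, 0) + 1
        self.last_failure[relay] = now

    def record_success(self, relay: str) -> None:
        # One good connection clears the relay's failure history.
        self.failures.pop(relay, None)
        self.last_failure.pop(relay, None)

    def delay(self, relay: str, known_relays: list[str]) -> float:
        n = self.failures.get(relay, 0)
        if n == 0:
            return 0.0
        # Capped exponential backoff: base, 2*base, 4*base, ... up to max_delay.
        # The exponent is clamped so huge failure counts can't overflow a float.
        d = min(self.max_delay, self.base_delay * 2 ** min(n - 1, 30))
        # If every known relay is failing at once, the problem is more likely our
        # own network than the relays, so don't let the penalty lock us out.
        if known_relays and all(self.failures.get(r, 0) > 0 for r in known_relays):
            d = min(d, self.base_delay)
        return d

    def can_try(self, relay: str, now: float, known_relays: list[str]) -> bool:
        if relay not in self.last_failure:
            return True
        return now - self.last_failure[relay] >= self.delay(relay, known_relays)
```

Capping the wait (rather than clearing it) when everything fails means the client still never retries faster than once per `base_delay`, so a genuine local outage doesn't turn into a tight loop, yet recovery after a glitch takes at most `base_delay` seconds.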
There are good, long-standing algorithms for this in Ethernet. When more than one writer tries to write to the shared bus at the same time, a clash (a collision) occurs. Both writers back off for a small random time, and the backoff grows after each successive collision (up to a limit).

Maybe the relay could be responsible for an optional “heartbeat”, as is done all the time in device networks: when a relay comes back online, it sends a signal to other relays, so a client talking to a relay that is up learns that the recovered relay is active again.
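For reference, classic CSMA/CD Ethernet (IEEE 802.3) uses truncated binary exponential backoff: after the n-th consecutive collision, a station waits a random number of slot times drawn uniformly from 0 to 2^min(n, 10) - 1, and it gives up on the frame after 16 attempts. A slot time is 512 bit times (51.2 µs on 10 Mb/s Ethernet). A minimal Python sketch; the function name `ethernet_backoff_slots` is hypothetical.

```python
import random


def ethernet_backoff_slots(collisions: int, rng: random.Random) -> int:
    """Truncated binary exponential backoff, as in classic CSMA/CD Ethernet.

    After the n-th consecutive collision, return a random whole number of
    slot times chosen uniformly from 0 .. 2**min(n, 10) - 1.
    """
    if collisions < 1:
        raise ValueError("collisions must be >= 1")
    if collisions >= 16:
        # 802.3 discards the frame after 16 failed attempts.
        raise RuntimeError("excessive collisions; frame dropped")
    k = min(collisions, 10)  # growth is "truncated" at 2**10 slots
    return rng.randint(0, 2 ** k - 1)  # randint is inclusive on both ends
```

The randomness matters for relays too: if many clients retry a recovered relay on exactly the same schedule, they all hit it at once. A client could multiply the returned slot count by a hypothetical slot length of a few seconds to get a reconnection delay.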