Hi all! I have just pushed a major update to the negentropy project. It implements protocol version 1 (previous was version 0).
The protocol has been simplified: ID size is no longer configurable, there are no more continuation ranges, and the output can be constructed in the same input-scan pass.
There is a new fingerprint function, based on an incremental hash. This allows fingerprints to be pre-computed and stored in a tree structure. The C++ implementation includes a B+Tree implementation that allows fingerprints to be computed without collecting IDs from the entire DB.
I have written a comprehensive article that goes over the theory of RBSR and the negentropy implementation:
https://logperiodic.com/rbsr.html
Comments are appreciated!
Finally, I have integrated the new version of negentropy and the B+Tree with strfry. It's in a development branch and not quite ready for production, but my testing indicates this will be a massive improvement for full-DB syncs, especially on relays with very large DBs. In-sync or nearly in-sync relays should sync almost instantly with negligible resource usage. Relays will also use the B+Tree for filters that contain only until and/or since, meaning that date-range full-DB syncs are also efficient. Syncing arbitrary filters works as before (but I have begun work to make these more efficient as well).
Unfortunately, this is a breaking change for the negentropy protocol (this really should be the last one!) and will also require a new DB version.
I'm going to take this opportunity to make a few more breaking DB changes, and plan to release strfry 1.0 after a beta testing period.