Oddbean new post about | logout
 Nice! Do you know what's the difference between this and minisketch 
 No, I'll try to take a look 
 From my understanding (I can worng), range-based set reconciliation (negenteopy) are good with small set, while BCH-based (minisketch) works better with large set and provide error correction.

Maybe @Doug Hoyte can correct me or add some more info. 
 I only have a very rough understanding of how minisketch works, but from what I know it is the opposite.

Minisketch is very good with small set sizes -- so good that it approaches the information-theoretic bounds of what is possible, bandwidth-wise. However, look at the first graph on the minisketch site: https://raw.githubusercontent.com/sipa/minisketch/master/doc/plot_capacity.png

Here you can see that the CPU time required is exponential in its "capacity" parameter (note the Y axis is log-scale), and will become impractical in the low thousands. The text hints at this too: "For the largest sizes currently of interest to the authors, such as a set of capacity 4096 with 1024 differences".

OTOH, negentropy works predictably well into the millions or tens of millions of events. By pre-computing fingerprints as described in Aljoscha Meyer's paper, this will scale to arbitrarily large sizes.

Some other downsides that I see with minisketch are that you need to choose an appropriate capacity before you begin the reconcilliation, otherwise it will fail (?). How do you choose that? Also, minisketch's performance depends on the field-size, which I believe is basically how large the IDs being reconciled can be. The largest example shown is 64 bits, which is uncomfortably small if trying to avoid collisions. negentropy by default uses 128 bits, but can scale this arbitrarily high with only linear overhead in bandwidth and CPU.

By contrast, negentropy requires almost no tuning. The default parameters I have selected seem to work well for both small and large set sizes, and with small and large symmetric differences between the two sides.

I am in the process of adding negentropy support to gensync, which is a general benchmarking framework for set reconcilliation protocols. I'll post any updates about my progress here: https://github.com/nislab/gensync/issues/9