I think this is just a Bloom filter. With only a few target words and a lot of potential ones, even a multi-hash scheme will still let through a lot of false positives. You'll also need some way to reliably rule those out, and a radix tree is one way to do that.
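To make the two-stage idea concrete, here's a rough Python sketch: a Bloom filter as the cheap first check, then an exact lookup to throw out the false positives. Everything in it is my own assumption, not something from the scheme being discussed: the names (`BloomFilter`, `Trie`, `build_matcher`), the 64-bit filter size, the three hash positions derived from SHA-256 by double hashing, and the sample words. For brevity the exact stage is a plain character trie rather than a compressed radix tree; the lookup result is the same, a radix tree just merges single-child chains to save space.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; k bit positions are derived by double hashing."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, word):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is nonzero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def might_contain(self, word):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))


class Trie:
    """Uncompressed character trie, standing in for a radix tree."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker; "" can never be a character key

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return "" in node


def build_matcher(targets, num_bits=64, num_hashes=3):
    """Return (bloom, trie, is_target) built from the target words."""
    bloom = BloomFilter(num_bits, num_hashes)
    trie = Trie()
    for word in targets:
        bloom.add(word)
        trie.add(word)

    def is_target(word):
        # Cheap probabilistic check first, exact check only for survivors.
        return bloom.might_contain(word) and word in trie

    return bloom, trie, is_target


if __name__ == "__main__":
    bloom, trie, is_target = build_matcher(["cat", "dog", "bird"])
    candidates = [f"word{i}" for i in range(10000)]
    passed_bloom = [w for w in candidates if bloom.might_contain(w)]
    confirmed = [w for w in candidates if is_target(w)]
    print(len(passed_bloom), "candidates passed the Bloom filter")
    print(len(confirmed), "survived the exact check")
```

How many candidates slip past the filter depends on the bit count, the number of hashes, and how many words you test, so treat the numbers it prints as illustrative only. The point is that anything the filter lets through still has to be confirmed by the exact structure.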