#dev #CLN
I've spent the last few workdays completely reworking our onion message code. This was scattered in various places and I wanted to unify it, and also written several years ago and I'd forgotten how the protocol actually works!
onion messages are *double* encrypted; this is the main source of confusion! At the high layer, they're a series of nested encrypted calls ("onionmsg_tlv" in the BOLT 4 spec), so each recipient decrypts and hands it on: this is exactly the same as we use for payment information. But inside that is *another* encrypted blob (onionmsg_tlv.encrypted_recipient_data), which requires a tweak which was handed to you alongside the onion, for you to decrypt (into an "encrypted_data_tlv"). Inside that is all the information about where to send next, any restrictions, and allows you to calculate the *next* tweak to hand on (it can also override the next tweak).
The double encryption is necessary because there are *three* actors here: Alice wants Bob to send her a message, without revealing her identity. So she gives Bob a "blinded path" which goes via Charlie: this path contains Charlie's pubkey (where to start the path), a blinding tweak, and two encrypted blobs for Alice to put into each layer of the onion message. The first an encrypted blob which Charlie can read, which contains her pubkey so he knows where to send it next. The second is her own, and contains a secret specific to the purpose of this message, so Bob can't play games trying to use this blinded path for anything else ("hey, are you the same node as this previous payment?") or use a different blinded path for this purpose. She can also add dummy hops (we don't yet), which she will simply absorb, to obscure the path length from Bob. You can add padding to make the hops indistinguishable (we don't yet).
Bob puts the actual stuff he wants to send Alice into the final onion call (often including his own blinded reply path!), along with the encrypted blob.
Importantly, even if Bob were sending a message *not through a blinded path* he would use the same double-encrypted format: that's so Charlie can't tell whether a blinded path is being used or not, even though it's slightly less efficient. Crypto is cheap these days, too.
Now, if Alice gives Bob a blinded path to Charlie and Charlie is Bob's peer, he can simply send the onion and the first blinding tweak to Charlie. But if Alice needs to send the message via Dave to Charlie, she needs to prepend a step. That's not quite possible, naively, because blinding tweaks are generated *forwards*, and she needs Charlie to get the right blinding tweak from Dave, and Alice has no way of making that happen. So inside Dave's encrypted blob, she uses next_blinding_override to tell Dave to hand that blinding override to Charlie instead of the normal one. I just implemented this for Core Lightning (previously we would simply connect to the first node, which is privacy-compromising and should only be done as a last resort).
These blinded paths have some nice properties: you can't use part of them (you don't know the blinding factor except for the first one, so you can't start in the middle, and you can't replace any data), you need to use all of them. They can contain timelimits to avoid easy probing, too: a classic measure would be to see if the path fails when a given node is down, but that takes time. The spec insists all errors within the blinded path are the same, and originate from the entry: this loses some analytical power on failure, but makes probing harder. The entry point is supposed to add a random delay (we don't yet!). There may still be implementation differences, but they're hard for Bob to probe (and Alice doesn't need to, as she set up the path).