 What is the metadata privacy in Matrix chat like? 
  @hodlbod and @jb55 were talking about this on a recent Thank God for Nostr podcast. 

My takeaway was that it's very good but has serious scaling limits because of this.  
 I'm fine with the scaling limits, I just don't want my rectum inverted every time I write "Gm" 
 😆 A reasonable request. 
 GM

OH GOD. WHY DOES MY RECTUM FEEL THIS WAY?! 
 Why should your GM be private??? Share it with the world!!

https://media.tenor.com/lzvI32Y92ysAAAAM/good-morning.gif 
 I don't know firsthand, but I believe they encrypt each message for the recipient, hence the scaling issues. They've been working on adopting MLS for 3 years, which will help a lot when it gets done. 
 encrypted group messages are an unsolved problem as soon as the group becomes too large afaik 
 That's what MLS is for, it reduces the number of messages needed from O(n) to O(log(n)). Which is nice, but still doesn't scale as well as O(1), which is what you get with shared keys (at the expense of much worse security). 
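
To make the scaling comparison above concrete, here is a rough sketch of how the three costs grow with group size. It is only an order-of-magnitude model of the O(n), O(log(n)) and O(1) claims, not the actual MLS protocol; the function name and the idea of "ciphertexts per group update" are assumptions made for illustration.

```python
import math
from typing import Dict


def messages_per_update(n: int) -> Dict[str, int]:
    """Rough count of ciphertexts one group update needs for n members.

    Illustrative model only: real protocols have extra constant factors.
    """
    if n <= 1:
        return {"pairwise": 0, "tree": 0, "shared_key": 0}
    return {
        # One ciphertext per other member: O(n).
        "pairwise": n - 1,
        # One ciphertext per level of a binary tree (MLS-style): O(log n).
        "tree": math.ceil(math.log2(n)),
        # One ciphertext under a key everyone already shares: O(1).
        "shared_key": 1,
    }


for size in (10, 1_000, 100_000):
    print(size, messages_per_update(size))
```

For a 1,000-member room this gives 999 ciphertexts versus 10 versus 1, which is why the tree approach helps a lot but a shared key is still cheaper.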
 the most important thing is that Matrix does not encrypt metadata 
 Matrix can be run sovereignly. This helps as you are in full control of who you add into the rooms. Also, you can configure the server so that no emails are used. It goes without saying: don't dox yourself with a username. Also, if you federate only with others who run their own Matrix instances sovereignly, your attack vector is highly minimized. 
 It's ok for government, enterprise, etc., but not for casual users. 

nostr:nevent1qqsdy76ydvrej20uqm5j88m424pg4n0lu34vpdgfek42k7kyafuey6spz3mhxue69uhhyetvv9ujumn0wd68ytnzvupzq7wzets3f63g4xq7w4vmflnc2jj8x5s63532v6a6h8azfr4cyrlkqvzqqqqqqyhjv207 
 It will scale better when they finish integrating MLS (they've been working on it for 3 years, with the help of the German government). 
 much worse than on The Nostr. Homeservers know everything except the content of messages. 
 lol nostr is terrible for privacy, I hope you are aware of that. 
 Matrix is even worse cuz The Nostr uses giftwraps, metadata is mostly hidden from relays.  
 lol dude with all due respect but you sound like you don't know what you're talking about, if you do please talk clearly.

You don't give wrong privacy advice to people, that's irresponsible.  
 It's hard to compare the two because people usually overlook that Matrix servers, SimpleX servers and Signal servers know who you are in order to perform the right access controls into your chat rooms. While no one else knows about your messages, the servers do have a LOT of leverage over their users. Any legal action can simply target the server operator, they can turn on several tracking mechanisms without your knowledge and then metadata privacy is pretty much gone.

The goal for the GiftWrap idea is to remove the need for Nostr relays to authenticate users into chatrooms. While everyone can now see GiftWraps being received, they still can't know anything else about it. And since the GiftWrap protocol uses multiple relays to pass messages around, it is extremely hard for any legal action against the server operator to break your privacy. 

Now, of course, that all depends on compliant client implementations of the GiftWrapped DMs. Using the same protocol, a client can take the DM experience to such a privacy level (e.g. creating a new Tor session at each message to avoid IP tracking, minimizing nostr filter correlations, etc), that it becomes certainly better than Matrix, Signal or SimpleX. Enforcement wouldn't even know what to target to get your metadata. 
 @simplex servers know nothing about their users or groups even, what have you been smoking

Stop spreading FUD.

nostr:note1h4tsakqnl6m0ldjmdf5etd0ml5zu2u25zw7vhxanneyquexqntvsxy6rsu 
 hehe, sure. Have you run a server? I did. 

The protocol requests that servers MUST NOT do many things but that doesn't mean they CAN'T do them. If servers WANT to track, they CAN track. It's on you to TRUST the servers.  The fact that SimpleX servers don't have your profile information doesn't mean they don't know who you are. 

For instance, the protocol requests servers to not log client commands and transport connections and to NOT use a persistent DB in the production environment. That is just a request. There is nothing blocking server operators from doing so. 

Queue IDs are generated by the server. The protocol requests strong pseudo-random number generation. But who knows if they are actually using one for your account. They can just generate sequential numbers to more easily identify you out there and you won't even notice. 

The default servers from SimpleX do not use Tor and do not create a new TCP/IP connection for each new Queue/contact, which is against their own protocol recommendations. Their terms describe that they don't associate IPs with Queue IDs now, but it's clearly possible. With IPs, you can get rough locations with enough precision to try to identify you with the rest of the data set they have. 

And I am not saying anything new or hidden somewhere. SimpleX is very upfront about the need to trust the server you are using. 

Don't get me wrong. SimpleX is better than many other platforms for privacy and security, but it still assumes a trusted model with their servers. 
 I run a server, smp.hankhub.net, and have examined the protocol in great detail. 
 Then you know you can just add trackers to the many parts of the code in order to store who is creating Queues and how and when both Bob and Alice connect to the same Queue, map to their IPs and capture their rough locations. Over time the amount of data allows you to de-noise any IP re-use and specify which locations talk to which other locations. By roughly knowing locations over time, you can pinpoint an exact user (for instance, if you roughly know where I work and roughly know where I live, you can easily filter your data set to get my exact queues). Knowing an exact user, plus some rough knowledge of my contacts (say through Nostr or other spaces), allows you to start mapping out other users in the network. Soon enough, you can safely identify the owner of every single queue. 

It's work, but the metadata is there. 

Clients can make that work harder by using Tor and other tricks but most people are not doing that today. 

Now picture a similar server but without the creation of queues, where each time Bob and Alice connect to send a message, they use a separate random key you have no knowledge about. You know IP X is receiving stuff using pubkey A, but the stuff is coming from everywhere. Everywhere can be another client or a relay that is just re-broadcasting the message, which can be old or new. You don't know if it is the same conversation or not. You can track IPs of each sender and try to group messages by pairs of IPs, but when the IP changes, there is nothing you can use to link the two IP pairs used by the same person. It will look like a new user. Because you have less information, you will need to rely on more external data to identify each conversation thread, which creates more uncertainty on the outcomes. 

Now add Tor on top of that. 
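
The linking argument above can be shown with a toy model: an identifier that persists across connections ties a user's old and new IP together, while a fresh identifier per message does not. The log entries, identifier names and the `link_ips` helper are all hypothetical, and the IPs come from documentation ranges.

```python
from collections import defaultdict


def link_ips(log):
    """Group source IPs by the identifier the server sees for each message.

    Returns only identifiers seen from more than one IP, since those are
    the ones that let an observer link two addresses to the same user.
    """
    seen = defaultdict(set)
    for identifier, ip in log:
        seen[identifier].add(ip)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical logs of (identifier the server sees, source IP).
persistent_queue_log = [("queue-42", "203.0.113.5"), ("queue-42", "198.51.100.7")]
per_message_key_log = [("key-a1", "203.0.113.5"), ("key-b2", "198.51.100.7")]

print(link_ips(persistent_queue_log))  # {'queue-42': ['198.51.100.7', '203.0.113.5']}
print(link_ips(per_message_key_log))   # {}
```

With per-message keys the server sees two unrelated senders, so it has to fall back on external data to connect them.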
 I wouldn't have the slightest idea which queues are Alice and Bob.

I can log IP addresses for what it's worth but I think hiding IP is a job for a different layer.

I also can't see groups or group membership as groups only exist on the client. 
  @Vitor Pamplona Thanks for the suggestions. We did indeed consider various ideas about how to reduce the persistence of queue IDs, but rotating the queues to another server periodically seemed simpler and to provide better metadata protection. The current approach also allows some basic protection from resource exhaustion attacks.

What I don't quite follow - if the same key A is used to retrieve the messages how it is any better than having queue ID from the perspective of correlating messages to the users? Wouldn't it identify the user just as well, so that instead of mapping IP addresses to queues, the server could map IP addresses to the messages? Maybe I don't understand what you propose?

The ideas we considered were about trying to avoid any persistent IDs entirely, e.g. via some kind of DHT tables, and something like this might happen in "v3" of the protocols (we are currently still moving to "v2"). 
 The difference is that each client can create as many rotating keys to receive the message as needed, including a new one per message if it is really necessary. These are created by the client, the server doesn't know about them.

But the main component is that you can encrypt the already encrypted message to a separate pubkey as many times as you need. You can basically create an onion route using multiple queue IDs where every server only knows which node the message is coming from and going to, but not the author and the receiver. Messages can use completely separate routes, making it impossible to track a conversation sequence. 

On top of that, dates and times are random. Servers don't know which messages are new and which are not. Many payloads might contain the same messages, adding noise to the network. 

All of that is done with a single message type: A GiftWrap event.  
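
A minimal sketch of the layering described above, under heavy assumptions: the XOR keystream below is a toy stand-in for real Nostr encryption and is NOT secure, the per-hop symmetric keys stand in for keys a real client would derive from pubkeys, and signing is left out entirely. Kind 1059 and the "p" tag follow NIP-59; the two-day timestamp window, the hop names and every function name here are illustrative assumptions.

```python
import hashlib
import json
import os
import random
import time

TWO_DAYS = 2 * 24 * 60 * 60  # assumed window for randomizing timestamps


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. Toy only, NOT secure.

    XOR is its own inverse, so the same call encrypts and decrypts.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(inner: dict, next_hop: str, hop_key: bytes) -> dict:
    """Hide `inner` in a new layer that exposes only the next hop."""
    return {
        "kind": 1059,  # gift wrap kind from NIP-59
        "pubkey": os.urandom(32).hex(),  # random bytes standing in for a throwaway key
        "created_at": int(time.time()) - random.randint(0, TWO_DAYS),
        "tags": [["p", next_hop]],  # the only routing hint left visible
        "content": toy_cipher(hop_key, json.dumps(inner).encode()).hex(),
    }


def unwrap(event: dict, hop_key: bytes) -> dict:
    """Peel one layer using the key held by the hop named in its p tag."""
    return json.loads(toy_cipher(hop_key, bytes.fromhex(event["content"])))


# Hypothetical two-hop route: a re-broadcasting bot, then the recipient.
keys = {"relay-bot": os.urandom(32), "bob": os.urandom(32)}
message = {"content": "GM"}
onion = wrap(wrap(message, "bob", keys["bob"]), "relay-bot", keys["relay-bot"])

hop = unwrap(onion, keys["relay-bot"])  # the bot learns only the next hop
print(hop["tags"])                      # [['p', 'bob']]
print(unwrap(hop, keys["bob"]))         # {'content': 'GM'}
```

Each layer carries a different throwaway sender key and an unrelated timestamp, so an observer of one layer has nothing obvious to match against the next.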
 > What I don't quite follow - if the same key A is used to retrieve the messages how it is any better than having queue ID from the perspective of correlating messages to the users?

You need to collect metadata over time to get some value out of it. SimpleX durable queue IDs help with that. His scheme somehow only has ephemeral per-message keys where I don't know where they come from. He may be alluding to https://github.com/nostr-protocol/nips/blob/master/44.md .

I suspect the idea is to post DMs to relays without a specific recipient coded into the message. A large number of *potential* recipients all download all DMs from the sender and attempt to decrypt them. This will succeed for those that were intended for them. The security comes from a large number of potential recipients connecting with their IP for download out of which the actual recipient is but one.

I doubt this will scale well, in line with nostr's overall hail mary approach to distributed systems architecture. 
 Would be good to look at the spec. 
 Brace yourself for the crypto heresy of using the same EC keys for encryption and digital signatures. 😋 
 I don’t think you can equate the amount of metadata available to Matrix and Signal servers with what is available to SimpleX servers - it’s actually less than what is available to Nostr relays users connect to. Putting them in one list, however flattering, implies a similar amount of metadata available, which is very far from reality and is misleading. 
 I agree, they are not the same. SimpleX is more private than Signal and Matrix for sure. But the protocol still grants a lot of info to the server. 

While Nostr relays also have a lot of info from users, especially if you mix private and public events, a fully private client can make it so that the relay doesn't even know if the computer connecting to them is a user or a proxy. The relay doesn't know if a DM is new or not because date/times are all random. In fact, many GiftWrapped DM transfers are different encryptions of the same message (or any other private Nostr event), all from random accounts, being broadcast by bots to generate noise. In fact, that same private client can just transfer directly in P2P if the two phones are online and relays won't even know about it (which is my main use right now). 
 I love this, thank you for giftwrap pilling me, I thought it's just a type of DHKE and symmetric encryption  
 I think GiftWraps are more like individual Tor messages that can include DMs inside them. 

They have the "next node" address that is visible (a pubkey which can be the real one, an alias or a new key every time), but everything else is either random or encrypted. 

The GiftWrap event can encrypt other GiftWrap events that together assemble a full onion route, with the benefit being that the final node is the client, not the relay. It would be like never hitting an exit node in Tor.

But we are in the early days. I am still waiting for a real cryptographer to make sure our thinking doesn't have any holes in it.  
 Is there a fully private #nostr client, @Vitor Pamplona, as you talked about? 
 Not yet  
 Another curiosity, @Vitor Pamplona. Is the info Signal servers receive not encrypted? They say they store no metadata on their servers. Is that not true in reality? Why then is Signal so celebrated in the infosec community?

I'll be very obliged if you take your precious time to enlighten us. 

Thank you. 
 Technically nothing prevents them from storing metadata on servers - only contents of messages are encrypted, same with Matrix. So, you have to trust Signal servers.
They also know your phone number 😅. 
Signal is so celebrated because it has great UX, a large userbase, battle-tested encryption (WhatsApp is based on the Signal protocol) and it was the first. 



 
 Wow  
 Why don't they do this then?! 
 Yay @simplex