Oddbean new post about | logout
 Then you know you can just add trackers to the many parts of the code in order to store who is creating Queues and how and when both Bob and Alice connect to the same Queue, map to their IPs and capture their rough locations. Over time the amount of data allows you to de-noise any IP re-use and specify which locations talk to which other locations. By roughly knowing locations over time, you can pin-point an exact user (for instance, if you roughly know where I work and roughly know where I live, you can easily filter your data set to get my exact queues). Knowing an exact user and some rough knowledge of my contacts (say through Nostr or other spaces), it allows you to start mapping out other users in the network. Soon enough, you can safely identify the owner of every single queue. 

It's work, but the metadata is there. 

Clients can make that work harder by using Tor and other tricks but most people are not doing that today. 

Now picture a similar server but without the creation of queues and each time Bob and Alice connect to send a message, they use a separate random key you have no knowledge about. You know IP X is receiving stuff using pubkey A, but the stuff is coming from everywhere. Everywhere can be another client or a relay that is just re-broadcasting the message, which can be old or new.  You don't know if it is the same conversation or not. You can track IPs of each sender and try to group messages by pairs of IPs, but when the IP changes, there is nothing you can use to link the two IP pairs used by the same person. I will look like a new user. Because you have less information, you will need to rely on more external data to identify each conversation thread, which creates more uncertainty on the outcomes. 

Now add Tor on top of that.