 hehe, sure. Have you run a server? I did. 

The protocol requests that servers MUST NOT do many things, but that doesn't mean they CAN'T do them. If servers WANT to track, they CAN track. It's on you to TRUST the servers. The fact that SimpleX servers don't have your profile information doesn't mean they don't know who you are. 

For instance, the protocol requests that servers not log client commands or transport connections, and NOT use a persistent DB in the production environment. That is just a request. There is nothing actually blocking server operators from doing exactly those things. 

Queue IDs are generated by the server. The protocol requests a strong pseudo-random number generator. But who knows if they are actually using one for your account. They could just generate sequential numbers to identify you more easily out there, and you won't even notice. 
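
To make that concrete, here is a toy sketch (not SimpleX code, all names made up) of the gap between what the spec asks for and what a tracking operator could quietly ship instead:

```python
import itertools
import secrets

def compliant_queue_id() -> bytes:
    # What the spec asks for: an unpredictable ID from a strong RNG.
    return secrets.token_bytes(24)

_counter = itertools.count(1)

def tracking_queue_id() -> bytes:
    # What a dishonest operator could silently do instead: sequential IDs
    # that encode creation order and make queues trivial to enumerate and link.
    return next(_counter).to_bytes(24, "big")

print(compliant_queue_id().hex())  # different every run
print(tracking_queue_id().hex())   # 000...001, then 000...002, ...
```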

The default servers from SimpleX do not use Tor and do not create a new TCP/IP connection for each new Queue/contact, which is against their own protocol recommendations. Their terms say they don't associate IPs with Queue IDs today, but it's clearly possible. With IPs, you can get rough locations with enough precision to try to identify you with the rest of the data set they have. 

And I am not saying anything new or hidden somewhere. SimpleX is very upfront about the need to trust the server you are using. 

 Don't get me wrong. SimpleX is better than many other platforms for privacy and security, but it still assumes a trusted-server model. 
 I run a server, smp.hankhub.net, and have examined the protocol in great detail. 
 Then you know you can just add trackers to many parts of the code to record who is creating queues, and how and when both Bob and Alice connect to the same queue, map that to their IPs, and capture their rough locations. Over time, the amount of data lets you de-noise any IP re-use and work out which locations talk to which other locations. By roughly knowing locations over time, you can pinpoint an exact user (for instance, if you roughly know where I work and roughly know where I live, you can easily filter your data set down to my exact queues). Knowing an exact user, plus some rough knowledge of my contacts (say through Nostr or other spaces), lets you start mapping out other users in the network. Soon enough, you can confidently identify the owner of every single queue. 
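
A hypothetical sketch of the kind of hook a dishonest operator could bolt on; none of these names come from the SimpleX codebase:

```python
import time
from collections import defaultdict

event_log = []  # (timestamp, client_ip, queue_id, action)

def record(client_ip: str, queue_id: str, action: str) -> None:
    # One extra line at queue creation / send / subscribe is all it takes.
    event_log.append((time.time(), client_ip, queue_id, action))

def queues_by_ip() -> dict:
    # Over time this maps IPs (and rough locations via IP geolocation)
    # to queue activity, i.e. which "locations" touch which queues.
    grouping = defaultdict(set)
    for _, ip, queue_id, _ in event_log:
        grouping[ip].add(queue_id)
    return dict(grouping)

record("203.0.113.7", "queue-123", "create")   # Alice creates the queue
record("198.51.100.9", "queue-123", "send")    # Bob sends into it
print(queues_by_ip())
```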

It's work, but the metadata is there. 

Clients can make that work harder by using Tor and other tricks but most people are not doing that today. 

 Now picture a similar server but without the creation of queues, where each time Bob and Alice connect to send a message, they use a separate random key you have no knowledge of. You know IP X is receiving stuff using pubkey A, but the stuff is coming from everywhere. "Everywhere" can be another client or a relay that is just re-broadcasting the message, which can be old or new. You don't know if it is the same conversation or not. You can track the IPs of each sender and try to group messages by pairs of IPs, but when an IP changes, there is nothing you can use to link the two IP pairs used by the same person. It will look like a new user. Because you have less information, you will need to rely on more external data to identify each conversation thread, which creates more uncertainty in the outcomes. 
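
Roughly, the server's-eye view in that model looks like this (a toy sketch with made-up names, just to show why the correlation breaks):

```python
from collections import defaultdict

# Everything the server sees: (sender_ip, receiving_pubkey, opaque_blob).
observations = [
    ("203.0.113.7",  "pubkey-A", b"blob-1"),  # a client? a relay re-broadcasting?
    ("198.51.100.9", "pubkey-A", b"blob-2"),  # a new message or an old one repeated?
    ("192.0.2.44",   "pubkey-A", b"blob-1"),  # duplicate payload from elsewhere: noise
]

pairs = defaultdict(int)
for ip, pubkey, _ in observations:
    pairs[(ip, pubkey)] += 1

# When a sender's IP changes there is no stable identifier (no queue ID, no reused
# sender key) to join the old (ip, pubkey) pair with the new one, so the sender
# simply shows up as a brand-new user.
print(dict(pairs))
```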

Now add Tor on top of that. 
 I wouldn't have the slightest idea which queues are Alice and Bob.

I can log IP addresses for what it's worth but I think hiding IP is a job for a different layer.

I also can't see groups or group membership as groups only exist on the client. 
 nostr:nevent1qqsvd8d2ssygp6yrn5fe3hln4pp9puefhukue8xvxtmg3een599v3ucpp4mhxue69uhkummn9ekx7mqzyprmuze238a25e4u2l6uv7fqxjrd53txq22uk0dnctec7jlgmzqkuqcyqqqqqqglgqpm3

nostr:nevent1qqsvd8d2ssygp6yrn5fe3hln4pp9puefhukue8xvxtmg3een599v3ucpp4mhxue69uhkummn9ekx7mqzyprmuze238a25e4u2l6uv7fqxjrd53txq22uk0dnctec7jlgmzqkuqcyqqqqqqglgqpm3 

lead to the same content... How? Is there some kind of error correction?? #asknostr 
 What client? 
 Amethyst 
 I have posted 2 different texts, right? Both nevent IDs have a run of 4 q's somewhere in the string, but they are not the same. Yet they lead to the same note... 
 OK, I have checked now. They are both exactly the same references. 
 Do you also see that in the post itself? Like, I was pretty sure that the first reference was different from the second one... Both contained 4 q's, but one ended in a "d" or something and the other in an "h" 
  @Vitor Pamplona Thanks for the suggestions. We did indeed consider various ideas about how to reduce the persistence of queue IDs, but rotating the queues to another server periodically seemed simpler and provided better metadata protection. The current approach also allows some basic protection from resource exhaustion attacks.
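
Roughly, the rotation idea looks like this (a simplified sketch with made-up names, not the actual SMP implementation):

```python
import secrets

def rotate_queue(current_server: str, servers: list[str]) -> tuple[str, bytes]:
    candidates = [s for s in servers if s != current_server]
    new_server = secrets.choice(candidates)
    new_queue_id = secrets.token_bytes(24)
    # 1. create the new queue on new_server
    # 2. send the new address to the contact through the still-working old queue
    # 3. once the contact has switched over, delete the old queue
    return new_server, new_queue_id

print(rotate_queue("smp1.example.org",
                   ["smp1.example.org", "smp2.example.org", "smp3.example.org"]))
```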

What I don't quite follow: if the same key A is used to retrieve the messages, how is it any better than having a queue ID, from the perspective of correlating messages to users? Wouldn't it still identify the user just the same, and now, instead of mapping IP addresses to queues, the server could map IP addresses to the messages... Maybe I don't understand what you propose?

The ideas we considered were about trying to avoid any persistent IDs entirely, e.g. via some kind of DHT tables, and something like this might happen in "v3" of the protocols (we are currently still moving to "v2"). 
 The difference is that each client can create as many rotating keys to receive messages as needed, including a new one per message if it is really necessary. These are created by the client; the server doesn't know about them.

But the main component is that you can encrypt the already encrypted message to a separate pubkey as many times as you need. You can basically create an onion route using multiple queue IDs, where every server only knows which node a message is coming from and which it is going to, but not the author or the receiver. Messages can use completely separate routes, making it impossible to track a conversation sequence. 
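
A toy illustration of the layering (hypothetical code, not nostr-tools; a stand-in function replaces real public-key encryption):

```python
import json

def seal_for(hop: str, payload: dict) -> dict:
    # Stand-in for real public-key encryption to `hop`; only `hop` could open it.
    return {"to": hop, "ciphertext": json.dumps(payload)}

def onion_wrap(message: str, route: list[str]) -> dict:
    payload: dict = {"final": message}
    for hop in reversed(route):
        payload = seal_for(hop, payload)
    return payload

wrapped = onion_wrap("hi Bob", ["relay-1", "relay-2", "bob"])
print(wrapped["to"])                     # relay-1: the only address visible on the outside
inner = json.loads(wrapped["ciphertext"])
print(inner["to"])                       # relay-2: all relay-1 learns is the next hop
```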

On top of that, dates and times are randomized. Servers don't know which messages are new and which are not. Many payloads might contain the same messages, adding noise to the network. 

All of that is done with a single message type: A GiftWrap event.  
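
Roughly the shape of it, from memory; `nip44_encrypt` is a placeholder and the signing steps are omitted, so check NIP-59 for the real details:

```python
import json
import random
import secrets
import time

def nip44_encrypt(sender_secret: bytes, recipient_pub: str, plaintext: str) -> str:
    # Placeholder for the real NIP-44 construction (ECDH + ChaCha20 + MAC).
    return plaintext[::-1]

def gift_wrap(rumor: dict, author_secret: bytes, recipient_pub: str) -> dict:
    seal = {
        "kind": 13,  # the seal, signed by the author's real key (signature omitted here)
        "content": nip44_encrypt(author_secret, recipient_pub, json.dumps(rumor)),
        "created_at": int(time.time()) - random.randint(0, 2 * 24 * 3600),  # randomized
    }
    one_time_secret = secrets.token_bytes(32)  # fresh key per message, created by the client
    return {
        "kind": 1059,  # the gift wrap, signed by the one-time key (signature omitted here)
        "tags": [["p", recipient_pub]],
        "content": nip44_encrypt(one_time_secret, recipient_pub, json.dumps(seal)),
        "created_at": int(time.time()) - random.randint(0, 2 * 24 * 3600),  # randomized again
    }

wrapped = gift_wrap({"kind": 14, "content": "hi"}, b"alice-secret", "bob-pubkey")
print(wrapped["kind"], wrapped["tags"])
```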
 > What I don't quite follow: if the same key A is used to retrieve the messages, how is it any better than having a queue ID, from the perspective of correlating messages to users?

You need to collect metadata over time to get some value out of it. SimpleX's durable queue IDs help with that. His scheme apparently only has ephemeral per-message keys, though I don't know where they come from. He may be alluding to https://github.com/nostr-protocol/nips/blob/master/44.md .

I suspect the idea is to post DMs to relays without a specific recipient coded into the message. A large number of *potential* recipients all download all DMs from the sender and attempt to decrypt them. This will succeed for those that were intended for them. The security comes from a large number of potential recipients connecting with their IP for download out of which the actual recipient is but one.
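
If that's the scheme, the recipient side would amount to something like this (a toy sketch, with a fake decrypt standing in for the real authenticated decryption):

```python
from typing import Optional

def try_decrypt(blob: bytes, my_secret: bytes) -> Optional[bytes]:
    # Placeholder: real code would do ECDH + authenticated decryption and
    # return None when the MAC check fails (i.e., the message isn't for us).
    key, _, body = blob.partition(b":")
    return body if key == my_secret else None

inbox = [b"alice-key:hello alice", b"carol-key:hi carol", b"alice-key:second note"]
my_secret = b"alice-key"

mine = [m for m in (try_decrypt(blob, my_secret) for blob in inbox) if m is not None]
print(mine)  # only the messages that "authenticated" against my key
```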

I doubt this will scale well, in line with nostr's overall hail mary approach to distributed systems architecture. 
 Would be good to look at the spec. 
 Brace yourself for the crypto heresy of using the same EC keys for encryption and digital signatures. 😋 
 👀

nostr:nevent1qqs2ql82xu2ha60s0pmksa0q2al20chj7d8dera66tjtg7v3ezln8ncpzpmhxue69uhkummnw3ezumt0d5hsygzxpsj7dqha57pjk5k37gkn6g4nzakewtmqmnwryyhd3jfwlpgxtspsgqqqqqqs72kt46