 Need to concurrently filter on conditions of 2 related event kinds (connected via "a" and "e" pointers) and there's no way of doing that.

There's not even a viable workaround, it's a fucking mess 
You can’t open 2 separate connections to fetch the related events at the same time?
 Uhm, no. It's an AND that needs to happen server-side. It would be ridiculous otherwise 
 Care to share what you’re working on? 
Given the following structure: apps -> releases -> file metadatas

User searches are performed on apps; once an app is selected we fetch the latest release, and from there grab the file metadatas.

It's easy when the search is done on one event kind (apps), but now I need to also start filtering on file metadatas (for platform/OS/architecture). You don't want to be presented with search results for which there's no app for your operating system.

There's no way of doing joins or more expressive querying in nostr, short of copying the data I need onto one single event kind, which is the workaround I did and I hate it.
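For concreteness, here is a minimal sketch of the round trips this implies, using nostr-tools' SimplePool. The kind numbers (32267 apps, 30063 releases, 1063 NIP-94 file metadata) and the relay URL are assumptions for illustration, not confirmed from the thread.

```typescript
// Sketch of the client-side round trips described above.
// Kind numbers and relay URL are assumptions.
import { SimplePool, type Event } from "nostr-tools";

const pool = new SimplePool();
const relays = ["wss://relay.example.com"]; // hypothetical

async function filesForApp(appCoordinate: string): Promise<Event[]> {
  // Round trip 1: the latest release pointing at the app via an "a" tag
  // (assumes the relay returns newest-first)
  const releases = await pool.querySync(relays, {
    kinds: [30063],
    "#a": [appCoordinate],
    limit: 1,
  });
  if (releases.length === 0) return [];

  // Round trip 2: the file metadata events the release points to via "e" tags
  const fileIds = releases[0].tags
    .filter(([name]) => name === "e")
    .map(([, id]) => id);
  return pool.querySync(relays, { kinds: [1063], ids: fileIds });
}

// The missing piece: a single server-side filter that ANDs across both
// kinds ("apps whose 1063 files include my platform") cannot be written.
```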
 Is it just that you want to search for "banana" and get back only the banana apps that have files available for your OS?

There is unlimited flexibility on the search handler side. NIP-50 even specifies a way to add custom filtering that you could use like "banana os:android-v15-x64". If the app-capable relays support this syntax they will take file metadata into consideration already when returning results. 
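As a sketch, the filter for such a query could look like this. The `search` field is standard NIP-50; the "os:" token is the hypothetical relay-specific extension suggested above, and the app kind is assumed.

```typescript
// NIP-50 search filter with a relay-specific extension token.
// The "os:" syntax and kind 32267 are assumptions, not standardized.
const filter = {
  kinds: [32267],
  search: "banana os:android-v15-x64", // the relay parses the extension itself
  limit: 20,
};
// goes out as: ["REQ", "<subscription-id>", filter]
```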
 From what I understood zap.store is already assuming there will be specialized relays for curated app directories, right? 
Yes, you're right. I heavily rely on NIP-50. I just didn't want to keep going down the custom query syntax route, to avoid too much specialization, so I opted for data duplication

But as you note below that might be the case anyway, having custom relays in these scenarios 
 I don't know if this is the solution, but the amount of query flexibility you would need in your case would be very very hard to achieve using native Nostr query language in a generic way. Even if you had one MongoDB with all the data I think you wouldn't be able to do the query you need. 
App events should include an OS tag. These complex queries seem to only be needed when the data structure doesn't fit your use case 
That's what I had to do, but it implies data duplication. The platform information (OS/architecture) is in the 1063 event, where it belongs

Anyway I was bringing it up regardless of this particular case 
 That's what I am saying - the general solution is data duplication, not complex queries. We duplicate events across relays already, duplicating fields across related events makes as much sense. 
I can partially agree, sometimes it's a good tradeoff. Duplicating events is not at all the same as duplicating fields though 
 I agree with that too. 
here are my suggestions for how search should work: 2 modes. strict string match: returns only exact string matches. vector search: returns matches based on similarity. 
 Having to do multiple queries is also shit, and I will share my implementation to fix that. The problem is interoperability, relays are hard to change 
 I want to be sure I understand. Can you show it as an sql query? 
it's basically a join 
 Is that like a SQL JOIN?  
 exactly 
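Since an SQL rendering was asked for, here is a rough sketch of that join against a hypothetical relay schema. All table and column names are invented for illustration, run through Bun's built-in SQLite driver (which comes up later in the thread).

```typescript
// The join in question, against an invented events/tags schema.
import { Database } from "bun:sqlite";

const db = new Database("relay.db");
const appsForPlatform = db.query(`
  SELECT DISTINCT app.id
  FROM events app
  JOIN tags  rel_a ON rel_a.name = 'a'
                  AND rel_a.value = '32267:' || app.pubkey || ':' || app.d_tag
  JOIN events rel  ON rel.id = rel_a.event_id AND rel.kind = 30063
  JOIN tags  rel_e ON rel_e.event_id = rel.id AND rel_e.name = 'e'
  JOIN events file ON file.id = rel_e.value AND file.kind = 1063
  JOIN tags  plat  ON plat.event_id = file.id AND plat.name = 'f' -- platform tag, hypothetical
  WHERE app.kind = 32267
    AND app.content LIKE '%banana%'
    AND plat.value = ?
`);
console.log(appsForPlatform.all("android-arm64"));
```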
I remember that being discussed years ago and rejected on the basis that it would imply relays having to depend on a relational database.

Maybe that is something that has changed or could be challenged?
That's a great question, I am not sure. Maybe there are other ways of doing it without joins. Anything in that direction would help; queries on nostr are very primitive, so clients implement workarounds that are wasteful anyway, like data duplication and lots of extra queries that add processing and bandwidth 
 I think Nostr queries may actually be too fancy. 
 Too fancy why?

Again my point about waste.

I think we could find the minimum common ground from a variety of query languages (sql, nosql, graphql), the primitives that have emerged for most use cases. I'm not advocating for anything too sophisticated 

The relay I wrote runs SQLite on Bun.js; it's crazy fast, to the point that calls are non-blocking, and all that on a $20 VPS.

If slightly more complexity makes it bad for huge relays, then that's also a good thing 
 I think if you can write the same functionality just using Redis or memcached that could be used as proof? 
 Well I couldn't think of something we could remove from the query language. I just think tag querying is too open-ended, like in theory you could query for a million tags (in practice it won't work).

I like the final point, but I think it makes it bad for implementations. I like the idea that anyone can make a relay implementation very easily, even if they create their own indexes manually. If we start depending on SQL black-box magic, that would be bad.

A query language that is too flexible would also encourage bad behaviors in application developers, treating relays as full-fledged ultra-flexible databases, which they shouldn't be, in my opinion.

I don't know. 
I also don't know, but anyone making a relay implementation very easily basically implies using an existing database (of which SQL is probably the majority).

The person who creates their own indices manually is an expert, and they probably should be using a proven product instead of fiddling around, no? 

Maybe we should define "ultra-flexible", but having a powerful querying language encourages bad behaviors in developers... why? What do you mean by that? 

A limited querying language is wasteful because we need to make multiple queries instead of one, download tonnes of events when a few would suffice, do client-side operations that shouldn't be needed, etc. 
 How can you learn without fiddling around? 
 If you learn more than twice, you're playing with it. 
 Man, I'm all for fiddling around and learning. But should we limit nostr's capabilities because some people might want to fiddle building a relay without using a database? With all the stuff in the world you can fiddle with? I think we're trying to solve a different problem 
When I said "bad behavior" what I had in mind were some people who were literally trying to sell Nostr relays as free databases for custom web apps some time ago, and then some recent attempts at using NIP-32 to store events with tags like L="name", l="something", L="author", l="someone", L="date", l="2024-07-20" and then do queries using those. The way I've seen people do it makes no sense; you could have just different tags and values without the L/l overhead and then you would query all albums released in 1973 by bands with names starting in F that have fewer than 6 songs with {"#s": ["f"], "#y": ["1973"], "#n": ["1","2","3","4","5"]} (both styles sketched after this message).

You get the idea: people shouldn't be doing this! And if we make it easier they will do it more, and then it will be more hassle for relays to change themselves so they can block it.

Also, a powerful query language means it's easy for bad actors -- or just badly-coded well-intentioned clients -- to DoS relays. 
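For illustration, the two tagging styles being contrasted, as hypothetical event fragments:

```typescript
// NIP-32 labeling (the pattern criticized above) vs. plain one-letter tags.
// Both fragments are hypothetical.
const withLabels = {
  tags: [
    ["L", "name"],   ["l", "something", "name"],
    ["L", "author"], ["l", "someone", "author"],
    ["L", "date"],   ["l", "2024-07-20", "date"],
  ],
};
const withPlainTags = {
  // directly queryable via {"#s": ["f"], "#y": ["1973"], "#n": ["1","2","3","4","5"]}
  tags: [["s", "f"], ["y", "1973"], ["n", "5"]],
};
```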
 I see the problem but don't agree with the solution.

It's like wanting to stay poor so if you get robbed, thieves won't get much money from you.

Bad actors will always find ways to DoS if they want to.

The free market will fix this problem. What if custom web apps want to pay money for these "databases"? And we're preventing relays from finding a monetization scheme

An advanced query NIP could be discussed and implemented by more specialized relays, that are either subsidized or can charge money for access 
 Well, link it here when you are ready to discuss the NIP  
 NIP-189 SQL queries

A new `sql` field is introduced for `REQ` messages from clients...

jk 😅 
OK, I'm not opposed to having other ways to query things -- but again, if you're doing this for specialized use cases, maybe it's better to have specialized queries that are specific to each use case. 
 Yea makes sense. I guess using NIP-50 for now 
 Nostr relays _are_ free databases for custom web apps. 
it's a lot of processing and/or data storage to facilitate complex searching, but i don't think filters are quite minimal either, to the point that they cause a lot of repeated results to be found, sent out, and then discarded as well

and the free part is just a factor of not building a monetisation model and CLIENTS NOT HAVING AUTH 
 How do you see auth working? You think all clients should just authenticate to all relays under every circumstance? Isn't that very bad for privacy? 
how? every request has your pubkey in it too; it's not hard to narrow it down after a few messages with a set intersection operation

the real privacy violation is in the IP address because that potentially gives your physical location, and then you prove you have the key apparently at that location by authing

so yes, you want to not auth to free/untrusted relays, but they still know your pubkey and can be pretty confident that at that location lives the nsec

so, if you care about location, you use a VPN or Tor

if you care about not giving away your identity, you uninstall the client and stop using it, you are going to identify yourself auth or not, this is an authenticated protocol

if you pay for the relay, and they are selling log data to third parties, you stop paying them and you stop using them altogether

if you pay a relay they have a much greater incentive to not betray your data to third parties and if they do, then they deserve to be blacklisted by everyone in the community for this

you can also run your own relays, because the protocol allows this kind of distributed access, and nip-65 facilitates this messaging pattern 
 I implemented something like this to try to calculate view count metrics — was surprised how easy and effective it was without auth and I shut it down shortly after because of the DB size and deleted all the data 
the nostr protocol inherently gives away npub ownership, at minimum through the frequency of requests - fixing this would require caches and proxies to distribute and obfuscate request origins

you only need to hold a short window of time in the cache to do this matching, sufficient records to perform a reliable intersection, and then you can throw away the records... so, doing it in practice just requires a database with access times stamped in and a garbage collector 
and yeah, the data size... you'd want to devise a compact storage and working-memory architecture for this, but it doesn't take nearly as many resources as you'd think, since you can almost certainly get away with a 32-bit serial for each npub and a caching structure where you prune out extraneous data below the median threshold or so; this would keep only the high-confidence data there 
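A sketch of the intersection matching being described; connection IDs, thresholds, and eviction are simplified placeholders:

```typescript
// Set-intersection matching: each unauthenticated connection's requests
// mention some pubkeys; intersecting those sets across requests quickly
// narrows down the connection's own npub.
type ConnId = string;

const candidates = new Map<ConnId, { pubkeys: Set<string>; lastSeen: number }>();

function observe(conn: ConnId, pubkeysInRequest: Set<string>) {
  const entry = candidates.get(conn);
  if (!entry) {
    candidates.set(conn, { pubkeys: new Set(pubkeysInRequest), lastSeen: Date.now() });
    return;
  }
  entry.lastSeen = Date.now();
  // keep only pubkeys that appear in every request from this connection
  for (const pk of entry.pubkeys) {
    if (!pubkeysInRequest.has(pk)) entry.pubkeys.delete(pk);
  }
  if (entry.pubkeys.size === 1) {
    console.log(conn, "is almost certainly", [...entry.pubkeys][0]);
  }
}

// the garbage collector mentioned above: drop connections not seen recently
function gc(windowMs = 10 * 60_000) {
  const cutoff = Date.now() - windowMs;
  for (const [conn, e] of candidates) if (e.lastSeen < cutoff) candidates.delete(conn);
}
```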
 Every request has your pubkey? Since when? 
 have you actually written client code for showing a user's view?

which list, for example, is most commonly requested by all clients in order to build a feed? oh yeah, the user themselves

what reason is there to request a follow list for others, unless the client is reading their follow list's follow lists? that's about as deep as it's gonna go, but the client is going to ask for the user's follow list every time, guaranteed

and that's just that one thing

there are other lists as well; all DM requests are going to include the client's npub, and how many ways does this get used? it's basically the first thing a signer asks you permission for, and if you made it ask every time you'd have to permit it for every action repeatedly 
 and virtually every feed request on a thread is gonna include the user's npub because they want to read their own posts in the thread

lol, the heuristics required to positively identify the npub used by a client not authing would barely fill a screen 
 Yeah this is very relevant, imagine. 
You're not requesting the same things from all relays, because that obviously doesn't scale and the outbox model is a thing. Also many clients will (and most should) bundle requests for lists together such that yours is mixed with others'. DM requests are (in sane situations) definitely not going to all relays, just to one or two, and those must use auth already, but only for DMs, on relays that implement it, for users who opt in to NIP-17. 
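For example, a bundled request might look like this; the names are illustrative:

```typescript
// Bundling as described: ask for many people's follow lists in one filter,
// so the relay can't tell which pubkey is the user's own.
function followListFilter(myPubkey: string, decoyPubkeys: string[]) {
  return {
    kinds: [3], // kind 3 = follow lists (NIP-02)
    authors: [myPubkey, ...decoyPubkeys].sort(), // mine is one among many
  };
}
```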
 Anyway, can you tell me how do you envision this world of AUTH? Is it really that all clients will send AUTH always to all relays? What are the big advantages we get from that? 
 monetization of service provision

that is enough reason to make it universally supported

can't run this shit on donations forever, unless you live in some la-la-land socialist theory of the gift economy, which is what #v4v mostly sounds like most of the time 
If someone wants others to read what they write, can't they pay for the servers instead of charging everybody who wants to read?

You know webpages are basically free to read and have been for decades, the publisher pays for the server. 
 But sure, there may be use cases in which charging for reads is necessary. It's not helping to get that point across to just yell about clients that don't implement AUTH, as if implementing AUTH fixed anything by itself. 
it helps users of paid relays, of which i am one

so, full auth support makes it easier for me to do that... i still don't get full use of filter.nostr.wine because i have to prod nostrudel to do it; it can do it, but it's still not following the protocol by doing it automatically for me

and there will never be private relay clusters for business use cases without auth on the clients and none of the funders seem to see it as a priority, thus the woeful state of it

yes, auth helps a lot of things... and privacy is one of them if the relay is trustworthy 
auth will help relays avoid having to rate limit by IP address (a terrible, horrible, and frankly useless method of fending off greedy connections). instead you can use your npub with auth, and either stay within a regular client usage tier, or pay more to go insane with queries.

i do think most if not all relays will do this eventually, or they'll end up exactly like every website that blocks and captchas vpn connections. (even free relays) 
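A sketch of that tiered, npub-keyed rate limiting; the tier names and budgets are invented for illustration:

```typescript
// Authenticated pubkeys get a per-tier query budget instead of a per-IP one.
const QUERIES_PER_MINUTE: Record<"free" | "paid", number> = {
  free: 30,
  paid: 3000,
};

const usage = new Map<string, { count: number; windowStart: number }>();

function allowQuery(pubkey: string, tier: "free" | "paid"): boolean {
  const now = Date.now();
  const u = usage.get(pubkey);
  if (!u || now - u.windowStart > 60_000) {
    usage.set(pubkey, { count: 1, windowStart: now }); // start a new window
    return true;
  }
  u.count += 1;
  return u.count <= QUERIES_PER_MINUTE[tier];
}
```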
fiatjaf hasn't even considered that clients could be configured to make a new key for every auth request except to paid relays, defeating the privacy-invasion angle completely and pointing back to the IP tracking problem 
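What that client configuration could look like, as a sketch; the paid-relay list is a placeholder, and the key helper comes from nostr-tools:

```typescript
// Throwaway-key AUTH as suggested: answer AUTH with a fresh key unless the
// relay is one the user pays for.
import { generateSecretKey } from "nostr-tools/pure";

const PAID_RELAYS = new Set(["wss://filter.nostr.wine"]); // example from the thread

function secretKeyForAuth(relayUrl: string, realSecretKey: Uint8Array): Uint8Array {
  // identity only matters where the relay already knows who pays
  return PAID_RELAYS.has(relayUrl) ? realSecretKey : generateSecretKey();
}
```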
i just think he's in denial about the fact that relays are inherently trusted third parties, and about reconciling that with the "free anti-censorship" features

a relay requiring auth to post is not censorship, and without auth you still get an IP address; and if that's a VPN address, spam from that vector will get the whole address blanket-blocked, cutting off all use that way

paying for use of a relay doesn't doxx you... that would require using a doxxable payment route and not using tor/vpn to access the relay

so, ip/npub as ways to decide what will be stored and relayed are both inevitable mechanisms, and being against censorship does not also mean being against paying for the goddang infrastructure lol 
 So don’t use nostr without a VPN… great 😔 
 You do not need a relational DB for joins