 Hey devs, would love to do a quick poll. 

I've got a situation where I need to send a data object via Nostr events that, in a minority of cases, is too big to fit in a single Nostr event (the 64 KB limit).

I'm thinking of chunking the object across multiple events and then reconstituting it on the other side.  @arkinox and I were doing this when we were experimenting with embedded games last year and it worked surprisingly well. 
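For illustration, the chunk-and-reassemble idea is only a few lines. This is a hedged sketch, not the chunkey-monkey implementation or any NIP; the chunk size and function names are made up here.

```python
# Hypothetical sketch: split a serialized payload into pieces that fit
# under a relay's 64 KB event limit, then reassemble on the other side.
CHUNK_SIZE = 60_000  # leave headroom under 64 KB for tags, sig, etc.

def chunk_payload(payload: bytes) -> list[bytes]:
    """Split the payload into CHUNK_SIZE-byte pieces, in order."""
    return [payload[i:i + CHUNK_SIZE]
            for i in range(0, len(payload), CHUNK_SIZE)]

def reassemble(chunks: list[bytes]) -> bytes:
    """Concatenate the chunks back into the original payload."""
    return b"".join(chunks)
```

The hard part isn't the splitting; it's making the receiver confident it has all the chunks, in order, from the right author.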

That said, it's very non-standard, so I'd love to hear some opinions. 

#askNostr #nostrDev 
 nostr:nprofile1qqs06gywary09qmcp2249ztwfq3ue8wxhl2yyp3c39thzp55plvj0sgprfmhxue69uhhg6r9vehhyetnwshxummnw3erztnrdakszxmhwden5te0w35x2cmfw3skgetv9ehx7um5wgcjucm0d5q3camnwvaz7tmgda68y6t8dp6xummh9ehx7um5wgcjucm0d5ahp0n0

Is this something that you solved for Alexandria with longer note-types for books? 
nostr:nevent1qqsxr42q4zgv0wxlzg9nezmygzktzw2l7v66aq450yedlesadazl6ygpz4mhxue69uhhyetvv9ujuerpd46hxtnfduhsygqh88vn0hyvp3ehp238tpvn3sgeufwyrakygxjaxnrd8pgruvfkaupsgqqqqqqsaq0vtq 
What if you do that and make it a standard? I mean, as a NIP. 
 It'll certainly be included in the NIP I'm working on if that's the way I go with it.  
I don't see why it's non-standard. If you can create a new kind, you can make it a standard, especially if it's a minority case. 

The only edge case I've been able to think of is what happens if two events both claim to be the successor of the first one, which isn't going to happen in the usual case.  
 Each could contain a hash of the successor. 
 Yup, that's what I was planning too.  
 Then you just need a `head` tag, maybe, so that it builds faster. 
 💯 and a tag with the total number of events there are.  
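The successor-hash chain plus total count discussed above can be sketched like this. Assumptions: it hashes chunk contents directly (a real NIP would more likely chain event IDs), and the field names are hypothetical.

```python
# Sketch: each chunk commits to the SHA-256 of its successor, so the
# sequence cannot be reordered or extended by a third party.
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_chain(parts: list[bytes]) -> list[dict]:
    """Build back-to-front so each chunk can name its successor's hash."""
    chain: list[dict] = []
    next_hash = None  # the tail chunk has no successor
    for part in reversed(parts):
        chain.insert(0, {"content": part, "next": next_hash})
        next_hash = sha256_hex(part)
    return chain

def verify_chain(chain: list[dict]) -> bool:
    """Check every link points at the hash of the following chunk."""
    for cur, nxt in zip(chain, chain[1:]):
        if cur["next"] != sha256_hex(nxt["content"]):
            return False
    return not chain or chain[-1]["next"] is None
```

With the chain built back-to-front, a forged "second successor" fails verification because only one event can match the committed hash.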
 Is that a limit which is enforced by relays? 
That's a good question. I believe it is, and I thought it was in the NIPs repo, but now I can't find any mention of a limit there...  
Maybe not official, but I wouldn't be surprised if it were a filter implemented by relays as a form of spam protection. 
I've heard strfry limits events to 64 KB by default, so it's basically a standard now 
 nostr.land does not :) 
 What's the limit there? 
 256kb 
64 KB would be enough for almost everything if Nostr lists were CRDTs 
 Also, khatru sets it at 500kb 
 That's what Alexandria does with books, basically. 
 We define the linked list by using a replaceable index event, tho. 
 Here is what I used for crashglow.com https://github.com/arkin0x/nostr-chunkey-monkey

Is blossom an option Jeff? 
 I was surprised at how well chunkey-monkey worked. Retrieving and reassembling the events was very consistent and fast. 
 What’s the use case? 
 https://xyproblem.info 
 Always got to sneak in a dig eh? 😅 
 😂 
 MLS welcome messages for DMs and Private groups. 

Welcome messages only happen when you add a new person to a group and only have to be sent to the new member(s). For groups below ~150 (using Ed25519 or Schnorr keys) the welcome message content fits in a single event. For larger groups, it goes over the 64kb limit. And when you introduce quantum resistant keys, it happens much sooner. 

https://m.primal.net/LNpy.png  
 Partitioning is probably the best idea.

I’m thinking we need a binary version of Nostr events (not binary encoded events), for cases where it’s not complicated enough to involve a system like Blossom, but also would benefit from binary encoding like encrypted events. 
 Explain what you mean by binary version vs binary events. Not sure I’m following 
 Binary encoding: we keep compatibility with nostr events, just optimize JSON field names away

Binary events: breaking change in signing scheme that also means conversion to JSON is optional, tag values and event contents can now be binary 
 s/optional/not needed/ 
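The "optimize JSON field names away" variant can be shown concretely: keep the event semantics, but serialize as a positional array with a fixed field order. A sketch only; the ordering here is an assumption, not any existing NIP's format.

```python
# Sketch: replace repeated JSON keys with a fixed positional ordering.
# The field order is assumed here; a real spec would have to pin it down.
import json

FIELDS = ("id", "pubkey", "created_at", "kind", "tags", "content", "sig")

def to_compact(event: dict) -> str:
    """Serialize as a positional array instead of a keyed object."""
    return json.dumps([event[f] for f in FIELDS], separators=(",", ":"))

def from_compact(data: str) -> dict:
    """Rebuild the keyed object from the fixed ordering."""
    return dict(zip(FIELDS, json.loads(data)))
```

This stays JSON-compatible and reversible; the breaking "binary events" option goes further by letting tag values and content be raw bytes.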
 We can have alternative signatures in a specific parameterized single letter tag

Labeling events could be used to augment missing tags after the fact

IMO a common precompression dictionary would be even better and simpler

The canonical form is also better; there's really no need to transmit the event ID 
 why?

we can just add a new field like “binarySig”, relays can validate both of them (and backfill binary signatures if later added by the author), and clients can pick one 
 yeah, that's what i'm saying but just make it a tag so clients can generate it directly as well

i would prefer it if there was no new fields in events

as for filters i have said that "it would be useful to have a negation operator, eg !" 
 the problem is with a tag you can’t add the faster signature later 
 why not? tags are just lists of strings?... i can picture `["b","<hex of ID>","<encoding scheme name>","<signature>"]` in keeping with existing layouts 
 I was thinking you would add it to the original event 
 you can't do that... it's a chicken and egg problem

if you are going to make a canonical binary encoding, it works exactly like the json canonical form: you do not hash the hash (obviously) or the signature (also obviously), but when i say obviously, you don't realise that until you think about it

the ID and signature are external to the event. even if they appear in the naive version sent over the wire, that is really only there for fast access, a convenience; to make the hash again you remove the signature and ID from the event, drop the object keys, and serialize the remaining fields in a strict, faithful ordering

the same rules can be applied to a binary encoding, so a hypothetical "b" tag could be added with other variations but if the signature is missing that is not an available verification method, so fall back to the json canonical for that, as the signature field is also present there, and would be retained in a binary wire format for the same reason (legacy, perhaps, but probably it will never go away, and good i say)

so, yeah, in summary: ID and signature fields are inherently external to the data, thus they can be added or removed with impunity, and if we were to propose a NIP to use them, this makes it simpler to explain: "b tags must not be included in the canonical form used to derive the json ID hash"

in this way the event can be sent on the wire with this binary signature on it, enabling the forming of the binary encoding corresponding to it later, but it has to be omitted

so, yeah, maybe that's too complicated for people's brains, but ultimately that's how it would have to work 
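For context, the json canonical form being referenced is the NIP-01 one: the event ID is the SHA-256 of `[0, pubkey, created_at, kind, tags, content]` serialized without whitespace, which is why any wire format carrying the canonical fields can derive the ID locally. A minimal sketch (NIP-01 also pins exact character-escaping rules that `json.dumps` only approximates):

```python
# NIP-01 event ID: sha256 over the canonical array serialization.
# The id and sig fields are excluded -- they are derived/external.
import hashlib
import json

def event_id(pubkey: str, created_at: int, kind: int,
             tags: list[list[str]], content: str) -> str:
    canonical = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"), ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since the ID is a pure function of the other fields, omitting it from the wire costs nothing but one hash on receipt.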
 also, just should point out, that the wire format could actually be the canonical encoding, it's not that wasteful of space really, already eliminates all the repetitive key strings and all 
 and a binary version of the canonical encoding also would leave out the event IDs and only retain the signatures

anyway, the bigger struggle with this is the morons who think shit like CBOR and MSGPACK and Protobuf etc are more important debate materials

they really aren't, and i think that there really is a lot of justification for the idea of just making one binary encoding canonical form that gives you a second signature field and done

any programming language doing binary RPC should be able to deal with it, i don't think it's even that hard in javascript to decode binary data and unpack it into JSON and import it 
 Apparently I'm one of those morons. Why wouldn't you want to use CBOR? 

Is your solution just to binary encode most of the json event to squish it down but then keep an id/sig (and maybe a tag) unencoded? I'm not sure I'm following what you're suggesting. There's too much vitriol mixed in with the ideas. 😅 
 I’ll DM you an explanation for my idea that will end up never finished soon. 😂 
 don't need the ID there at all in the wire format, and it's only needed as an index in the database

you would have vitriol too if you looked too close at the code to see how much the politics influences the architecture in a very negative way

also, you would also have vitriol if people respond by focusing on the person instead of the engineering, so, please, keep it on topic 
 and answering the technical part of it

- compressing hex is, you know, rather simple, 50% if you know the field is hex

- a fixed ordering of fields instead of field names like in the JSON eliminates the need for labeling, it's literally just a set of offsets and length prefixes

- where you know the field is going to be binary, you can encode it as binary, in the expected length. i have used this in my runtime and database formats, and there are a lot of places where this applies: field 2 in e and p tags, the decimal:pubkeyhex:string of a tags, the ID (which can be omitted), and the pubkeys and signatures can all be sent over the wire as binary, without any length prefix, if they obey the ordering
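The 50% figure for hex is just the hex-to-raw-bytes conversion; every 64-char pubkey or ID becomes 32 bytes once the decoder knows the field is hex:

```python
# Hex fields halve when stored raw: two hex chars become one byte, and
# fixed-width fields (32-byte IDs/pubkeys, 64-byte sigs) need no length
# prefix if the field ordering is fixed.
def pack_hex(hex_str: str) -> bytes:
    return bytes.fromhex(hex_str)

def unpack_hex(raw: bytes) -> str:
    return raw.hex()
```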

bringing your opinions about programming languages and this RFC and that RFC is stupid. we have options here; we don't need to hitch our horse to some anonymous committee over in Buffalo or Antwerp. we are building our own protocol, and we can easily come to agreement on the performance characteristics and limitations of implementations thanks to our diversity

fiatjaf was right in the way he designed the scheme, pretty much, except on a few small points, like: why send the event in object format when you could have just sent the canonical form, which means you can literally hash it on the spot and get the ID? verify the signature and two steps just got eliminated 
i kindly invite you to go inspect the protocol specification of Bluesky and come back to me after you have digested the abomination, for your opinions about the use of CBOR 
 yes, I’m also saying the new binary signature is another detached field like id/sig 
but how do you preserve it in the json object so the binary version can be derived/verified from it? it's easier to stick with existing fields but define a strict meaning for one that is then essentially exempted from the indexing rule; this way clients won't break decoding it even if they don't use it

adding more keys to the object is going to break them, and you do want to keep these extra fields (sans ID) in other encodings 
yes, my suggestion would then be that you put the tag so it's `["b","<signature hex>","<encoder type>"]` and the rest is implied by the canonical encoding rule

we don't have to have the ID in the wire format, period, so it doesn't have to be in the tag either, actually 
i personally am very much in favour of making a gradual move towards this by suggesting that relays and clients with some nip-11 flag understand events that are in canonical format, and then you can add a further tag with a list of supported encoders and the in-event signature for other formats, which are optional. only the json form is mandatory; the binary form can carry signatures for your faster verification. done. 
also just need to point out: the signer of an event could make an after-the-fact amendment adding the tag, with another tag that refers to the replaced version, which gives you your new (multiple) signature tags as i described, much like the parameterised replaceable protocol that already exists 
 Ok - this might be left curve but... Why would we not just encode using CBOR, then attach a signature. 🤷‍♂️ Seems like that's all we'd need. 

Relays could destructure however they want for indexing but the data structure and event creation/encode/decode would be super simple. 
 all this has me thinking, this is something that nip-32 labels could be used for - providing the signatures for alternative encoding formats, retroactively, without altering the original event 
 Interesting. I think if we’re going to ever take the hit on changing anything to binary we should change it all to a binary format. 
 t-y, 4 thread 
a metadata header event that is sent first, which helps set expectations; it would contain the IDs of the chunks 
 the distributed systems I’ve worked on outside of nostr typically offload the large data into an external store and plug a reference id into the event. the downstream consumers either find the data embedded in the event or an id to resolve it externally.  
 do these event kinds include a label and label namespace by default? probably would be a useful way to add category data and enable the kind to become a general purpose grouping operator (they could be stacked hierarchically too) 
 well, just a thought... aggregating things into groups is a general purpose that cuts across many use cases 
 eBooks are just our first use case because we want to start off with the Bible, but obviously, I can think of a gazillion different types of collections. A photo collage, or a playlist, or an audiobook, a "best of" grouping of someone's long-form articles, related wiki pages, or etc. 
 Been thinking the same, but I don't get yet when you would use this over tags. 

Bookmarks -> Tags
Book -> 30040
Music playlist -> ????? 
 yeah, arbitrary binary blob segments is an interesting one too

i'm not a big fan of the idea of serving data as events per se, but they are ok as segmented blobs i think; they can even be additionally blockchained so the entire set can't be tampered with 
 only if you don't charge for access to it

this kind of spam attack is only viable for attackers who don't have to pay for the bytes that are being stored

crowdsourced moderation would help with this also... reporting is there for this reason 
 i haven't even been thinking about paid access exactly but accounting for event data would be important with this

i just made a thing that allows users access via their presence on an npub's follow list (with this oom-kill bug i've got at the moment, i'm just verifying that it was because of the terminal config), and i was planning to add so that the followed npubs' follows also get read/write access... then the list actually becomes a list of moderators

but this semi-open architecture would entail issues with designated moderators having bad judgement and enabling scumbags, so all npub event publication needs accounting: a new table in the DB with a list of npubs and the size of the event data they have submitted within some time window, to track it over time so later i can impose limits for whatever reason, like the follows' follows being rate limited as a collective to contain their data input, since it's essentially free access. it also means, with all of this, that it can be investigated forensically, and even automatically: if one of these "free tier" users is flooding the relay with data, they can be banned for a period if their ratio versus the average is orders of magnitude out

hm anyway, yeah, so i'm going to add npub-based event submit counting, and have the event saver log the current volume total

the actual time-series of this data can also be derived at some point too, since the relay already has an access time, i should make a created timestamp as well so it stores when the relay first saw the event, and this can then be used to assemble a time series graph 
Very helpful, thanks!  
 Yup. 

Would love to give my tag-based list a description tho 🙊.  
 Oops, yeah, I mean labels. (1975/1986 don't know what kind exactly) 
 Is label-list a thing?