yeah, i think the main issue with strings and escaping is that you can't really predict anything about what's going to be inside the strings, and the primary intent of making the most common control characters use the standard C-style single-letter escapes is space efficiency

but \u and \/ escapes, and octal escapes, basically any digits after a backslash, all need to be kept as is

the problem is that these escape codes have literally 3 representations: some have the single-letter form, then there is \uXX and \uXXXX, and then there are \[0-9][0-9][0-9] octal codes, so if you decode them, you must re-encode them to the same form when deriving the ID hash

the spec MUST define the normative encoding, and any other form should be correctly accepted and retained as a literal... this pushes the issue off the relay devs and onto the clients, where the problem originates anyway
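
to make that concrete, here's a rough sketch in go of one possible normative escape function applied to strings before the event gets serialized and hashed into the ID (just illustrative, not any particular relay's actual code, and the exact rule set is the part the spec has to pin down):

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCanonical: a sketch of a single normative escape form.
// Assumptions: the common control characters use the single-letter
// C-style escapes, anything else below 0x20 uses \u00XX, and nothing
// else is escaped except the quote and the backslash. Whatever the
// exact rules, the point is one and only one output for a given string.
func escapeCanonical(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch c {
		case '"':
			b.WriteString(`\"`)
		case '\\':
			b.WriteString(`\\`)
		case '\n':
			b.WriteString(`\n`)
		case '\r':
			b.WriteString(`\r`)
		case '\t':
			b.WriteString(`\t`)
		case '\b':
			b.WriteString(`\b`)
		case '\f':
			b.WriteString(`\f`)
		default:
			if c < 0x20 {
				// remaining control characters get the 4-digit hex form
				b.WriteString(fmt.Sprintf(`\u%04x`, c))
			} else {
				b.WriteByte(c)
			}
		}
	}
	return b.String()
}

func main() {
	// a tab may arrive escaped as \u0009, as \011, or as \t; after decoding,
	// it always gets re-escaped the same one way before hashing
	fmt.Println(escapeCanonical("a\tb\x01c"))
}
```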

in actual fact, you could even leave all this stuff out of the runtime data format and just swallow the strings whole and leave them unprocessed, but in general the single-letter escapes save memory space...

similarly, i have opted to keep pubkeys, ids and signatures, as well as e and p tags and the filters, in binary in the runtime form, because of the space saving (and the lack of further processing needed to store them in the database)
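
for example (rough sketch, hypothetical struct and field names, not lifted from any actual relay), the runtime form can just be fixed-size byte arrays decoded from the hex on the way in:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// Event is a hypothetical runtime form where id/pubkey/sig are raw bytes
// instead of hex strings: 32+32+64 bytes instead of 64+64+128 characters,
// already in the shape the database wants. The same trick applies to the
// values of e and p tags and to the id/author fields of filters.
type Event struct {
	ID     [32]byte
	Pubkey [32]byte
	Sig    [64]byte
}

// putHex decodes a fixed-length hex field into a binary buffer.
func putHex(dst []byte, src string) error {
	if hex.DecodedLen(len(src)) != len(dst) {
		return fmt.Errorf("want %d bytes, got %d hex chars", len(dst), len(src))
	}
	_, err := hex.Decode(dst, []byte(src))
	return err
}

func main() {
	var ev Event
	// placeholder value standing in for a real 64-char hex pubkey
	pk := strings.Repeat("0f", 32)
	if err := putHex(ev.Pubkey[:], pk); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", ev.Pubkey)
}
```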

it's this matter that leads to the need for a specification at all. if it had just been "store strings exactly as they are, b/c of crazy encoding schemes", the tests would not even be necessary, but on the other hand, the standard encodings put a lot of burden on devs writing client code that needs to make sense of these strings, especially for non-Latin scripts,

so if one client uses 16-bit hex escapes, another has to be able to decode them... someone has to decode these control characters eventually, and the JSON standard is a mess and UTF-8 is complicated af
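
e.g. here's roughly what the \uXXXX side of that decoding looks like, surrogate pairs included (sketch only: it skips the single-letter escapes and doesn't do much validation, which a real decoder would have to):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// decodeUEscapes turns \uXXXX escapes back into UTF-8, including UTF-16
// surrogate pairs for characters outside the BMP (emoji, many non-Latin
// scripts). This is the part that bites client devs.
func decodeUEscapes(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); {
		if s[i] == '\\' && i+5 < len(s) && s[i+1] == 'u' {
			hi, err := strconv.ParseUint(s[i+2:i+6], 16, 32)
			if err != nil {
				return "", err
			}
			r := rune(hi)
			i += 6
			// a high surrogate must be followed by a low surrogate escape
			if utf16.IsSurrogate(r) && i+5 < len(s) && s[i] == '\\' && s[i+1] == 'u' {
				lo, err := strconv.ParseUint(s[i+2:i+6], 16, 32)
				if err != nil {
					return "", err
				}
				r = utf16.DecodeRune(r, rune(lo))
				i += 6
			}
			b.WriteRune(r)
			continue
		}
		// anything else passes through unchanged
		r, size := utf8.DecodeRuneInString(s[i:])
		b.WriteRune(r)
		i += size
	}
	return b.String(), nil
}

func main() {
	out, err := decodeUEscapes(`\u00e9 \ud83d\ude00`) // an accented e and an emoji
	fmt.Println(out, err)
}
```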

so, yeah, anyway, my 2 cents

good work on the relay tester!