Oddbean

▲ ▼

 apparently my bitcoind leveldb corrupted *yet again* so I have to redownload the entire blockchain from scratch. ffs. leveldb is terrible.

▲ ▼

 Same issue for me 100% of the time I try to put it on a network drive, so fucking annoying

▲ ▼

 I have it on a brand new 1tb ssd and it still fucked up.

▲ ▼

 And nothing involving network drives?

▲ ▼

 god no

▲ ▼

 maybe I need to start zfs snapshotting so I can at least rollback to fix leveldb corruption

▲ ▼

 or just put it in an ext4 partition like our ancestors used to do

▲ ▼

 zfs is an abomination from Sun Microsystems Solaris OS which is based on BSD

BSD also has a shitty, similar thing that ZFS is, quote, an "improvement" on

you can't use mount, umount, or fsck... it's all some bullshit zsomething this zsomeotherthing that

▲ ▼

 have fun when the filesystem itself needs to be accessed outside of the boot environment because 

for some reason, these BSD (and thus solaris) unixes aren't up to speed on making disk mounting and fixing easy, you know, mount... umount... fsck...

these tools won't help you with zfs and ufs

it's probably the biggest reason why nobody uses BSD or solaris anymore

it's not intuitive at all

▲ ▼

 no, what you need to do is stop the daemon, duplicate the state of it, and restart it, and if it fucks up you just zap the old version, copy back the new version, and wait an hour or two and it's back
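
A rough sketch of that stop / copy / restore routine, in case it helps. The function names and paths are made up for illustration, and it assumes the daemon has already been stopped (for example with `bitcoin-cli stop`) before either function runs, since copying a live LevelDB directory gives you an inconsistent copy.

```python
import shutil
from pathlib import Path


def snapshot_state(datadir: Path, backup: Path) -> None:
    """Copy the whole datadir to backup. The daemon must already be stopped."""
    if backup.exists():
        shutil.rmtree(backup)         # drop the previous copy
    shutil.copytree(datadir, backup)  # copytree uses copy2, so timestamps and perms are kept


def restore_state(datadir: Path, backup: Path) -> None:
    """Zap the corrupted datadir and copy the last good backup into its place."""
    if datadir.exists():
        shutil.rmtree(datadir)
    shutil.copytree(backup, datadir)
```

After a restore the node only has to catch up from the height of the backup, which is where the "hour or two" comes from.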

▲ ▼

 i use badgerdb to store the actual events as values in keys in my database, but they are rarely over 500k... bitcoin blocks are 2-4mb typically, honestly, they should not be stored in the DB, the differential is the age old stacking problem of slivers and chunks that used to be a big problem with networks until about 10 years ago

if i was going to write a database driver for btcd i'd use badger for the indexes, and store the blocks in a flat filesystem named by the block hash... they are too big

or, it might work with badger, because badger actually stacks the values in one file and the keys in another...

anyway, my point here is that the entire filesystem of bitcoin's leveldb mutates so much you can't really snapshot it properly, they randomly change half the dataset for whatever reason and when you use `cp` with `-rfvpu` which retains the perms and only copies files that have been changed... it still copies all of the files because all of them have been changed so, yeah

it's dumb because bitcoin values are mostly just the blocks, and the indexes are mostly just keys, so having them separated would actually make backup time-effective instead of a colossal pain in the ass
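
To make the "blocks in flat files, indexes somewhere else" idea concrete, here is a minimal sketch. `FlatBlockStore`, the `.blk` suffix and the in-memory `height_index` dict are all hypothetical stand-ins (the dict takes the place of a real key-only index DB such as badger); this is not how btcd or bitcoind lay things out today. The one piece that is standard is the block hash: double SHA-256 of the 80-byte header, shown byte-reversed in hex.

```python
import hashlib
from pathlib import Path


def block_hash(header: bytes) -> str:
    """Double SHA-256 of the 80-byte block header, hex-encoded in the usual reversed order."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


class FlatBlockStore:
    """Blocks live in write-once files named by hash; the index is kept separately."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.height_index: dict[int, str] = {}  # hypothetical stand-in for the index DB

    def put(self, height: int, raw_block: bytes) -> str:
        h = block_hash(raw_block[:80])
        path = self.root / f"{h}.blk"
        if not path.exists():  # block files are never rewritten once stored
            path.write_bytes(raw_block)
        self.height_index[height] = h
        return h

    def get(self, height: int) -> bytes:
        return (self.root / f"{self.height_index[height]}.blk").read_bytes()
```

Because a block file never changes after it is written, an incremental copy like `cp -u` or rsync would only pick up new blocks plus the small index, which is the backup property being asked for.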

▲ ▼

 lol, the events are stored in values, the indexes in keys

and yes, this would instantly solve the problem of bitcoin

if i had any time spare i'd make a PR to create a btcd database driver with badger because it would probably fix most of its slow IBD

▲ ▼

 also, most SSDs now return an error instead of garbage most of the time if they detect a checksum error

▲ ▼

 fwiw, i kind of doubt it's a leveldb issue, i've done thousands of syncs from scratch without corruption, usually these kinds of things are due to specific hardware conflicts or bugs

▲ ▼

 really man? I have had this happen on many different drives, 20+ times the past couple years. this never used to happen.

▲ ▼

 doesn't have to be the drive, could be a CPU issue too, or memory, or OS, anything can introduce corruption
bitcoind has always been a great burn-in test for hardware

▲ ▼

 must be the sophons

▲ ▼

 > bitcoind has always been a great burn-in test for hardware
No kidding. It's been difficult provisioning resources in my cluster to bitcoin, like it gets expensive if you run an enterprise stack. It feels like it was made to run on mostly shit hardware, more bang for the buck than consuming my expensive redundant/HA resources. 

I've also had some serious history with bitcoind corruption too. I had a support contract for a while and there was no shortage of DB corruption, mostly on junker hardware though.

▲ ▼

 usually I can just chainstate reindex but this is the first time I've had to do a full redownload... sigh.

▲ ▼

 the fact that you got a failure at the same point the second time kind of reduces the chance it was a leveldb bug; more likely, the block file was corrupted

▲ ▼

 Raspberry pi?

▲ ▼

 no im not retarded

▲ ▼

 Sorry having flashbacks

▲ ▼

 I had a levelDB issue myself a couple of weeks ago. FYI there's a couple of CLI options to bitcoind called -reindex and -reindex-chainstate that are probably faster than starting over.
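
For reference, the usual escalation order written out as a small sketch. `-reindex-chainstate`, `-reindex` and `-datadir` are real bitcoind options; the helper name and the idea of returning the commands as a list are just for illustration.

```python
from pathlib import Path


def recovery_commands(datadir: Path) -> list[list[str]]:
    """Recovery attempts to try in order, cheapest first."""
    base = ["bitcoind", f"-datadir={datadir}"]
    return [
        # rebuild only the chainstate (UTXO set) from the blocks already on disk
        base + ["-reindex-chainstate"],
        # rebuild the block index and the chainstate from the blk*.dat files on disk
        base + ["-reindex"],
    ]
```

If both fail, the block files themselves are probably damaged and a full redownload is what is left.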

https://bitcoin.stackexchange.com/questions/65680/how-to-recover-corrupted-bitcoin-core-blockchain

▲ ▼

 I know... I hit 

LevelDB read failure: Corruption: block checksum mismatch: /titan/bitcoin/chainstate/612920.ldb

during reindex, so I have to start over

▲ ▼

 you get that error because it's a robust database; if leveldb didn't do the CRC checksum on records, the UTXO corruption would go undetected, and there could be much worse consequences like losing coins
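
A toy illustration of why the checksum matters. This is a sketch, not LevelDB's actual record format: LevelDB uses CRC32C, while this uses the plain CRC-32 from Python's `zlib` as a stand-in, and `seal`, `unseal` and `flip_bit` are made-up names.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def unseal(record: bytes) -> bytes:
    """Return the payload, or raise if the stored checksum does not match."""
    payload, stored = record[:-4], int.from_bytes(record[-4:], "little")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a hardware bitflip at the given bit offset."""
    out = bytearray(data)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)
```

Any single flipped bit in a sealed record makes `unseal` raise instead of quietly handing back a wrong value, which is the same trade the error message above reflects.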

▲ ▼

 in any case, if you get another corrupted ldb, please send it to me i should have tooling to investigate the kind of corruption somewhere

▲ ▼

 I have talked to so many people and it happens way more often than core devs want to admit. it's easy to blame hardware issues but I never seem to have any issues anywhere else.

▲ ▼

 the only thing I can think of is some ACID issue with leveldb, maybe with its configuration. not sure if it has been smoke tested.

▲ ▼

 it's definitely not impossible that there's a bug in leveldb ! maybe it's possible to diagnose it if it keeps happening more or less consistently there

it's just that, data corruption messages tend to show as "error in leveldb" (because that's the thing causing most of i/o) and thus people are first to blame leveldb, it's such a common trope

i've investigated a few corrupted databases back in the day and every time it was a bitflip or some other issue more likely to be caused by hardware than software problems
while on reliable hardware it tends to be possible to run syncs back to back in a loop without trouble

the thing is, there are few workloads that put modern hardware to the test, and are also super-critical about every bit of correctness like a blockchain sync, so it makes sense you're not seeing problems with anything else

still, it could potentially be a bug that happens in very specific circumstances

▲ ▼

 have you tried btcd ? I found it more reliable for my simple usecase of running lnd..

▲ ▼

 It has had many consensus-related issues, I'm good

▲ ▼

 I’m in pain right now too. Pi5 8GB, umbrel. At block 725,000 or so 🐜🐜🐜

▲ ▼

 Pis are terrible nodes, good luck

▲ ▼

 Just pleb things

▲ ▼

 I've got an 11th gen Intel NUC for 110 EUR, which is less than a Pi costs. Added some cheap DDR4 and a 2TB NVMe. The drive I would have had to get anyway, so it's just the price of the RAM as extra cost, but it packs A LOT more power than the Pi has.

▲ ▼

 Mhhh I have a node on a pi 4 / 8gb with umbrel for 3 years and never had a single problem. Am i lucky ?

▲ ▼

 maybe, i tend to run the latest version of everything (zfs, kernel, bitcoin, etc) so it's always possible I’m experiencing some new bug

▲ ▼

 let me know when you are ready… I wanted to give you 1000 sats for that cool nostruch animation 😎

▲ ▼

 Same! I have it on zfs RAID 1 mirrored drives, still see corruption. But the filesystem itself sees the corruption, which points to hardware which I will change. But it only happens on bitcoin blocks

▲ ▼

 points to hardware even though raid doesn’t handle it? what do you mean

▲ ▼

 Yeah I don't understand how you can have corruption of a specific file on mirrored drives. Does that mean both drives have corruption in the exact same place, and how likely is that? But yet, `zpool status -v` shows a blockchain block file has a CRC error.

▲ ▼

 Same problem here using an external SSD formatted as FAT32.
What filesystem are you using?

▲ ▼

 zfs + internal ssd

▲ ▼

 So completely unrelated. Weird.

▲ ▼

 Or it's really expensive to run any sort of HA node. I'm working on getting multiple instances running with a load balancer to achieve some software HA at the present moment