We need distributed training so we can create foundation models in a permissionless way with thousands of consumer GPUs. Maybe with Bitcoin bids for processing requests, so it would kinda feel like mining in the old days.
Like… I post the dataset, along with a bid, to a coordinator. The coordinator breaks the task into small chunks and distributes the rewards to everyone running the training.
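Something like this, as a rough sketch (all the names here — post_job, claim_chunk, submit_result, bid_sats — are made up for illustration, and verification/payout are hand-waved):

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    dataset_chunks: list          # dataset split into small training tasks
    bid_sats: int                 # total reward offered, in sats (hypothetical unit)
    results: dict = field(default_factory=dict)

class Coordinator:
    """Toy coordinator: takes a dataset + bid, hands out chunks, pays per chunk."""
    def __init__(self):
        self.jobs = {}

    def post_job(self, job_id, dataset, bid_sats, chunk_size):
        # break the dataset into small training tasks
        chunks = [dataset[i:i + chunk_size] for i in range(0, len(dataset), chunk_size)]
        self.jobs[job_id] = Job(dataset_chunks=chunks, bid_sats=bid_sats)

    def claim_chunk(self, job_id):
        # hand out the next chunk that has no submitted result yet
        job = self.jobs[job_id]
        for idx in range(len(job.dataset_chunks)):
            if idx not in job.results:
                return idx, job.dataset_chunks[idx]
        return None

    def submit_result(self, job_id, idx, result):
        # accept a worker's result; reward is an equal share of the bid
        job = self.jobs[job_id]
        job.results[idx] = result
        return job.bid_sats // len(job.dataset_chunks)
```

In reality you'd need some way to verify the work before paying out, which is the hard part.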
The training needs a lot of GPU on the same machine, but I'm sure someone will figure out how to scale out. Mixture of experts is the popular thing nowadays; maybe each of those experts could be trained on a single machine and combined later.
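A minimal sketch of what "combine later" could mean, assuming each expert was trained separately and only a small gating network is trained after merging (SimpleMoE and the shapes here are my own illustration, not an established recipe):

```python
import torch
import torch.nn as nn

class SimpleMoE(nn.Module):
    """Mix the outputs of independently trained experts with a learned gate."""
    def __init__(self, experts, d_model):
        super().__init__()
        self.experts = nn.ModuleList(experts)          # pre-trained, one per machine
        self.gate = nn.Linear(d_model, len(experts))   # trained after merging

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)             # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], -1)   # (batch, d_model, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(-1)           # weighted mixture

# Stand-ins for experts that were each trained on their own machine.
experts = [nn.Linear(512, 512) for _ in range(4)]
moe = SimpleMoE(experts, d_model=512)
y = moe(torch.randn(8, 512))   # (8, 512)
```

Real MoE models route tokens to a few experts and train everything jointly, so training experts fully independently and gluing them together afterwards is the open question here.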