I got 7B to 11B parameter models working on my desktop, taking about 5-7 GB of memory to sample. The 3B param model was taking around 2 GB of memory.
The new M3 Max is pretty crazy. I can get quick inference on a 13B Llama 2 chat model.
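If anyone wants to try something similar, here's a minimal sketch assuming llama-cpp-python with a 4-bit GGUF build of Llama 2 13B chat and Metal offload (the runtime, quantization, and model path are my assumptions, not a statement of what the original setup used):

```python
# Minimal sketch: run a quantized Llama 2 13B chat model on Apple Silicon
# with llama-cpp-python. Model file and quantization level are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # placeholder path to a GGUF quant
    n_gpu_layers=-1,  # offload all layers to the Metal GPU
    n_ctx=2048,       # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me one sentence about llamas."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

A 4-bit quant is roughly where a 13B model fits comfortably in the memory range mentioned above; higher-precision quants will use more.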