I got 7B to 11B parameter models working on my desktop, taking about 5-7 GB of memory to sample. The 3B param model was taking around 2 GB of memory.
The new M3 Max is pretty crazy. I can get quick inference on a 13B Llama 2 chat model.
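If anyone wants to try something similar, here's a minimal sketch assuming llama-cpp-python with a 4-bit GGUF build of Llama 2 13B chat and Metal offload (the runtime, quantization, and model path are my assumptions, not a statement of what the original setup used):

```python
# Minimal sketch: run a quantized Llama 2 13B chat model on Apple Silicon
# with llama-cpp-python. Model file and quantization level are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # placeholder path to a GGUF quant
    n_gpu_layers=-1,  # offload all layers to the Metal GPU
    n_ctx=2048,       # context window size
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me one sentence about llamas."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```

A 4-bit quant is roughly where a 13B model fits comfortably in the memory range mentioned above; higher-precision quants will use more.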