How many params?
I got 7B to 11B models working on my desktop, taking about 5-7 GB of memory to sample. The 3B-parameter model was taking around 2 GB of memory.
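For context, here's a minimal sketch of what a setup like that might look like, assuming llama-cpp-python and a 4-bit quantized GGUF model (the model path and settings are just illustrative, not what the poster actually ran):

```python
# Minimal sketch: load a 4-bit quantized 7B chat model with llama-cpp-python.
# Model path and parameters are illustrative; adjust for your own download.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # ~4 GB on disk at 4-bit
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU/Metal if available
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```

A 4-bit quant is roughly where the 5-7 GB figures for 7B-11B models come from; full fp16 weights for a 7B model would be closer to 14 GB.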
The new M3 Max is pretty crazy. I can get quick inference on a 13B Llama 2 chat model.