Oddbean
 Llama 3.2 3B fine-tuned model running locally on device, offline, at around 10 tokens/sec. 👀 https://image.nostr.build/4be8e36701a962fcaf67902a1cd2b994e71b5a0b4cce5721faa8aff34e103b1b.jpg
nostr:nevent1qqsvjywpt5uqls5k7jtfdkq9ss56dq3t070cc65653f46pnmywlfzqgpzamhxue69uhhyetvv9ujuvrcvd5xzapwvdhk6tczyrr0wpmlz6va2r8e92t990ltl7kqtlrgg2u7uwgs38v4nw9dt4y06qcyqqqqqqgakxdty 
 Just tried this today as well! Which model do you think is best for most general use cases? Llama is my first choice as of now 
 This is pretty amazing, to be honest. Almost 1k tokens per minute on a decent model. I assume it's a low-watt ARM machine; can you calculate the sats per minute that it costs? 
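A rough way to frame the sats-per-minute question: energy drawn per minute times the electricity price expressed in sats. The numbers below (15 W board, $0.15/kWh electricity, $100k BTC) are illustrative assumptions, not measurements from this thread:

```python
# Back-of-envelope electricity cost of local inference, in sats/minute.
# All inputs are assumed values for illustration, not figures from the post.

def sats_per_minute(watts: float, sats_per_kwh: float) -> float:
    """Energy cost of running the machine for one minute, in sats."""
    kwh_per_minute = watts / 1000 / 60  # W -> kW, then per minute
    return kwh_per_minute * sats_per_kwh

# Assumed price: $0.15/kWh with BTC at $100,000
# -> 0.15 USD/kWh * (100_000_000 sats / 100_000 USD) = 150 sats/kWh
cost = sats_per_minute(watts=15, sats_per_kwh=150)
print(f"{cost:.4f} sats/min")  # 0.0375 sats/min under these assumptions
```

Under those assumptions the cost is well under a tenth of a sat per minute, so at ~600 tokens/minute the marginal electricity cost per token is effectively negligible.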
 All you ever wanted? 
 How are you doing this?