A fine-tuned Llama 3.2 3B model running locally on device, fully offline, at around 10 tokens/sec. 👀 https://image.nostr.build/4be8e36701a962fcaf67902a1cd2b994e71b5a0b4cce5721faa8aff34e103b1b.jpg nostr:nevent1qqsvjywpt5uqls5k7jtfdkq9ss56dq3t070cc65653f46pnmywlfzqgpzamhxue69uhhyetvv9ujuvrcvd5xzapwvdhk6tczyrr0wpmlz6va2r8e92t990ltl7kqtlrgg2u7uwgs38v4nw9dt4y06qcyqqqqqqgakxdty
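The ~10 tokens/sec figure is easy to verify with a small timing harness. A minimal sketch, where `generate_token` is a hypothetical stand-in for a call into whatever local runtime (e.g. llama.cpp) is actually producing tokens:

```python
import time

def tokens_per_second(generate_token, n_tokens=50):
    """Time n_tokens calls to a token generator and return throughput."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator: sleeps 100 ms per "token", i.e. ~10 tokens/sec,
# roughly the on-device rate reported above.
rate = tokens_per_second(lambda: time.sleep(0.1), n_tokens=10)
print(f"{rate:.1f} tokens/sec")
```

In real use you would replace the lambda with a single-token decode step from your local inference runtime; the harness itself stays the same.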
Just tried this today as well! Which model do you think is best for most general use cases? Llama is my first choice for now.
This is pretty amazing, to be honest. That's 600 tokens per minute on a decent model. I assume it's a low-watt ARM machine; can you calculate the sats per minute that it costs?
All you ever wanted?
How are you doing this?