A fine-tuned Llama 3.2 3B model running locally on device, fully offline, at around 10 tokens/sec. 👀 https://image.nostr.build/4be8e36701a962fcaf67902a1cd2b994e71b5a0b4cce5721faa8aff34e103b1b.jpg nostr:nevent1qqsvjywpt5uqls5k7jtfdkq9ss56dq3t070cc65653f46pnmywlfzqgpzamhxue69uhhyetvv9ujuvrcvd5xzapwvdhk6tczyrr0wpmlz6va2r8e92t990ltl7kqtlrgg2u7uwgs38v4nw9dt4y06qcyqqqqqqgakxdty
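The ~10 tokens/sec figure is easy to verify with a small timing harness. A minimal sketch, where `generate_token` is a hypothetical stand-in for a call into whatever local runtime (e.g. llama.cpp) is actually producing tokens:

```python
import time

def tokens_per_second(generate_token, n_tokens=50):
    """Time n_tokens calls to a token generator and return throughput."""
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Stand-in generator: sleeps 100 ms per "token", i.e. ~10 tokens/sec,
# roughly the on-device rate reported above.
rate = tokens_per_second(lambda: time.sleep(0.1), n_tokens=10)
print(f"{rate:.1f} tokens/sec")
```

In real use you would replace the lambda with a single-token decode step from your local inference runtime; the harness itself stays the same.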
Just tried this today as well! Which model do you think is best for most general use cases? Llama is my first choice for now.
This is pretty amazing, to be honest. That's 600 tokens per minute on a decent model. I assume it's a low-watt ARM machine; can you calculate the sats per minute that it costs?
All you ever wanted?
How are you doing this?