Meta released MobileLLM models in 125M, 350M, 600M, and 1B parameter sizes. Their 1B model performs well on phones, even mid-range devices. These models are specifically designed and optimized for mobile hardware. They work with llama.cpp. https://huggingface.co/collections/facebook/mobilellm-6722be18cb86c20ebe113e95
How do you run them?
At the moment, it works with llama.cpp from any terminal. Once quantized, you should be able to use it with MLC as well. I will also integrate it into Aithena, which will let you run it locally in your browser on any device without sacrificing performance.
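For the browser path, here is a minimal sketch using MLC's web-llm runtime. The model id `MobileLLM-1B-q4f16_1-MLC` is a placeholder: MobileLLM is not in web-llm's prebuilt model list as far as I know, so this assumes a compiled MLC build of the 1B model exists.

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Placeholder model id — assumes an MLC-compiled, quantized build of MobileLLM 1B.
  const engine = await CreateMLCEngine("MobileLLM-1B-q4f16_1-MLC", {
    initProgressCallback: (p) => console.log(p.text), // weight download progress
  });

  // OpenAI-style chat API, running fully in the browser via WebGPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Hello from my phone!" }],
  });
  console.log(reply.choices[0]?.message.content);
}

main();
```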
Right now, you can try other models and run them locally on any device. When I say 'other,' I mean pretty much any open-source model. In the settings, you can choose a model; once selected, it is downloaded and cached for future use. https://aitheena.vercel.app/
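A minimal sketch of the download-then-cache flow, assuming the app persists weight shards with the browser Cache API. The cache name and URL handling here are made up for illustration, not Aithena's actual implementation.

```ts
const CACHE_NAME = "model-weights-v1"; // hypothetical cache bucket name

async function fetchWithCache(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open(CACHE_NAME);
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer(); // later visits: serve shard from cache

  const res = await fetch(url); // first visit: download the shard
  if (!res.ok) throw new Error(`Failed to fetch ${url}: ${res.status}`);
  await cache.put(url, res.clone()); // persist for future sessions
  return res.arrayBuffer();
}
```

With something like this, the model is only downloaded once per browser; subsequent loads read the weights straight from local storage.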