I've just built an AI assistant that performs voice recognition and text-to-speech directly on the device. It's using a fine-tuned Google Gemini Flash model, which is fast and works great.
I know, Google, right? But what if we replace that model with an open-source one, like Phi-3 or Gemma-2b, that can also run locally on a device, even a phone? It might be a bit slower and more battery-intensive, but in return, you get a completely private AI assistant that can run offline.
The fun part is I can make it into a PWA, so it can run on any device—Android, iOS, and PC. Plus, it will have proper Nostr integration.
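For the offline part, the rough idea is a manifest plus a service worker that caches the app shell, so the PWA keeps opening with no network. A minimal sketch (file names below are placeholders, not the actual project layout):

```typescript
// sw.ts – minimal service-worker sketch; file names are placeholders
declare const self: ServiceWorkerGlobalScope;

const CACHE = "assistant-shell-v1";

self.addEventListener("install", (event) => {
  // Pre-cache the app shell so the PWA opens without a network connection
  event.waitUntil(
    caches.open(CACHE).then((c) => c.addAll(["/", "/index.html", "/app.js"]))
  );
});

self.addEventListener("fetch", (event) => {
  // Cache-first: serve cached responses, fall back to the network
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```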
You can already check how these models run on your device, using WebGPU right in your browser. We'll build on this tech and even better options.
LLM demo in browser: https://webllm.mlc.ai/
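Roughly, the in-browser side looks like this with the @mlc-ai/web-llm package. This is just a sketch, not final code: the model ID is only an example from their prebuilt list, and the API may shift between versions.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function demo() {
  // First run downloads and compiles the model into the browser cache;
  // the model ID is an example, pick any entry from the prebuilt list.
  const engine = await CreateMLCEngine("Phi-3-mini-4k-instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat API, but everything runs locally via WebGPU
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: "Say hi from a fully local model." }],
  });
  console.log(reply.choices[0].message.content);
}

demo();
```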
I'll also add PDF and vision capabilities. If I'm not making a big miscalculation, with one toggle you should be able to run Stable Diffusion in the same PWA: offline, local, and completely private.
Let me know if you have any suggestions or recommendations for models or features. I'll share an initial version soon, and from there, we can improve it together.
https://pub-c7848a5caa274580ba42f37d3b70e823.r2.dev/VN20240523_010102.mp4
Awesome, I'll test this out.
Been using my own local LLMs & (slowly) building voice & TTS for myself. I'll use yours for inspiration.
Wondering if any great new models have come from the Llama 3 data. Haven't been keeping up with local AI stuff, but I'm expecting great things.