@957492b3 Not only can you run “local LLMs”, you can also use your own store of documents to “tune” the models. I've been evaluating the open-source llama.cpp (written in C++) and GPT4All as packaged in PrivateGPT, which runs on M1 MacBook CPUs/GPUs - https://github.com/imartinez/privateGPT. Still working on performance, as responses take anywhere from 1s to 7s.
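For context, a minimal sketch of querying a local model through the llama-cpp-python bindings (one of the backends PrivateGPT can use). The model path and prompt are assumptions; any compatible quantized model file works:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Hypothetical path to a locally downloaded quantized model file.
llm = Llama(model_path="./models/ggml-model-q4_0.bin", n_ctx=2048)

# A single blocking completion call; latency depends heavily on
# model size, quantization level, and the M1's available cores.
resp = llm("Q: Summarize my notes on home automation. A:", max_tokens=128)
print(resp["choices"][0]["text"])
```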
@5d116069 This is fascinating, thank you for sharing. Even if it's 10 seconds today, that latency will clearly fall over time. If we wanted an Alexa-like product, I like the idea of a local base model handling all speech-to-text requests. Out of the box it would likely be quite bad, but it would be interesting if we could train it on a long series of home-focused queries (in the form of docs?). I wonder how hard it is to make it (or, more likely, improve it to) handle a new topic?
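On handling a new topic: PrivateGPT doesn't retrain the model's weights at all. It embeds your documents into a local vector store and retrieves the most relevant chunks into the prompt at query time, so covering a new topic mostly means ingesting new docs. A hedged sketch of that ingest-then-retrieve pattern, assuming the LangChain and Chroma libraries PrivateGPT builds on (the file name and query are hypothetical):

```python
# pip install langchain chromadb sentence-transformers
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Ingest: split a document into chunks and embed them into a
# local, persistent vector store. No model fine-tuning involved.
docs = TextLoader("home_notes.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)
db = Chroma.from_documents(chunks, HuggingFaceEmbeddings(),
                           persist_directory="db")

# Query: fetch the chunks most relevant to a home-focused question;
# these would then be stuffed into the local LLM's prompt.
hits = db.similarity_search("How do I reset the thermostat?", k=4)
for h in hits:
    print(h.page_content)
```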