With the rise of really good transcription models, TTS that’s actually enjoyable to listen to, and LLMs that can carry a conversation and understand complex commands, why haven’t we seen an explosion of really good voice interfaces?
It seems like an obvious opportunity to me, but I’ve only seen Apple make a serious attempt, with the latest Siri update. There are so many times when I’m doing something with my hands, driving, etc., and wish I could give commands to my RSS reader, or just chat with an LLM that has arXiv and Wikipedia connected via RAG.