Do you know of any great text-to-speech models that do intonation well? They need to be open weights. They do not need to clone voices.
I've tried Suno Bark, but it sometimes hallucinates, and I need the reading to be literally what's written. I also tried F5-TTS, but the intonation is not great and the speaking speed varies a lot: when it reads multiple texts, the speed of the output speech differs between generations. Its duration predictor is also not great and sometimes causes cutoffs.
Have I missed something?
English only is OK for now.