I was thinking that while modern deep-learning text-to-speech systems can produce natural-sounding voices, they require a huge amount of computational power, including a powerful GPU. By comparison, espeak, even though it sounds robotic, can convert text to speech with far fewer resources and supports around 100 different languages.
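
To give a rough idea of how lightweight the espeak route is, here is a minimal sketch that shells out to the command-line tool from Python. It assumes an `espeak` (or `espeak-ng`) binary is installed and on the PATH; the helper names `build_espeak_command` and `speak` are made up for illustration, while `-v` (voice), `-s` (speed in words per minute) and `-w` (write a WAV file) are the tool's usual options.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=150, wav_path=None, binary="espeak"):
    """Build the argument list for one espeak call (hypothetical helper)."""
    # -v selects the voice/language, -s sets the speaking rate in words per minute.
    cmd = [binary, "-v", voice, "-s", str(speed_wpm)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run espeak if it is installed; return False when no binary is found."""
    # Assumption: the binary is called either "espeak" or "espeak-ng".
    binary = shutil.which("espeak") or shutil.which("espeak-ng")
    if binary is None:
        return False
    subprocess.run(build_espeak_command(text, binary=binary, **options), check=True)
    return True
```

Calling `speak("Hello world")` should play the sentence on the default audio device, and `speak("Hallo Welt", voice="de", wav_path="hallo.wav")` should write a German rendering to a file instead, all on the CPU with no model download.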