I was thinking that while modern deep-learning text-to-speech systems can produce natural-sounding voices, they require a huge amount of computational power, including a powerful GPU. By comparison, espeak produces robotic-sounding speech, but it can convert text to speech with far fewer resources and supports around 100 different languages.
Bigger does not always mean better, and ultimately everything depends on the requirements of the application. If, for example, a natural-sounding voice is needed, then of course neural TTS is the right choice; otherwise, a simpler rule-based TTS like espeak is the better choice.