WaveNet

posted September 12, 2016

I'll be honest: reading through this introductory paper, "WaveNet: A Generative Model for Raw Audio," has me a bit lost. Okay, I don't fully grasp the tech behind it at all, but that's not the point. The point is that steps are being taken to greatly improve text-to-speech (TTS) synthesis, adding a very human element to computerized voices.

You can play around with the current state of TTS with terminal commands or on the web. It's stilted, stiff, and amusing for its robotic transparency. Compare that to the examples in this WaveNet doc and you'll immediately understand how impressive their work is.

Make sure you scroll down to the "Knowing What to Say" section to hear the incoherent babbling the model produces when it isn't given any text to speak. Fascinating, to say the least.