Abstract

First, I will give a short overview of the basics behind text-to-speech (TTS) synthesis engines, showing the advantages and disadvantages of the approaches currently in use. Concatenative, statistical, and neural speech synthesis systems will be presented. I will then focus on the neural approaches, presenting their basic components, the current issues with state-of-the-art neural systems, what is considered solved, and the challenges ahead.

Short Bio

Yannis Stylianou is Professor of Speech Processing at the University of Crete, Greece, and Research Manager at Apple, Cambridge, UK. From 1996 until 2001 he was with AT&T Labs Research (Murray Hill and Florham Park, NJ, USA), and until 2002 he was with Bell Labs, Lucent Technologies, in Murray Hill, NJ, USA. He has been with the University of Crete since 2002. From 2013 until July 2018 he was Group Leader of the Speech Technology Group at Toshiba Cambridge Research Lab in Cambridge, UK. He joined Apple in August 2018. He holds an MSc and a PhD in Signal Processing from ENST-Paris, and he studied Electrical Engineering at NTUA, Athens, Greece (1991). He is an IEEE Fellow and an ISCA Fellow.

Getting here