Statistical parametric speech synthesis based on Hidden Markov Models (HMMs) has been shown to be very effective at synthesizing high-quality, natural and expressive speech. The technique also offers high flexibility as a speech production model and requires only a small database footprint.
Based on the existing HTS engine, we developed a streaming architecture of the system, called performative-HTS or pHTS. On top of pHTS we developed MAGE, a thread-safe and engine-independent layer that can be used in reactive speech synthesis designs, i.e. designs that can be interrupted often and that respond to requests in real time.
Quantitative evaluations of the system show that the degradation of speech quality in pHTS is small compared with HTS, even though pHTS has a delay of only one phonetic label. These results are supported by both a subjective and an objective evaluation, which confirm that the speech waveforms produced by HTS and pHTS can hardly be distinguished.
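The streaming idea described above, where labels are consumed one at a time with a delay of a single phonetic label, can be illustrated with a minimal sketch. This is not the MAGE or pHTS API: the class `StreamingSynth`, its methods and the placeholder function `synthesize_label` are hypothetical names introduced only for illustration, and the "audio" is a string rather than a waveform. The sketch only shows how a thread-safe queue lets a caller push labels at any time while a worker thread synthesizes each label as soon as its successor is known.

```python
import queue
import threading


def synthesize_label(label, next_label):
    # Hypothetical stand-in for parameter generation and vocoding.
    # A real engine would return audio samples, not a string.
    return f"<audio:{label}|next:{next_label}>"


class StreamingSynth:
    """Illustrative reactive synthesizer with one label of lookahead."""

    def __init__(self):
        self._labels = queue.Queue()  # thread-safe FIFO of phonetic labels
        self.output = []              # written only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def push_label(self, label):
        # Safe to call from any thread, at any time.
        self._labels.put(label)

    def close(self):
        # None is the end-of-stream sentinel; wait for the worker to drain.
        self._labels.put(None)
        self._worker.join()

    def _run(self):
        current = self._labels.get()
        while current is not None:
            # Block until exactly one further label (or the sentinel) arrives.
            nxt = self._labels.get()
            self.output.append(synthesize_label(current, nxt))
            current = nxt


if __name__ == "__main__":
    synth = StreamingSynth()
    for label in ["h", "e", "l", "o"]:
        synth.push_label(label)
    synth.close()
    print(synth.output)
```

In this sketch the worker never waits for a whole utterance: each label is processed as soon as the next one is available, which is the sense in which the delay is limited to one label.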
For more details on the architecture of the systems:
If you have questions or remarks, please contact Maria Astrinaki.
MAGE and pHTS have been developed by several members of the University of Mons – NUMEDIART Institute and Acapela Group: