Method, device and computer program code means for voice conversion


Reference: 2215632

Grant date: 16.03.2011

A method of converting a source speaker’s speech signal into a converted voice signal, comprising a training stage in which, given a training database of parallel source and target data, the following steps are carried out for each pitch period of said training database: modelling the pitch period by means of a glottal waveform and a vocal tract filter according to Lu and Smith’s model, to obtain a set of LF parameters (an excitation strength parameter and a set of T-parameters modelling the glottal waveform) and a set of all-pole vocal tract filter coefficients; converting said T-parameters into R-parameters; converting said all-pole vocal tract filter coefficients into line spectral frequencies in Bark scale; defining a glottal vector to be converted; defining a vocal tract vector to be converted, said vocal tract vector comprising said line spectral frequencies in Bark scale; and applying wavelet denoising to obtain an estimate of the glottal aspiration noise. From the set of vocal tract vectors obtained for the pitch periods of said training database, a continuous probabilistic linear transformation function for the vocal tract is estimated using the least squares error criterion. The modelling step further comprises modelling said aspiration noise estimate by modulating Gaussian noise with said modelled glottal waveform and adjusting its energy to match that of said aspiration noise estimate; said glottal vector to be converted comprises said excitation strength parameter, said R-parameters and said energy of the aspiration noise estimate. The method further comprises a conversion stage and a synthesis stage.
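Three of the per-pitch-period steps named in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and every function name in it is hypothetical. It rests on assumptions the abstract does not state: the T-to-R conversion uses the definitions commonly attributed to Fant (Ra = Ta/T0, Rk = (Te - Tp)/Tp, Rg = T0/(2 Tp)), the Bark mapping uses Traunmüller's approximation, and "modulating" is taken to mean sample-wise multiplication of the noise by the glottal waveform.

```python
import numpy as np


def t_to_r(tp, te, ta, t0):
    """LF timing (T) parameters -> R-parameters (assumed Fant-style definitions)."""
    ra = ta / t0            # relative duration of the return phase
    rk = (te - tp) / tp     # asymmetry of the glottal pulse
    rg = t0 / (2.0 * tp)    # normalised glottal frequency
    return ra, rk, rg


def lpc_to_bark_lsf(a, fs):
    """All-pole coefficients [1, a1, ..., ap] -> line spectral frequencies in Bark."""
    a = np.asarray(a, dtype=float)
    # Symmetric (P) and antisymmetric (Q) polynomials built from A(z);
    # for a stable filter their roots lie on the unit circle.
    p_poly = np.append(a, 0.0) + np.append(0.0, a[::-1])
    q_poly = np.append(a, 0.0) - np.append(0.0, a[::-1])
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    # Keep one angle per conjugate pair and drop the trivial roots at 0 and pi.
    eps = 1e-6
    lsf_rad = np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
    lsf_hz = lsf_rad * fs / (2.0 * np.pi)
    # Hz -> Bark with Traunmueller's approximation (an assumption of this sketch).
    return 26.81 * lsf_hz / (1960.0 + lsf_hz) - 0.53


def model_aspiration_noise(glottal_wave, noise_estimate, rng=None):
    """Modulate Gaussian noise with the glottal waveform and match the estimate's energy."""
    rng = np.random.default_rng() if rng is None else rng
    glottal_wave = np.asarray(glottal_wave, dtype=float)
    noise_estimate = np.asarray(noise_estimate, dtype=float)
    # Modulation is taken here as sample-wise multiplication (assumption).
    modulated = rng.standard_normal(glottal_wave.shape) * glottal_wave
    target_energy = float(np.sum(noise_estimate ** 2))
    current_energy = float(np.sum(modulated ** 2))
    if current_energy > 0.0:
        modulated = modulated * np.sqrt(target_energy / current_energy)
    # The energy is returned because the abstract places it in the glottal vector.
    return modulated, target_energy
```

Under these assumptions, the glottal vector for one pitch period would collect the excitation strength, the three values returned by `t_to_r` and the energy returned by `model_aspiration_noise`, while the vocal tract vector would hold the output of `lpc_to_bark_lsf`.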