Training robust acoustic models using features of pseudo-speakers generated by inverse CMLLR transformations

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Contribution to conferencePaperpeer-review

1 Citation (Scopus)

Abstract

In this paper, a novel acoustic model training method based on speech feature generation is proposed. For decades, speaker adaptation methods have been widely used, but all existing adaptation methods require adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by adopting feature generation based on inverse maximum likelihood linear regression (MLLR) transformations, and then train our models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters expressing the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic models using these features. Evaluation results show that the resulting acoustic models are robust to unknown speakers.
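The generation pipeline described in the abstract (PCA over existing speakers' transforms, sampling pseudo-speaker weights, applying the inverse transform to normalized features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-speaker transforms, the feature dimension, the number of bases, and the Gaussian weight model are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 13   # feature dimension (e.g. MFCCs); illustrative value
S = 20   # number of existing speakers

# Hypothetical per-speaker transforms [A | b] of shape (d, d+1),
# simulated here as small perturbations of the identity transform.
W = np.stack([np.hstack([np.eye(d) + 0.05 * rng.standard_normal((d, d)),
                         0.1 * rng.standard_normal((d, 1))])
              for _ in range(S)])

# 1) PCA over the vectorized transforms: extract K bases via SVD.
X = W.reshape(S, -1)
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
K = 5
bases = Vt[:K]                      # principal directions
weights = (X - mean) @ bases.T      # existing speakers' weight coordinates

# 2) Fit a simple Gaussian to the weights and sample a pseudo-speaker.
mu, sigma = weights.mean(axis=0), weights.std(axis=0)
w_new = rng.normal(mu, sigma)
W_pseudo = (mean + w_new @ bases).reshape(d, d + 1)

# 3) Apply the inverse transform to a normalized feature vector x.
#    Assuming normalization is x_norm = A x + b, the inverse mapping
#    y = A^{-1}(x - b) takes canonical features to pseudo-speaker space.
A, b = W_pseudo[:, :d], W_pseudo[:, d]
x = rng.standard_normal(d)          # one normalized feature vector
y = np.linalg.solve(A, x - b)
```

Repeating steps 2-3 over many sampled weight vectors and all normalized training features yields the pseudo-speaker training set used to build the speaker-independent models.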

Original language: English
Pages: 726-730
Number of pages: 5
Publication status: Published - Dec 1, 2011
Externally published: Yes
Event: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011 - Xi'an, China
Duration: Oct 18, 2011 - Oct 21, 2011

Other

Other: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011
Country/Territory: China
City: Xi'an
Period: 10/18/11 - 10/21/11

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
