Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we propose a novel acoustic model training method that is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from the data of a small number of speakers. Speaker adaptation methods have been widely used for decades. Such methods need a certain amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker cover a wider range of speakers, speaker adaptation can perform robustly. To build such robust seed models, we adopt feature generation based on the inverse maximum likelihood linear regression (MLLR) transformation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than with models that were simply environmentally adapted.
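The generation steps in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes each speaker transform is a feature-space affine map x_norm = A x + b stored as W = [A | b], that the weight distribution is a diagonal Gaussian (the abstract does not name the distribution), and it uses random toy data. All function names are hypothetical.

```python
import numpy as np

def fit_transform_basis(transforms, n_components):
    """PCA over flattened per-speaker affine transforms W = [A | b]."""
    flat = np.stack([w.ravel() for w in transforms])  # (speakers, d*(d+1))
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions (the bases).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T  # (speakers, n_components)
    return mean, basis, weights

def sample_pseudo_transform(mean, basis, weights, dim, rng):
    """Draw weights from an assumed diagonal Gaussian and rebuild W."""
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0)
    w = rng.normal(mu, sigma)
    return (mean + w @ basis).reshape(dim, dim + 1)

def apply_inverse(transform, normalized_feats):
    """Undo x_norm = A x + b, i.e. x = A^{-1} (x_norm - b), row by row."""
    a, b = transform[:, :-1], transform[:, -1]
    return np.linalg.solve(a, (normalized_feats - b).T).T

# Toy data (assumption): 5 speakers, 3-dimensional features,
# transforms close to the identity so that A stays invertible.
rng = np.random.default_rng(0)
dim = 3
transforms = [
    np.hstack([np.eye(dim) + 0.05 * rng.standard_normal((dim, dim)),
               0.1 * rng.standard_normal((dim, 1))])
    for _ in range(5)
]
mean, basis, weights = fit_transform_basis(transforms, n_components=2)
pseudo = sample_pseudo_transform(mean, basis, weights, dim, rng)
feats = rng.standard_normal((10, dim))      # stand-in for normalized features
pseudo_feats = apply_inverse(pseudo, feats)  # pseudo-speaker features
```

In a real system the transforms would come from a speech recognition toolkit's adaptation step and the resulting pseudo-speaker features would be fed to acoustic model training; both are outside the scope of this sketch.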

Original language: English
Title of host publication: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
Pages: 169-172
Number of pages: 4
DOI: https://doi.org/10.1109/ASRU.2011.6163925
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011 - Waikoloa, HI, United States
Duration: Dec 11 2011 to Dec 15 2011

Other

Other: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011
Country: United States
City: Waikoloa, HI
Period: 12/11/11 to 12/15/11

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Itoh, A., Hara, S., Kitaoka, N., & Takeda, K. (2011). Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings (pp. 169-172). [6163925] https://doi.org/10.1109/ASRU.2011.6163925