Acoustic model training using pseudo-speaker features generated by MLLR transformations for robust speaker-independent speech recognition

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

We propose a novel acoustic model training method, based on speech feature generation, for robust speaker-independent speech recognition. Speaker adaptation methods have been widely used for decades, but all of them require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using principal component analysis (PCA). Each existing speaker's transformation matrix is expressed as a set of weight parameters on these bases, and the distribution of the weight parameters is estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution, and we apply each transformation to the normalized features of an existing speaker to generate the features of a pseudo-speaker. Finally, we train the acoustic models using these features. Evaluation results show that acoustic models trained with our proposed method are robust for unknown speakers.
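The pipeline described in the abstract (PCA bases, weight distribution, sampling, feature generation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: each MLLR transformation is taken to be an affine matrix `[A | b]` of shape `(d, d+1)`, the weight distribution is modeled as an independent Gaussian per principal component, and the sampled transformation is applied directly as an affine map to the normalized features. All function names and the toy data are hypothetical.

```python
import numpy as np


def fit_transform_basis(transforms, n_components):
    """PCA over flattened transforms of shape (n_speakers, d, d + 1)."""
    n_speakers = transforms.shape[0]
    flat = transforms.reshape(n_speakers, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    # Per-speaker weights: projection of each transform onto the bases.
    weights = centered @ basis.T
    return mean, basis, weights


def sample_pseudo_transform(mean, basis, weights, shape, rng):
    """Draw weights from a per-component Gaussian (an assumption) and
    rebuild a pseudo-speaker transform from the PCA bases."""
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0, ddof=1)
    sampled = rng.normal(mu, sigma)
    return (mean + sampled @ basis).reshape(shape)


def apply_transform(transform, features):
    """Apply the affine map x -> A x + b to each row of features."""
    a, b = transform[:, :-1], transform[:, -1]
    return features @ a.T + b


# Toy data standing in for real MLLR transforms and normalized features.
rng = np.random.default_rng(0)
dim, n_speakers, n_frames = 3, 8, 5
identity = np.hstack([np.eye(dim), np.zeros((dim, 1))])
speaker_transforms = identity + 0.1 * rng.standard_normal((n_speakers, dim, dim + 1))
normalized_features = rng.standard_normal((n_frames, dim))

mean, basis, weights = fit_transform_basis(speaker_transforms, n_components=2)
pseudo_transform = sample_pseudo_transform(mean, basis, weights, (dim, dim + 1), rng)
pseudo_features = apply_transform(pseudo_transform, normalized_features)
```

Repeating the sampling step yields as many pseudo-speakers as needed; the generated features would then be pooled with the original data for acoustic model training.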

Original language: English
Pages (from-to): 2479-2485
Number of pages: 7
Journal: IEICE Transactions on Information and Systems
Volume: E95-D
Issue number: 10
Publication status: Published - Oct 2012
Externally published: Yes

Keywords

  • Acoustic model training
  • Feature generation
  • MLLR
  • Pseudo speakers
  • Speech recognition

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
  • Artificial Intelligence
