Acoustic model training using pseudo-speaker features generated by MLLR transformations for robust speaker-independent speech recognition

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

A novel acoustic model training method based on speech feature generation is proposed for robust speaker-independent speech recognition. Speaker adaptation methods have been widely used for decades, but all of them require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and training our models on the generated features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using principal component analysis (PCA). The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution and apply them to the normalized features of the existing speakers to generate the features of the pseudo-speakers. Finally, we train the acoustic models using these features. Evaluation results show that acoustic models trained with the proposed method are robust to unknown speakers.
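The pipeline in the abstract (per-speaker transforms, PCA bases, a weight distribution, sampling, feature generation) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy transforms are synthetic, the function names are hypothetical, each weight is assumed to follow an independent Gaussian, and each transform is treated as an affine map [A | b] applied directly to normalized features. How the paper estimates, inverts, and normalizes with MLLR is not specified in the abstract and is not modeled here.

```python
import numpy as np


def fit_transform_subspace(transforms, n_bases):
    """PCA over flattened per-speaker transforms.

    transforms: array of shape (S, d, d + 1), each entry an affine map [A | b].
    Returns the mean transform (flattened), the top n_bases principal axes,
    and each speaker's weights on those axes.
    """
    n_speakers = transforms.shape[0]
    flat = transforms.reshape(n_speakers, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are orthonormal principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:n_bases]
    weights = centered @ bases.T
    return mean, bases, weights


def sample_pseudo_transforms(mean, bases, weights, n_pseudo, shape, rng):
    """Sample pseudo-speaker transforms from the weight distribution.

    Assumption: each weight is modeled by an independent Gaussian fitted to
    the existing speakers; the paper may use a different distribution.
    """
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0)
    sampled = rng.normal(mu, sigma, size=(n_pseudo, bases.shape[0]))
    flat = mean + sampled @ bases
    return flat.reshape((n_pseudo,) + shape)


def apply_transform(transform, feats):
    """Apply an affine map [A | b] to features of shape (T, d)."""
    a, b = transform[:, :-1], transform[:, -1]
    return feats @ a.T + b


rng = np.random.default_rng(0)
d, n_speakers = 3, 8

# Synthetic stand-ins for estimated speaker transforms: identity plus noise.
identity = np.hstack([np.eye(d), np.zeros((d, 1))])
transforms = identity + 0.1 * rng.standard_normal((n_speakers, d, d + 1))

mean, bases, weights = fit_transform_subspace(transforms, n_bases=2)
pseudo = sample_pseudo_transforms(
    mean, bases, weights, n_pseudo=5, shape=(d, d + 1), rng=rng
)

# Synthetic "normalized" features: 10 frames of dimension d.
normalized_feats = rng.standard_normal((10, d))
pseudo_feats = np.stack([apply_transform(w, normalized_feats) for w in pseudo])
print(pseudo_feats.shape)  # (5, 10, 3)
```

In a real system the generated features would be pooled with the original data to train the acoustic model; that training step is omitted here.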

Original language: English
Pages (from-to): 2479-2485
Number of pages: 7
Journal: IEICE Transactions on Information and Systems
Volume: E95-D
Issue number: 10
DOI: 10.1587/transinf.E95.D.2479
Publication status: Published - Oct 2012
Externally published: Yes

Fingerprint

Speech recognition
Linear regression
Maximum likelihood
Acoustics
Sampling

Keywords

  • Acoustic model training
  • Feature generation
  • MLLR
  • Pseudo speakers
  • Speech recognition

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Software
  • Artificial Intelligence
  • Hardware and Architecture
  • Computer Vision and Pattern Recognition

Cite this

Acoustic model training using pseudo-speaker features generated by MLLR transformations for robust speaker-independent speech recognition. / Itoh, Arata; Hara, Sunao; Kitaoka, Norihide; Takeda, Kazuya. In: IEICE Transactions on Information and Systems, Vol. E95-D, No. 10, 10.2012, p. 2479-2485.

@article{ae9028373b7c4b5cb140478e04fd7524,
title = "Acoustic model training using pseudo-speaker features generated by MLLR transformations for robust speaker-independent speech recognition",
abstract = "A novel acoustic model training method based on speech feature generation is proposed for robust speaker-independent speech recognition. Speaker adaptation methods have been widely used for decades, but all of them require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and training our models on the generated features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using principal component analysis (PCA). The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution and apply them to the normalized features of the existing speakers to generate the features of the pseudo-speakers. Finally, we train the acoustic models using these features. Evaluation results show that acoustic models trained with the proposed method are robust to unknown speakers.",
keywords = "Acoustic model training, Feature generation, MLLR, Pseudo speakers, Speech recognition",
author = "Arata Itoh and Sunao Hara and Norihide Kitaoka and Kazuya Takeda",
year = "2012",
month = "10",
doi = "10.1587/transinf.E95.D.2479",
language = "English",
volume = "E95-D",
pages = "2479--2485",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "10",

}

TY  - JOUR
T1  - Acoustic model training using pseudo-speaker features generated by MLLR transformations for robust speaker-independent speech recognition
AU  - Itoh, Arata
AU  - Hara, Sunao
AU  - Kitaoka, Norihide
AU  - Takeda, Kazuya
PY  - 2012/10
Y1  - 2012/10
AB  - A novel acoustic model training method based on speech feature generation is proposed for robust speaker-independent speech recognition. Speaker adaptation methods have been widely used for decades, but all of them require adaptation data. In contrast, our proposed method aims to create speaker-independent acoustic models that cover not only known but also unknown speakers. We achieve this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and training our models on the generated features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using principal component analysis (PCA). The distribution of the weight parameters that express the transformation matrices of the existing speakers is then estimated. Next, we construct pseudo-speaker transformations by sampling weight parameters from this distribution and apply them to the normalized features of the existing speakers to generate the features of the pseudo-speakers. Finally, we train the acoustic models using these features. Evaluation results show that acoustic models trained with the proposed method are robust to unknown speakers.
KW  - Acoustic model training
KW  - Feature generation
KW  - MLLR
KW  - Pseudo speakers
KW  - Speech recognition
UR  - http://www.scopus.com/inward/record.url?scp=84867222085&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84867222085&partnerID=8YFLogxK
U2  - 10.1587/transinf.E95.D.2479
DO  - 10.1587/transinf.E95.D.2479
M3  - Article
AN  - SCOPUS:84867222085
VL  - E95-D
SP  - 2479
EP  - 2485
JO  - IEICE Transactions on Information and Systems
JF  - IEICE Transactions on Information and Systems
SN  - 0916-8532
IS  - 10
ER  -