Training robust acoustic models using features of pseudo-speakers generated by inverse CMLLR transformations

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

This paper proposes a novel acoustic model training method based on speech feature generation. Speaker adaptation methods have been widely used for decades, but all existing adaptation methods need adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic models using these features. Evaluation results show that the resulting acoustic models are robust to unknown speakers.
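The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each transformation is assumed to be an affine feature-space map W = [A | b] that takes a speaker's features x to normalized features y = Ax + b, and the weight distribution is assumed to be an independent Gaussian per PCA component (the abstract does not say which distribution is used).

```python
import numpy as np

def fit_transform_basis(transforms, n_components):
    """PCA over flattened affine transforms of shape (S, d, d + 1)."""
    n_speakers = transforms.shape[0]
    flat = transforms.reshape(n_speakers, -1)
    mean = flat.mean(axis=0)
    # Rows of vt are the principal directions of the centered transforms.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components]
    weights = (flat - mean) @ basis.T  # per-speaker weight parameters
    return mean, basis, weights

def sample_pseudo_transform(mean, basis, weights, shape, rng):
    """Draw weights from a per-component Gaussian (an assumption) and
    rebuild a pseudo-speaker transform W = [A | b]."""
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0, ddof=1)
    w = rng.normal(mu, sigma)
    return (mean + w @ basis).reshape(shape)

def apply_inverse(transform, normalized_feats):
    """Map normalized features y back to pseudo-speaker features x,
    assuming the forward map is y = A x + b."""
    a, b = transform[:, :-1], transform[:, -1]
    return np.linalg.solve(a, (normalized_feats - b).T).T
```

Pseudo-speaker features produced by `apply_inverse` would then be pooled with, or used in place of, the original training features when estimating the acoustic model. Sampling in the low-dimensional weight space keeps generated transforms close to the variation observed among real speakers.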

Original language: English
Title of host publication: APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011
Pages: 726-730
Number of pages: 5
Publication status: Published - 2011
Externally published: Yes
Event: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011, APSIPA ASC 2011 - Xi'an, China
Duration: Oct 18 2011 to Oct 21 2011

Fingerprint

Linear regression
Maximum likelihood
Acoustics
Sampling

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing

Cite this

Itoh, A., Hara, S., Kitaoka, N., & Takeda, K. (2011). Training robust acoustic models using features of pseudo-speakers generated by inverse CMLLR transformations. In APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011 (pp. 726-730)

@inproceedings{d01e40b309474236b40d91996b17dabc,
title = "Training robust acoustic models using features of pseudo-speakers generated by inverse CMLLR transformations",
abstract = "This paper proposes a novel acoustic model training method based on speech feature generation. Speaker adaptation methods have been widely used for decades, but all existing adaptation methods need adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic models using these features. Evaluation results show that the resulting acoustic models are robust to unknown speakers.",
author = "Arata Itoh and Sunao Hara and Norihide Kitaoka and Kazuya Takeda",
year = "2011",
language = "English",
pages = "726--730",
booktitle = "APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011",

}

TY - GEN

T1 - Training robust acoustic models using features of pseudo-speakers generated by inverse CMLLR transformations

AU - Itoh, Arata

AU - Hara, Sunao

AU - Kitaoka, Norihide

AU - Takeda, Kazuya

PY - 2011

Y1 - 2011

N2 - This paper proposes a novel acoustic model training method based on speech feature generation. Speaker adaptation methods have been widely used for decades, but all existing adaptation methods need adaptation data. In contrast, our proposed method creates speaker-independent acoustic models that cover not only known but also unknown speakers. We do this by generating features with inverse maximum likelihood linear regression (MLLR) transformations and then training our models on these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters that express the MLLR transformation matrices of the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from this distribution, and apply the inverse of each transformation to the normalized features of the existing speakers to generate the pseudo-speakers' features. Finally, we train the acoustic models using these features. Evaluation results show that the resulting acoustic models are robust to unknown speakers.

UR - http://www.scopus.com/inward/record.url?scp=84866859478&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866859478&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84866859478

SP - 726

EP - 730

BT - APSIPA ASC 2011 - Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2011

ER -