Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

Arata Itoh, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

In this paper, we propose a novel acoustic model training method which is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from a small amount of speakers' data. For decades, speaker adaptation methods have been widely used. Such adaptation methods need some amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker can widely cover more speakers, speaker adaptation can perform robustly. To make such robust seed models, we adopt inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters to express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than using simply environmentally adapted models.
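
The pipeline in the abstract (per-speaker transforms, PCA bases, a weight distribution, sampling, inverse transformation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each transform is an affine feature-space map stored as [A | b] with x_norm = A x + b, and that the PCA weights follow independent Gaussians; the abstract does not specify the distribution. All function names are hypothetical.

```python
import numpy as np


def fit_transform_pca(transforms, n_bases):
    """PCA over flattened per-speaker transforms.

    transforms: array of shape (S, d, d + 1), one [A | b] per speaker.
    Returns the mean transform (flattened), the bases, and per-speaker weights.
    """
    n_speakers = transforms.shape[0]
    flat = transforms.reshape(n_speakers, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal axes of the centered transforms.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    bases = vt[:n_bases]
    weights = centered @ bases.T  # shape (S, n_bases)
    return mean, bases, weights


def sample_pseudo_transforms(mean, bases, weights, n_pseudo, shape, rng):
    """Draw pseudo-speaker transforms by sampling PCA weights.

    Assumption: each weight is modeled as an independent Gaussian fitted
    to the existing speakers' weights.
    """
    mu = weights.mean(axis=0)
    sigma = weights.std(axis=0)
    sampled = rng.normal(mu, sigma, size=(n_pseudo, bases.shape[0]))
    flat = mean + sampled @ bases
    return flat.reshape((n_pseudo,) + shape)


def apply_inverse_transform(transform, normalized_feats):
    """Map normalized features into a pseudo-speaker's feature space.

    With x_norm = A x + b, the pseudo-speaker feature is x = A^{-1} (x_norm - b).
    normalized_feats: array of shape (T, d).
    """
    a, b = transform[:, :-1], transform[:, -1]
    return np.linalg.solve(a, (normalized_feats - b).T).T


# Toy data: 5 "speakers" with near-identity transforms in 3 dimensions.
rng = np.random.default_rng(0)
identity = np.hstack([np.eye(3), np.zeros((3, 1))])
speaker_transforms = identity + 0.05 * rng.standard_normal((5, 3, 4))

mean, bases, weights = fit_transform_pca(speaker_transforms, n_bases=2)
pseudo = sample_pseudo_transforms(mean, bases, weights, 4, (3, 4), rng)

normalized = rng.standard_normal((10, 3))  # stand-in for normalized features
pseudo_feats = [apply_inverse_transform(t, normalized) for t in pseudo]
```

The resulting `pseudo_feats` would then serve as additional training data for the seed models. Training itself is outside the scope of this sketch.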

Original language: English
Title of host publication: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings
Pages: 169-172
Number of pages: 4
DOIs: https://doi.org/10.1109/ASRU.2011.6163925
Publication status: Published - 2011
Externally published: Yes
Event: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011 - Waikoloa, HI, United States
Duration: Dec 11 2011 → Dec 15 2011

Other

Other: 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011
Country: United States
City: Waikoloa, HI
Period: 12/11/11 → 12/15/11

Fingerprint

Seed
Linear regression
Maximum likelihood
Speech recognition
Acoustics
Sampling

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction

Cite this

Itoh, A., Hara, S., Kitaoka, N., & Takeda, K. (2011). Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings (pp. 169-172). [6163925] https://doi.org/10.1109/ASRU.2011.6163925

Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. / Itoh, Arata; Hara, Sunao; Kitaoka, Norihide; Takeda, Kazuya.

2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. 2011. p. 169-172 6163925.

Itoh, A, Hara, S, Kitaoka, N & Takeda, K 2011, Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. in 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings., 6163925, pp. 169-172, 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Waikoloa, HI, United States, 12/11/11. https://doi.org/10.1109/ASRU.2011.6163925
Itoh A, Hara S, Kitaoka N, Takeda K. Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. In 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. 2011. p. 169-172. 6163925 https://doi.org/10.1109/ASRU.2011.6163925
Itoh, Arata ; Hara, Sunao ; Kitaoka, Norihide ; Takeda, Kazuya. / Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation. 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings. 2011. pp. 169-172
@inproceedings{84934bdfce3b4f12a0aff55063b77e63,
title = "Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation",
abstract = "In this paper, we propose a novel acoustic model training method which is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from a small amount of speakers' data. For decades, speaker adaptation methods have been widely used. Such adaptation methods need some amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker can widely cover more speakers, speaker adaptation can perform robustly. To make such robust seed models, we adopt inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters to express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than using simply environmentally adapted models.",
author = "Arata Itoh and Sunao Hara and Norihide Kitaoka and Kazuya Takeda",
year = "2011",
doi = "10.1109/ASRU.2011.6163925",
language = "English",
isbn = "9781467303675",
pages = "169--172",
booktitle = "2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings",

}

TY - GEN

T1 - Robust seed model training for speaker adaptation using pseudo-speaker features generated by inverse CMLLR transformation

AU - Itoh, Arata

AU - Hara, Sunao

AU - Kitaoka, Norihide

AU - Takeda, Kazuya

PY - 2011

Y1 - 2011

N2 - In this paper, we propose a novel acoustic model training method which is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from a small amount of speakers' data. For decades, speaker adaptation methods have been widely used. Such adaptation methods need some amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker can widely cover more speakers, speaker adaptation can perform robustly. To make such robust seed models, we adopt inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters to express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than using simply environmentally adapted models.

AB - In this paper, we propose a novel acoustic model training method which is suitable for speaker adaptation in speech recognition. Our method is based on feature generation from a small amount of speakers' data. For decades, speaker adaptation methods have been widely used. Such adaptation methods need some amount of adaptation data, and if the data is not sufficient, speech recognition performance degrades significantly. If the seed models to be adapted to a specific speaker can widely cover more speakers, speaker adaptation can perform robustly. To make such robust seed models, we adopt inverse maximum likelihood linear regression (MLLR) transformation-based feature generation, and then train our seed models using these features. First, we obtain MLLR transformation matrices from a limited number of existing speakers. Then we extract the bases of the MLLR transformation matrices using PCA. The distribution of the weight parameters to express the MLLR transformation matrices for the existing speakers is estimated. Next, we generate pseudo-speaker MLLR transformations by sampling the weight parameters from the distribution, and apply the inverse of the transformation to the normalized existing speaker features to generate the pseudo-speakers' features. Finally, using these features, we train the acoustic seed models. Using these seed models, we obtained better speaker adaptation results than using simply environmentally adapted models.

UR - http://www.scopus.com/inward/record.url?scp=84858993998&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858993998&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2011.6163925

DO - 10.1109/ASRU.2011.6163925

M3 - Conference contribution

SN - 9781467303675

SP - 169

EP - 172

BT - 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings

ER -