Rapid acoustic model adaptation using inverse MLLR-based feature generation

Arata Ito, Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We propose a technique that generates a large amount of target speaker-like speech features by converting prepared speech features from many speakers into features resembling those of the target speaker, using a transformation matrix. Generating these features requires only a very small amount of the target speaker's utterances, which lets the system adapt the acoustic model efficiently from little target-speaker data. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted experiments on an isolated word recognition task using a speech database collected in real PC-based distributed environments, and compared the proposed method with MLLR, MAP, and a method theoretically equivalent to SAT. The experimental results showed that the proposed method needed a significantly smaller amount of the target speaker's utterances than conventional MLLR, MAP, and SAT.
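
The abstract does not give the transformation formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that each speaker is related to a shared speaker-independent feature space by an MLLR-style affine map x_s = A_s x + b_s, so that reference-speaker frames can be mapped back with the inverse of the reference transform and then forward with the target transform. The function name `to_target_like` and all matrices are illustrative.

```python
import numpy as np

def to_target_like(feats_ref, A_ref, b_ref, A_tgt, b_tgt):
    """Map reference-speaker frames (T x D) to target-speaker-like frames.

    Assumption (not stated in the abstract): speaker s is related to a
    speaker-independent space by the affine map x_s = A_s @ x + b_s.
    """
    # Undo the reference speaker's transform: x = A_ref^{-1} (x_ref - b_ref).
    # Frames are rows, so solve A_ref @ X^T = (feats_ref - b_ref)^T.
    canonical = np.linalg.solve(A_ref, (feats_ref - b_ref).T).T
    # Apply the target speaker's transform to the recovered frames.
    return canonical @ A_tgt.T + b_tgt

# Synthetic check with made-up transforms (illustrative values only).
rng = np.random.default_rng(0)
D, T = 3, 5
A_ref = np.eye(D) + 0.1 * rng.standard_normal((D, D))
A_tgt = np.eye(D) + 0.1 * rng.standard_normal((D, D))
b_ref, b_tgt = rng.standard_normal(D), rng.standard_normal(D)

canonical = rng.standard_normal((T, D))
feats_ref = canonical @ A_ref.T + b_ref
feats_tgt_like = to_target_like(feats_ref, A_ref, b_ref, A_tgt, b_tgt)
```

In this sketch the only target-specific quantities are `A_tgt` and `b_tgt`, which is consistent with the abstract's point that a small amount of target-speaker data can be enough to produce many target-like frames from a large reference pool.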

Original language: English
Title of host publication: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
Pages: 3783-3788
Number of pages: 6
Volume: 5
Publication status: Published - 2010
Externally published: Yes
Event: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society - Sydney, NSW, Australia
Duration: Aug 23 2010 to Aug 27 2010

Other

Other: 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society
Country: Australia
City: Sydney, NSW
Period: 8/23/10 to 8/27/10

ASJC Scopus subject areas

  • Acoustics and Ultrasonics

Cite this

Ito, A., Hara, S., Kitaoka, N., & Takeda, K. (2010). Rapid acoustic model adaptation using inverse MLLR-based feature generation. In 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society (Vol. 5, pp. 3783-3788)

Rapid acoustic model adaptation using inverse MLLR-based feature generation. / Ito, Arata; Hara, Sunao; Kitaoka, Norihide; Takeda, Kazuya.

20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society. Vol. 5 2010. p. 3783-3788.

Ito, A, Hara, S, Kitaoka, N & Takeda, K 2010, Rapid acoustic model adaptation using inverse MLLR-based feature generation. in 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society. vol. 5, pp. 3783-3788, 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating the 2010 Annual Conference of the Australian Acoustical Society, Sydney, NSW, Australia, 8/23/10.
Ito A, Hara S, Kitaoka N, Takeda K. Rapid acoustic model adaptation using inverse MLLR-based feature generation. In 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society. Vol. 5. 2010. p. 3783-3788
Ito, Arata ; Hara, Sunao ; Kitaoka, Norihide ; Takeda, Kazuya. / Rapid acoustic model adaptation using inverse MLLR-based feature generation. 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society. Vol. 5 2010. pp. 3783-3788
@inproceedings{a4f62a69cf904a7abb8b4a0df399afd7,
title = "Rapid acoustic model adaptation using inverse MLLR-based feature generation",
abstract = "We propose a technique for generating a large amount of target speaker-like speech features by converting a large amount of prepared speech features of many speakers into features similar to those of the target speaker using a transformation matrix. To generate a large amount of target speaker-like features, the system only needs a very small amount of the target speaker's utterances. This technique enables the system to adapt the acoustic model efficiently from a small amount of the target speaker's utterances. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted the experiments in an isolated word recognition task using a speech database collected by real PC-based distributed environments and compared our proposed method with MLLR, MAP and the method theoretically equivalent to the SAT. Experimental results proved that the proposed method needed a significantly smaller amount of the target speaker's utterances than conventional MLLR, MAP and SAT.",
author = "Arata Ito and Sunao Hara and Norihide Kitaoka and Kazuya Takeda",
year = "2010",
language = "English",
isbn = "9781617827457",
volume = "5",
pages = "3783--3788",
booktitle = "20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society",

}

TY - GEN
T1 - Rapid acoustic model adaptation using inverse MLLR-based feature generation
AU - Ito, Arata
AU - Hara, Sunao
AU - Kitaoka, Norihide
AU - Takeda, Kazuya
PY - 2010
Y1 - 2010

AB - We propose a technique for generating a large amount of target speaker-like speech features by converting a large amount of prepared speech features of many speakers into features similar to those of the target speaker using a transformation matrix. To generate a large amount of target speaker-like features, the system only needs a very small amount of the target speaker's utterances. This technique enables the system to adapt the acoustic model efficiently from a small amount of the target speaker's utterances. To evaluate the proposed method, we prepared 100 reference speakers and 12 target (test) speakers. We conducted the experiments in an isolated word recognition task using a speech database collected by real PC-based distributed environments and compared our proposed method with MLLR, MAP and the method theoretically equivalent to the SAT. Experimental results proved that the proposed method needed a significantly smaller amount of the target speaker's utterances than conventional MLLR, MAP and SAT.

UR - http://www.scopus.com/inward/record.url?scp=84869128367&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84869128367&partnerID=8YFLogxK
M3 - Conference contribution
SN - 9781617827457
VL - 5
SP - 3783
EP - 3788
BT - 20th International Congress on Acoustics 2010, ICA 2010 - Incorporating Proceedings of the 2010 Annual Conference of the Australian Acoustical Society
ER -