Two-stage F0 control model using syllable based F0 units

Masanobu Abe, Hirokazu Sato

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Citations (Scopus)

Abstract

This paper proposes syllable-based F0 units (SBUs) for F0 contour generation, together with a two-stage generation strategy. The two-stage strategy provides a flexible F0 generation framework by introducing a global model and a local model. The local model consists of SBUs, which make it possible to estimate the F0 contour precisely using segmental information. Experimental results show that the proposed approach can generate a good global model (measured multiple correlation coefficient: 0.843) and can precisely estimate average F0 (measured multiple correlation coefficient: 0.875). The results also confirm that generating SBUs according to syllable position is important for precise F0 contour estimation. Listening tests show that speech synthesized with the proposed model is preferred to the output of the conventional model. We expect that the approach will prove to be useful and powerful for synthesizing various types of speech.
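
The abstract reports its results as multiple correlation coefficients but does not say how they were computed. As a minimal sketch, assuming the standard definition (the correlation between observed values and the values fitted by a least-squares linear regression with an intercept), the coefficient could be computed as follows. The function names, feature columns, and F0 values are hypothetical and are not taken from the paper.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_correlation(features, targets):
    """Return R = sqrt(1 - SS_res / SS_tot) for a least-squares linear fit."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept term
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    w = solve_linear_system(xtx, xty)
    fitted = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - f) ** 2 for y, f in zip(targets, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical data: two made-up predictor columns and average F0 values in Hz.
features = [(1, 0), (0, 1), (1, 1), (2, 1), (0, 0)]
average_f0 = [112.0, 94.0, 105.0, 117.0, 99.0]
print(round(multiple_correlation(features, average_f0), 3))
```

A value near 1 means the predictors explain most of the variance in the target, which is how figures such as 0.843 and 0.875 are usually read.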

Original language: English
Title of host publication: ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 53-56
Number of pages: 4
Volume: 2
ISBN (Electronic): 0780305329
DOIs: https://doi.org/10.1109/ICASSP.1992.226122
Publication status: Published - 1992
Externally published: Yes
Event: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1992 - San Francisco, United States
Duration: Mar 23 1992 to Mar 26 1992

Other

Other: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1992
Country: United States
City: San Francisco
Period: 3/23/92 to 3/26/92

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Abe, M., & Sato, H. (1992). Two-stage F0 control model using syllable based F0 units. In ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing (Vol. 2, pp. 53-56). [226122] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.1992.226122

@inproceedings{4ce9f592e4af43c5b78d64e3c4b7cbd8,
title = "Two-stage F0 control model using syllable based F0 units",
abstract = "This paper proposes syllable-based F0 units (SBUs) for F0 contour generation, together with a two-stage generation strategy. The two-stage strategy provides a flexible F0 generation framework by introducing a global model and a local model. The local model consists of SBUs, which make it possible to estimate the F0 contour precisely using segmental information. Experimental results show that the proposed approach can generate a good global model (measured multiple correlation coefficient: 0.843) and can precisely estimate average F0 (measured multiple correlation coefficient: 0.875). The results also confirm that generating SBUs according to syllable position is important for precise F0 contour estimation. Listening tests show that speech synthesized with the proposed model is preferred to the output of the conventional model. We expect that the approach will prove to be useful and powerful for synthesizing various types of speech.",
author = "Masanobu Abe and Hirokazu Sato",
year = "1992",
doi = "10.1109/ICASSP.1992.226122",
language = "English",
volume = "2",
pages = "53--56",
booktitle = "ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - GEN
T1 - Two-stage F0 control model using syllable based F0 units
AU - Abe, Masanobu
AU - Sato, Hirokazu
PY - 1992
Y1 - 1992
AB - This paper proposes syllable-based F0 units (SBUs) for F0 contour generation, together with a two-stage generation strategy. The two-stage strategy provides a flexible F0 generation framework by introducing a global model and a local model. The local model consists of SBUs, which make it possible to estimate the F0 contour precisely using segmental information. Experimental results show that the proposed approach can generate a good global model (measured multiple correlation coefficient: 0.843) and can precisely estimate average F0 (measured multiple correlation coefficient: 0.875). The results also confirm that generating SBUs according to syllable position is important for precise F0 contour estimation. Listening tests show that speech synthesized with the proposed model is preferred to the output of the conventional model. We expect that the approach will prove to be useful and powerful for synthesizing various types of speech.
UR - http://www.scopus.com/inward/record.url?scp=85009260788&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009260788&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.1992.226122
DO - 10.1109/ICASSP.1992.226122
M3 - Conference contribution
VL - 2
SP - 53
EP - 56
BT - ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
ER -