A Japanese TTS system based on multiform units and a speech modification algorithm with harmonics reconstruction

Satoshi Takano, Kimihito Tanaka, Hideyuki Mizuno, Masanobu Abe

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

This paper proposes a new text-to-speech (TTS) system that uses a large number of speech segments to produce highly natural and intelligible synthetic speech. There are two innovations: new multiform synthesis units and a new speech modification algorithm based on a vocoder that performs harmonics reconstruction. By preparing synthesis units of various lengths and with various F0 contours, the multiform units reduce both acoustic discontinuities at concatenation points and unnatural-sounding speech. The new speech modification algorithm, on the other hand, improves the quality of prosody-modified speech. It is particularly effective for synthesizing speech whose prosodic parameters differ greatly from those of the synthesis units. Listening tests confirm that the new synthesis units yield speech with high intelligibility and naturalness, and that the new speech modification algorithm is superior to conventional vocoders and waveform-domain algorithms, including TD-PSOLA, especially when F0 is shifted upward.
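The abstract does not describe how harmonics reconstruction is carried out. Purely as an illustration, the Python sketch below shows one generic way a harmonic (sinusoidal) model can change F0 while keeping the spectral envelope: the amplitudes of the original harmonics define an envelope by linear interpolation, and new harmonics are placed at multiples of the target F0 with amplitudes read from that envelope. This is not the authors' algorithm. The function names (`interpolate_envelope`, `reconstruct_harmonics`, `synthesize_frame`), the linear-interpolation envelope, and the zero-phase synthesis are hypothetical simplifications.

```python
import math
from bisect import bisect_left


def interpolate_envelope(freqs, amps, f):
    """Linearly interpolate an amplitude envelope sampled at ascending freqs.

    Outside the sampled range the nearest endpoint value is held (a crude
    simplification assumed for this sketch).
    """
    if f <= freqs[0]:
        return amps[0]
    if f >= freqs[-1]:
        return amps[-1]
    i = bisect_left(freqs, f)  # freqs[i - 1] < f <= freqs[i]
    lo_f, hi_f = freqs[i - 1], freqs[i]
    t = (f - lo_f) / (hi_f - lo_f)
    return amps[i - 1] + t * (amps[i] - amps[i - 1])


def reconstruct_harmonics(f0_orig, amps_orig, f0_new, sample_rate):
    """Hypothetical helper: re-place harmonics at multiples of f0_new.

    amps_orig[k] is the amplitude of harmonic k + 1 of f0_orig. Each new
    harmonic takes its amplitude from the envelope implied by the original
    harmonics, so the envelope stays put while the harmonic spacing changes.
    Returns (frequency_hz, amplitude) pairs strictly below the Nyquist frequency.
    """
    nyquist = sample_rate / 2
    freqs_orig = [(k + 1) * f0_orig for k in range(len(amps_orig))]
    harmonics = []
    k = 1
    while k * f0_new < nyquist:
        f = k * f0_new
        harmonics.append((f, interpolate_envelope(freqs_orig, amps_orig, f)))
        k += 1
    return harmonics


def synthesize_frame(harmonics, sample_rate, n_samples):
    """Sum of zero-phase cosines; phase continuity across frames is ignored."""
    return [
        sum(a * math.cos(2 * math.pi * f * n / sample_rate) for f, a in harmonics)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # Toy example: 4 harmonics of 100 Hz at a 1 kHz sampling rate, F0 raised to 150 Hz.
    new = reconstruct_harmonics(100.0, [1.0, 0.8, 0.4, 0.2], 150.0, 1000)
    print(new)
    print(synthesize_frame(new, 1000, 8))
```

In the toy example, raising F0 from 100 Hz to 150 Hz at a 1 kHz sampling rate leaves three harmonics below Nyquist (150, 300 and 450 Hz). Their amplitudes are read from the original envelope rather than being shifted along with the harmonics, so the envelope shape (and hence formant positions, in real speech) is preserved.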

Original language: English
Pages (from-to): 3-10
Number of pages: 8
Journal: IEEE Transactions on Speech and Audio Processing
Volume: 9
Issue number: 1
DOI: 10.1109/89.890065
Publication status: Published, 2001
Externally published: Yes

Keywords

  • Evaluation of synthesis technique and systems
  • Segmental units and adjustment rules
  • Spectral analysis
  • Synthesis structure and systems

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

A Japanese TTS system based on multiform units and a speech modification algorithm with harmonics reconstruction. / Takano, Satoshi; Tanaka, Kimihito; Mizuno, Hideyuki; Abe, Masanobu.

In: IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 1, 2001, p. 3-10.

@article{a0063d6da81647af84c4e1394c8e8858,
title = "A Japanese TTS system based on multiform units and a speech modification algorithm with harmonics reconstruction",
abstract = "This paper proposes a new text-to-speech (TTS) system that uses a large number of speech segments to produce highly natural and intelligible synthetic speech. There are two innovations: new multiform synthesis units and a new speech modification algorithm based on a vocoder that performs harmonics reconstruction. By preparing synthesis units of various lengths and with various F0 contours, the multiform units reduce both acoustic discontinuities at concatenation points and unnatural-sounding speech. The new speech modification algorithm, on the other hand, improves the quality of prosody-modified speech. It is particularly effective for synthesizing speech whose prosodic parameters differ greatly from those of the synthesis units. Listening tests confirm that the new synthesis units yield speech with high intelligibility and naturalness, and that the new speech modification algorithm is superior to conventional vocoders and waveform-domain algorithms, including TD-PSOLA, especially when F0 is shifted upward.",
keywords = "Evaluation of synthesis technique and systems, Segmental units and adjustment rules, Spectral analysis, Synthesis structure and systems",
author = "Satoshi Takano and Kimihito Tanaka and Hideyuki Mizuno and Masanobu Abe",
year = "2001",
doi = "10.1109/89.890065",
language = "English",
volume = "9",
pages = "3--10",
journal = "IEEE Transactions on Speech and Audio Processing",
issn = "1558-7916",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY - JOUR

T1 - A Japanese TTS system based on multiform units and a speech modification algorithm with harmonics reconstruction

AU - Takano, Satoshi

AU - Tanaka, Kimihito

AU - Mizuno, Hideyuki

AU - Abe, Masanobu

PY - 2001

Y1 - 2001

N2 - This paper proposes a new text-to-speech (TTS) system that uses a large number of speech segments to produce highly natural and intelligible synthetic speech. There are two innovations: new multiform synthesis units and a new speech modification algorithm based on a vocoder that performs harmonics reconstruction. By preparing synthesis units of various lengths and with various F0 contours, the multiform units reduce both acoustic discontinuities at concatenation points and unnatural-sounding speech. The new speech modification algorithm, on the other hand, improves the quality of prosody-modified speech. It is particularly effective for synthesizing speech whose prosodic parameters differ greatly from those of the synthesis units. Listening tests confirm that the new synthesis units yield speech with high intelligibility and naturalness, and that the new speech modification algorithm is superior to conventional vocoders and waveform-domain algorithms, including TD-PSOLA, especially when F0 is shifted upward.

AB - This paper proposes a new text-to-speech (TTS) system that uses a large number of speech segments to produce highly natural and intelligible synthetic speech. There are two innovations: new multiform synthesis units and a new speech modification algorithm based on a vocoder that performs harmonics reconstruction. By preparing synthesis units of various lengths and with various F0 contours, the multiform units reduce both acoustic discontinuities at concatenation points and unnatural-sounding speech. The new speech modification algorithm, on the other hand, improves the quality of prosody-modified speech. It is particularly effective for synthesizing speech whose prosodic parameters differ greatly from those of the synthesis units. Listening tests confirm that the new synthesis units yield speech with high intelligibility and naturalness, and that the new speech modification algorithm is superior to conventional vocoders and waveform-domain algorithms, including TD-PSOLA, especially when F0 is shifted upward.

KW - Evaluation of synthesis technique and systems

KW - Segmental units and adjustment rules

KW - Spectral analysis

KW - Synthesis structure and systems

UR - http://www.scopus.com/inward/record.url?scp=0035128144&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0035128144&partnerID=8YFLogxK

U2 - 10.1109/89.890065

DO - 10.1109/89.890065

M3 - Article

AN - SCOPUS:0035128144

VL - 9

SP - 3

EP - 10

JO - IEEE Transactions on Speech and Audio Processing

JF - IEEE Transactions on Speech and Audio Processing

SN - 1558-7916

IS - 1

ER -