Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt

Hideyuki Mizuno, Masanobu Abe

Research output: Contribution to journal (Article)

44 Citations (Scopus)

Abstract

This article presents a new algorithm that converts the speech of one speaker so that it sounds like that of another speaker. The algorithm flexibly converts voice quality using two major technical developments. First, formant frequencies and spectral intensity are modified using piecewise linear voice conversion rules, which enables detailed control of the spectrum parameters. The conversion rules are generated automatically for any pair of speakers, and their reliability rests on the fact that they are generated statistically from training data. Second, the algorithm can produce speech with a desired formant structure by controlling formant frequencies, formant bandwidths and spectral intensity; the speech is modified iteratively until the specified formant structure is achieved. Listening tests show that the proposed algorithm converts speaker individuality while maintaining high speech quality.
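
The piecewise linear rules mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the rules are parameterized, so this sketch assumes each rule is a sorted list of (source Hz, target Hz) breakpoints with linear interpolation between them and clamping outside the range. The function names and the breakpoint values in `F1_RULE` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def piecewise_linear_map(x, breakpoints):
    """Map x through a piecewise linear function.

    breakpoints: list of (source, target) pairs sorted by source value.
    Inputs outside the breakpoint range are clamped to the end targets
    (an assumption made for this sketch).
    """
    xs = [src for src, _ in breakpoints]
    ys = [tgt for _, tgt in breakpoints]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    # Index of the segment [xs[i], xs[i + 1]) that contains x.
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


# Hypothetical rule for the first formant, in Hz (illustrative numbers only).
F1_RULE = [(200.0, 250.0), (500.0, 600.0), (900.0, 1000.0)]


def convert_formants(formants_hz, rules):
    """Apply one rule per formant to a list of source formant frequencies."""
    return [piecewise_linear_map(f, rule) for f, rule in zip(formants_hz, rules)]


if __name__ == "__main__":
    # A source F1 of 350 Hz lies halfway along the first segment.
    print(convert_formants([350.0], [F1_RULE]))
```

In the paper the rules are generated statistically from training data for each speaker pair; here the breakpoints are simply written by hand to show the shape of the mapping.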

Original language: English
Pages (from-to): 153-164
Number of pages: 12
Journal: Speech Communication
Volume: 16
Issue number: 2
DOIs: 10.1016/0167-6393(94)00052-C
Publication status: Published - 1995
Externally published: Yes

Keywords

  • Formant frequency
  • Listening test
  • Piecewise linear
  • Spectral intensity
  • Spectrum tilt
  • Voice conversion

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Software
  • Modelling and Simulation
  • Linguistics and Language
  • Communication
  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology

Cite this

Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt. / Mizuno, Hideyuki; Abe, Masanobu.

In: Speech Communication, Vol. 16, No. 2, 1995, p. 153-164.

@article{32c8ba6d62e44de19eb8bc214f32269a,
title = "Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt",
keywords = "Formant frequency, Listening test, Piecewise linear, Spectral intensity, Spectrum tilt, Voice conversion",
author = "Hideyuki Mizuno and Masanobu Abe",
year = "1995",
doi = "10.1016/0167-6393(94)00052-C",
language = "English",
volume = "16",
pages = "153--164",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "2",

}

TY - JOUR

T1 - Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt

AU - Mizuno, Hideyuki

AU - Abe, Masanobu

PY - 1995

Y1 - 1995

KW - Formant frequency

KW - Listening test

KW - Piecewise linear

KW - Spectral intensity

KW - Spectrum tilt

KW - Voice conversion

UR - http://www.scopus.com/inward/record.url?scp=0029256372&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029256372&partnerID=8YFLogxK

U2 - 10.1016/0167-6393(94)00052-C

DO - 10.1016/0167-6393(94)00052-C

M3 - Article

AN - SCOPUS:0029256372

VL - 16

SP - 153

EP - 164

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 2

ER -