Naturalness improvement algorithm for reconstructed glossectomy patient's speech using spectral differential modification in voice conversion

Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shougo Minagi

Research output: Contribution to journal › Conference article

Abstract

In this paper, we propose an algorithm to improve the naturalness of reconstructed glossectomy patients' speech, which is generated by voice conversion (VC) to enhance the intelligibility of speech uttered by patients who have undergone a wide glossectomy. Although existing VC algorithms can improve both intelligibility and naturalness, the results are still not satisfactory. To address these remaining problems, we propose directly modifying the speech waveform using a spectral differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extraction for speech synthesis, so it is free from source parameter extraction errors and makes the best use of the original source characteristics. For spectrum conversion, we evaluate both GMM- and DNN-based mappings. Subjective evaluations show that our algorithm synthesizes more natural speech than the vocoder-based method. Observation of the spectrograms shows that the power in the high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.
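To make the idea of "directly modifying the waveform using a spectral differential" concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. The patient's waveform is filtered frame by frame with a zero-phase gain equal to exp(converted log envelope minus source log envelope), so only the vocal-tract (envelope) part changes while the excitation and phase of the original recording pass through untouched, and no F0 or other source parameters are extracted. Published differential-VC systems typically realize this with an MLSA filter driven by differential mel-cepstra; this sketch swaps in a simpler STFT-domain filter with linear-frequency cepstral liftering. The function names, the frame settings (`n_fft=1024`, `hop=256`, `n_ceps=40`), and `convert_fn`, a hypothetical stand-in for the trained GMM or DNN spectral mapping, are all illustrative assumptions.

```python
import numpy as np


def stft(x, n_fft, hop):
    """Frame the signal with a periodic Hann window; return (spectra, window)."""
    win = np.hanning(n_fft + 1)[:-1]
    x_pad = np.pad(x, (n_fft, n_fft))  # pad so every input sample is fully overlapped
    n_frames = 1 + (len(x_pad) - n_fft) // hop
    frames = np.stack([x_pad[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), win


def istft(spec, win, hop, length):
    """Weighted overlap-add; exact inverse of stft() when spec is unmodified."""
    n_fft = len(win)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    total = n_fft + hop * (len(frames) - 1)
    y = np.zeros(total)
    wsum = np.zeros(total)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        wsum[i * hop:i * hop + n_fft] += win ** 2
    y /= np.maximum(wsum, 1e-8)
    return y[n_fft:n_fft + length]  # drop the padding added in stft()


def log_envelope(spec, n_ceps):
    """Smooth per-frame log-magnitude envelope via low-quefrency cepstral liftering."""
    log_mag = np.log(np.maximum(np.abs(spec), 1e-10))
    ceps = np.fft.irfft(log_mag, axis=1)  # real cepstrum, n_fft coefficients per frame
    n = ceps.shape[1]
    ceps[:, n_ceps:n - n_ceps + 1] = 0.0  # keep c[0..n_ceps-1] and their mirror images
    return np.fft.rfft(ceps, axis=1).real


def spectral_differential_filter(x, convert_fn, n_fft=1024, hop=256, n_ceps=40):
    """Filter x by exp(converted envelope - source envelope), frame by frame.

    convert_fn is a HYPOTHETICAL stand-in for the GMM/DNN spectral mapping: it maps a
    (frames, bins) array of source log envelopes to converted log envelopes.
    No F0 or aperiodicity is extracted; the source excitation and phase pass through.
    """
    spec, win = stft(x, n_fft, hop)
    src_env = log_envelope(spec, n_ceps)
    diff = convert_fn(src_env) - src_env  # log-domain spectral differential
    out_spec = spec * np.exp(diff)        # real, zero-phase gain: only magnitudes change
    return istft(out_spec, win, hop, len(x))


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    # Toy signal standing in for a patient recording: a 150 Hz square-wave buzz plus noise.
    x = 0.5 * np.sign(np.sin(2 * np.pi * 150 * t)) + 0.05 * rng.standard_normal(sr)

    # Identity mapping: the differential is zero, so the input is reconstructed.
    y_id = spectral_differential_filter(x, lambda env: env)
    print("identity max error:", np.max(np.abs(y_id - x)))

    # Toy mapping that raises high frequencies by up to about 6 dB (log 2 in natural-log units).
    tilt = np.linspace(0.0, np.log(2.0), 1024 // 2 + 1)
    y_tilt = spectral_differential_filter(x, lambda env: env + tilt)
    print("tilted output RMS:", np.sqrt(np.mean(y_tilt ** 2)))
```

With the identity mapping the printed error is at floating-point round-off level, which illustrates the design choice: because the filter is defined as a difference between envelopes, any part of the signal the conversion model leaves alone (here, everything) is passed through exactly, rather than being re-synthesized from extracted parameters as in a vocoder.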

Original language: English
Pages (from-to): 2464-2468
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOI: 10.21437/Interspeech.2018-1239
Publication status: Published - Jan 1 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 to Sep 6 2018

Fingerprint

  • Voice Conversion
  • Parameter extraction
  • Spectrogram
  • Speech Synthesis
  • Subjective Evaluation
  • Speech intelligibility
  • Waveform
  • Frequency bands
  • Speech
  • Spectrality
  • Naturalness
  • Evaluate

Keywords

  • Glossectomy
  • Neural network
  • Spectral differential
  • Speech intelligibility
  • Voice conversion

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{6dd6436aa2184c70a025b5d0088b0a45,
title = "Naturalness improvement algorithm for reconstructed glossectomy patient's speech using spectral differential modification in voice conversion",
keywords = "Glossectomy, Neural network, Spectral differential, Speech intelligibility, Voice conversion",
author = "Hiroki Murakami and Sunao Hara and Masanobu Abe and Masaaki Sato and Shougo Minagi",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-1239",
language = "English",
volume = "2018-September",
pages = "2464--2468",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Naturalness improvement algorithm for reconstructed glossectomy patient's speech using spectral differential modification in voice conversion

AU - Murakami, Hiroki

AU - Hara, Sunao

AU - Abe, Masanobu

AU - Sato, Masaaki

AU - Minagi, Shougo

PY - 2018/1/1

Y1 - 2018/1/1

AB - In this paper, we propose an algorithm to improve the naturalness of the reconstructed glossectomy patient's speech that is generated by voice conversion to enhance the intelligibility of speech uttered by patients with a wide glossectomy. While existing VC algorithms make it possible to improve intelligibility and naturalness, the result is still not satisfying. To solve the continuing problems, we propose to directly modify the speech waveforms using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source parameter extractions for speech synthesis, so there are no errors in source parameter extractions and we are able to make the best use of the original source characteristics. In terms of spectrum conversion, we evaluate with both GMM and DNN. Subjective evaluations show that our algorithm can synthesize more natural speech than the vocoder-based method. Judging from observations of the spectrogram, power in high-frequency bands of fricatives and stops is reconstructed to be similar to that of natural speech.

KW - Glossectomy

KW - Neural network

KW - Spectral differential

KW - Speech intelligibility

KW - Voice conversion

UR - http://www.scopus.com/inward/record.url?scp=85054996045&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054996045&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-1239

DO - 10.21437/Interspeech.2018-1239

M3 - Conference article

AN - SCOPUS:85054996045

VL - 2018-September

SP - 2464

EP - 2468

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -