Speaker dependent approach for enhancing a glossectomy patient's speech via GMM-based voice conversion

Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shougo Minagi

Research output: Contribution to journalArticle

3 Citations (Scopus)

Abstract

In this paper, using GMM-based voice conversion algorithm, we propose to generate speaker-dependent mapping functions to improve the intelligibility of speech uttered by patients with a wide glossectomy. The speaker-dependent approach enables to generate the mapping functions that reconstruct missing spectrum features of speech uttered by a patient without having influences of a speaker's factor. The proposed idea is simple, i.e., to collect speech uttered by a patient before and after the glossectomy, but in practice it is hard to ask patients to utter speech just for developing algorithms. To confirm the performance of the proposed approach, in this paper, in order to simulate glossectomy patients, we fabricated an intraoral appliance which covers lower dental arch and tongue surface to restrain tongue movements. In terms of the Mel-frequency cepstrum (MFC) distance, by applying the voice conversion, the distances were reduced by 25% and 42% for speakerdependent case and speaker-independent case, respectively. In terms of phoneme intelligibility, dictation tests revealed that speech reconstructed by speaker-dependent approach almost always showed better performance than the original speech uttered by simulated patients, while speaker-independent approach did not.

Original languageEnglish
Pages (from-to)3384-3388
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2017-August
DOIs
Publication statusPublished - Jan 1 2017

Fingerprint

Voice Conversion
Dependent
Speech intelligibility
Cepstrum
Arch
Speech
Arches
Cover

Keywords

  • Glossectomy
  • Speech intelligibility
  • Voice conversion

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation

Cite this

@article{a398f3fac29243ed8a27d189ac2a745f,
title = "Speaker dependent approach for enhancing a glossectomy patient's speech via GMM-based voice conversion",
abstract = "In this paper, using GMM-based voice conversion algorithm, we propose to generate speaker-dependent mapping functions to improve the intelligibility of speech uttered by patients with a wide glossectomy. The speaker-dependent approach enables to generate the mapping functions that reconstruct missing spectrum features of speech uttered by a patient without having influences of a speaker's factor. The proposed idea is simple, i.e., to collect speech uttered by a patient before and after the glossectomy, but in practice it is hard to ask patients to utter speech just for developing algorithms. To confirm the performance of the proposed approach, in this paper, in order to simulate glossectomy patients, we fabricated an intraoral appliance which covers lower dental arch and tongue surface to restrain tongue movements. In terms of the Mel-frequency cepstrum (MFC) distance, by applying the voice conversion, the distances were reduced by 25{\%} and 42{\%} for speakerdependent case and speaker-independent case, respectively. In terms of phoneme intelligibility, dictation tests revealed that speech reconstructed by speaker-dependent approach almost always showed better performance than the original speech uttered by simulated patients, while speaker-independent approach did not.",
keywords = "Glossectomy, Speech intelligibility, Voice conversion",
author = "Kei Tanaka and Sunao Hara and Masanobu Abe and Masaaki Sato and Shougo Minagi",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-841",
language = "English",
volume = "2017-August",
pages = "3384--3388",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Speaker dependent approach for enhancing a glossectomy patient's speech via GMM-based voice conversion

AU - Tanaka, Kei

AU - Hara, Sunao

AU - Abe, Masanobu

AU - Sato, Masaaki

AU - Minagi, Shougo

PY - 2017/1/1

Y1 - 2017/1/1

N2 - In this paper, using GMM-based voice conversion algorithm, we propose to generate speaker-dependent mapping functions to improve the intelligibility of speech uttered by patients with a wide glossectomy. The speaker-dependent approach enables to generate the mapping functions that reconstruct missing spectrum features of speech uttered by a patient without having influences of a speaker's factor. The proposed idea is simple, i.e., to collect speech uttered by a patient before and after the glossectomy, but in practice it is hard to ask patients to utter speech just for developing algorithms. To confirm the performance of the proposed approach, in this paper, in order to simulate glossectomy patients, we fabricated an intraoral appliance which covers lower dental arch and tongue surface to restrain tongue movements. In terms of the Mel-frequency cepstrum (MFC) distance, by applying the voice conversion, the distances were reduced by 25% and 42% for speakerdependent case and speaker-independent case, respectively. In terms of phoneme intelligibility, dictation tests revealed that speech reconstructed by speaker-dependent approach almost always showed better performance than the original speech uttered by simulated patients, while speaker-independent approach did not.

AB - In this paper, using GMM-based voice conversion algorithm, we propose to generate speaker-dependent mapping functions to improve the intelligibility of speech uttered by patients with a wide glossectomy. The speaker-dependent approach enables to generate the mapping functions that reconstruct missing spectrum features of speech uttered by a patient without having influences of a speaker's factor. The proposed idea is simple, i.e., to collect speech uttered by a patient before and after the glossectomy, but in practice it is hard to ask patients to utter speech just for developing algorithms. To confirm the performance of the proposed approach, in this paper, in order to simulate glossectomy patients, we fabricated an intraoral appliance which covers lower dental arch and tongue surface to restrain tongue movements. In terms of the Mel-frequency cepstrum (MFC) distance, by applying the voice conversion, the distances were reduced by 25% and 42% for speakerdependent case and speaker-independent case, respectively. In terms of phoneme intelligibility, dictation tests revealed that speech reconstructed by speaker-dependent approach almost always showed better performance than the original speech uttered by simulated patients, while speaker-independent approach did not.

KW - Glossectomy

KW - Speech intelligibility

KW - Voice conversion

UR - http://www.scopus.com/inward/record.url?scp=85039151947&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039151947&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-841

DO - 10.21437/Interspeech.2017-841

M3 - Article

AN - SCOPUS:85039151947

VL - 2017-August

SP - 3384

EP - 3388

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -