Enhancing a glossectomy patient's speech via GMM-based voice conversion

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In this paper, we describe the use of a voice conversion algorithm to improve the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance between normal speakers and a speaker with an articulation disorder with respect to the number of training sentences, the number of GMM mixtures, and the variety of speaking styles in the training speech. According to our experimental results, the mel-cepstrum (MC) distance decreased by 40% for all speaker pairs relative to the pre-conversion measurements; however, after conversion, the MC distance between the glossectomy speaker and a normal speaker was 28% larger than that between pairs of normal speakers. Analysis of the resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in the phonemes /h/, /t/, /k/, /ts/, and /ch/; we also confirmed improvements in speech intelligibility via informal listening tests.
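The abstract reports results as a relative change in mel-cepstrum (MC) distance before and after conversion. The sketch below shows one common way such a distance is computed, assuming the widely used mel-cepstral distortion formula in dB over time-aligned frames with the 0th (energy) coefficient excluded; the abstract does not state the exact definition the authors used. The function names and the toy frames are hypothetical and are not data from the paper.

```python
import math


def mel_cepstral_distance(mc_a, mc_b):
    """Mean mel-cepstral distance in dB between two time-aligned sequences.

    Each argument is a list of frames; each frame is a list of mel-cepstral
    coefficients. Assumption: the 0th (energy) coefficient is already removed
    and the two sequences are already aligned frame by frame.
    """
    if len(mc_a) != len(mc_b) or not mc_a:
        raise ValueError("sequences must be non-empty and time-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for frame_a, frame_b in zip(mc_a, mc_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(mc_a)


def relative_change(before, after):
    """Signed relative change; a negative value means the distance decreased."""
    return (after - before) / before


# Toy frames, invented purely for illustration (not measurements from the paper).
target = [[1.0, 0.5], [0.8, 0.4]]
source = [[0.0, 0.0], [0.0, 0.0]]
converted = [[0.4, 0.2], [0.32, 0.16]]

before = mel_cepstral_distance(source, target)
after = mel_cepstral_distance(converted, target)
change = relative_change(before, after)
print(f"before={before:.3f} dB, after={after:.3f} dB, change={change:+.1%}")
```

With these contrived frames the converted sequence sits closer to the target than the source does, so the relative change is negative. In practice the frames would come from a mel-cepstral analysis of parallel utterances, aligned for example by dynamic time warping.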

Original language: English
Title of host publication: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9789881476821
DOI: https://doi.org/10.1109/APSIPA.2016.7820909
Publication status: Published - Jan 17 2017
Event: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 - Jeju, Korea, Republic of
Duration: Dec 13 2016 → Dec 16 2016

Other

Other: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016
Country: Korea, Republic of
City: Jeju
Period: 12/13/16 → 12/16/16

Fingerprint

Speech intelligibility
Experiments

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Signal Processing

Cite this

Tanaka, K., Hara, S., Abe, M., & Minagi, S. (2017). Enhancing a glossectomy patient's speech via GMM-based voice conversion. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016 [7820909]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2016.7820909

@inproceedings{c88a455b6dec49d6843e988a793f9652,
title = "Enhancing a glossectomy patient's speech via GMM-based voice conversion",
abstract = "In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance among normal speakers and one with an articulation disorder by measuring the number of training sentences, the number of GMM mixtures, and the variety of speaking styles of training speech. According to our experiment results, the mel-cepstrum (MC) distance was decreased by 40{\%} in all pairs of speakers as compared with that of pre-conversion measures; however, at post-conversion, the MC distance between a pair of a glossectomy speaker and a normal speaker was 28{\%} larger than that between pairs of normal speakers. The analysis of resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in phonemes/h/,/t/,/k/,/ts/, and/ch/; we also confirmed improvements of speech intelligibility via informal listening tests.",
author = "Kei Tanaka and Sunao Hara and Masanobu Abe and Shougo Minagi",
year = "2017",
month = "1",
day = "17",
doi = "10.1109/APSIPA.2016.7820909",
language = "English",
booktitle = "2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Enhancing a glossectomy patient's speech via GMM-based voice conversion

AU - Tanaka, Kei

AU - Hara, Sunao

AU - Abe, Masanobu

AU - Minagi, Shougo

PY - 2017/1/17

Y1 - 2017/1/17

N2 - In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared voice conversion performance among normal speakers and one with an articulation disorder by measuring the number of training sentences, the number of GMM mixtures, and the variety of speaking styles of training speech. According to our experiment results, the mel-cepstrum (MC) distance was decreased by 40% in all pairs of speakers as compared with that of pre-conversion measures; however, at post-conversion, the MC distance between a pair of a glossectomy speaker and a normal speaker was 28% larger than that between pairs of normal speakers. The analysis of resulting spectrograms showed that the voice conversion algorithm successfully reconstructed high-frequency spectra in phonemes/h/,/t/,/k/,/ts/, and/ch/; we also confirmed improvements of speech intelligibility via informal listening tests.

UR - http://www.scopus.com/inward/record.url?scp=85013858356&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013858356&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2016.7820909

DO - 10.1109/APSIPA.2016.7820909

M3 - Conference contribution

AN - SCOPUS:85013858356

BT - 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -