DNN-based voice conversion with auxiliary phonemic information to improve intelligibility of glossectomy patients' speech

Hiroki Murakami, Sunao Hara, Masanobu Abe

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper, we propose using phonemic information in addition to acoustic features to improve the intelligibility of speech uttered by patients with articulation disorders caused by a wide glossectomy. Our previous studies showed that voice conversion algorithm improves the quality of glossectomy patients' speech. However, losses in acoustic features of glossectomy patients' speech are so large that the quality of the reconstructed speech is low. To solve this problem, we explored potentials of several additional information to improve speech intelligibility. One of the candidates is phonemic information, more specifically Phoneme Labels as Auxiliary input (PLA). To combine both acoustic features and PLA, we employed a DNN-based algorithm. PLA is represented by a kind of one-of-k vector, i.e., PLA has a weight value (<1.0) that gradually changes in time axis, whereas one-of-k has a binary value (0 or 1). The results showed that the proposed algorithm reduced the mel-frequency cepstral distortion for all phonemes, and almost always improved intelligibility. Notably, the intelligibility was largely improved in phonemes /s/ and /z/, mainly because the tongue is used to sustain constriction to produces these phonemes. This indicates that PLA works well to compensate the lack of a tongue.

Original languageEnglish
Title of host publication2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages138-142
Number of pages5
ISBN (Electronic)9781728132488
DOIs
Publication statusPublished - Nov 2019
Event2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: Nov 18 2019Nov 21 2019

Publication series

Name2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
CountryChina
CityLanzhou
Period11/18/1911/21/19

ASJC Scopus subject areas

  • Information Systems

Fingerprint Dive into the research topics of 'DNN-based voice conversion with auxiliary phonemic information to improve intelligibility of glossectomy patients' speech'. Together they form a unique fingerprint.

  • Cite this

    Murakami, H., Hara, S., & Abe, M. (2019). DNN-based voice conversion with auxiliary phonemic information to improve intelligibility of glossectomy patients' speech. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 (pp. 138-142). [9023168] (2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/APSIPAASC47483.2019.9023168