LSTM-Based Robust Voicing Decision Applied to DNN-Based Speech Synthesis
- Authors: Pradeep R.1, Reddy M.K.2, Rao K.S.2
- Affiliations:
- Advanced Technology Development Center
- Department of Computer Science and Engineering
- Issue: Vol 53, No 4 (2019)
- Pages: 328-332
- Section: Article
- URL: https://journal-vniispk.ru/0146-4116/article/view/175840
- DOI: https://doi.org/10.3103/S0146411619040096
- ID: 175840
Abstract
The quality of statistical parametric speech synthesis (SPSS) relies on voiced/unvoiced classification. Errors in the voicing decision can significantly degrade speech quality. This paper proposes a robust voicing detection method for SPSS based on the power spectrum and a long short-term memory (LSTM) network. The performance of the proposed method is evaluated on the CMU Arctic, Keele, and MIR-1K databases. Further, the effectiveness of the proposed method is analyzed for deep neural network (DNN)-based SPSS. The results show that the proposed method better classifies voiced and unvoiced speech segments, which significantly improves speech quality.
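To make the general idea of a power-spectrum front end feeding an LSTM voicing classifier concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, FFT size, hidden size, the helper names (log_power_spectrum, lstm_voicing_probs, init_params) and the random, untrained weights are all assumptions chosen for illustration, and the abstract does not specify the paper's actual configuration.

```python
import numpy as np

def log_power_spectrum(signal, frame_len=400, hop=160, n_fft=512):
    """Frame the signal and return log power spectra, shape (n_frames, n_fft // 2 + 1).

    Frame, hop, and FFT sizes are illustrative assumptions (25 ms / 10 ms at 16 kHz).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_features, hidden=16, seed=0):
    """Random, untrained weights (assumption); a real system would learn these."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 0.1, (4 * hidden, n_features))  # input weights
    U = rng.normal(0, 0.1, (4 * hidden, hidden))      # recurrent weights
    b = np.zeros(4 * hidden)
    w_out = rng.normal(0, 0.1, hidden)                # output layer
    return W, U, b, w_out, 0.0

def lstm_voicing_probs(features, params):
    """Run one LSTM layer over the frames; map each hidden state to P(voiced)."""
    W, U, b, w_out, b_out = params
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    probs = []
    for x in features:
        z = W @ x + U @ h + b  # stacked pre-activations: input, forget, output, candidate
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g      # cell state update
        h = o * np.tanh(c)     # hidden state
        probs.append(sigmoid(w_out @ h + b_out))
    return np.array(probs)

# Usage on a synthetic 1-second, 120 Hz tone sampled at 16 kHz (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 120 * t)
feats = log_power_spectrum(signal)
probs = lstm_voicing_probs(feats, init_params(feats.shape[1]))
voiced = probs > 0.5  # per-frame voiced/unvoiced decision
```

With untrained weights the decisions are meaningless; the sketch only shows the data flow from waveform to per-frame spectrum to a recurrent classifier that can use context from earlier frames.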
About the authors
R. Pradeep
Advanced Technology Development Center
Author for correspondence.
Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302
M. Reddy
Department of Computer Science and Engineering
Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302
K. Rao
Department of Computer Science and Engineering
Email: rpradeep@iitkgp.ac.in
India, IIT Kharagpur, 721302