Method of voice source coding with data compression based on the linear prediction model

V. V. Savchenko; Савченко В. В.; L. V. Savchenko; Савченко Л. В.

Authors: Savchenko V.V.¹, Savchenko L.V.¹
Affiliations:
1. National Research University Higher School of Economics
Issue: Vol 74, No 3 (2025)
Pages: 67-78
Section: ACOUSTIC MEASUREMENTS
URL: https://journal-vniispk.ru/0368-1025/article/view/351190
ID: 351190

Abstract

This work belongs to a rapidly developing area of acoustic measurement: the analysis and estimation of the parameters of the excitation signal that drives acoustic oscillations in a speaker's vocal tract. It considers the problem of coding the voice source of speech with data compression based on the linear prediction model. Using the criterion of minimum average power of the voice source in the speech production process, the problem is reduced to real-time coding of the linear prediction error signal. A new voice coding method is proposed that clips the linear prediction error and does not require computationally expensive measurement of the initial phase and frequency of the fundamental tone (pitch) of the speech signal. An example of its technical implementation in soft real-time mode is described. A full-scale experiment was designed and carried out to compare the effectiveness of the proposed method with that of the widely used discrete cosine transform method. The results show that, because data compression artifacts in the reconstructed speech signal are weakened, the proposed method codes the voice source 1.5 to 2 times more accurately, and it does not require detecting vowel sounds or pauses in the speech signal. The results will be useful for developing new and modernizing existing systems and algorithms in automatic speech processing and synthesis, mobile speech communication, artificial intelligence, and other speech technology applications that use data compression based on the linear prediction model.
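
The general idea of coding a clipped linear prediction error can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clipping rule, threshold, model order, or framing, so the center-clipping rule, the `ratio` parameter, the synthetic test signal, and all function names below are assumptions made purely for illustration. The sketch estimates linear prediction coefficients by the autocorrelation method, computes the prediction error by inverse filtering, zeroes out low-amplitude error samples, and resynthesizes the signal from the sparse error.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Autocorrelation-method LPC: returns a[0..order-1] so that
    x[n] is predicted as sum_k a[k-1] * x[n-k], k = 1..order."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # tiny regularization for safety
    return np.linalg.solve(R, r[1:order + 1])

def prediction_error(x, a):
    """Inverse (analysis) filter: e[n] = x[n] - sum_k a[k-1] * x[n-k]."""
    p = len(a)
    xp = np.concatenate([np.zeros(p), x])  # zero history before the frame
    e = np.empty(len(x))
    for n in range(len(x)):
        past = xp[n:n + p][::-1]           # x[n-1], x[n-2], ..., x[n-p]
        e[n] = x[n] - np.dot(a, past)
    return e

def center_clip(e, ratio):
    """ASSUMED clipping rule: keep samples whose magnitude reaches
    ratio * max|e|, set the rest to zero (sparse excitation)."""
    thr = ratio * np.max(np.abs(e))
    return np.where(np.abs(e) >= thr, e, 0.0)

def synthesize(e, a):
    """Synthesis filter: y[n] = e[n] + sum_k a[k-1] * y[n-k]."""
    p = len(a)
    y = np.zeros(len(e) + p)               # first p entries are zero history
    for n in range(len(e)):
        past = y[n:n + p][::-1]
        y[n + p] = e[n] + np.dot(a, past)
    return y[p:]

# Hypothetical test signal: a pulse train plus weak noise through a stable AR(2) filter.
rng = np.random.default_rng(0)
excitation = np.zeros(400)
excitation[::80] = 1.0
excitation += 0.01 * rng.standard_normal(400)
x = synthesize(excitation, np.array([1.3, -0.6]))

a = lpc_autocorr(x, order=2)
e = prediction_error(x, a)
e_clipped = center_clip(e, ratio=0.3)      # ratio is an illustrative choice
x_hat = synthesize(e_clipped, a)
```

In this sketch no pitch period or initial phase is measured: the positions of the retained error samples carry the timing of the excitation pulses implicitly, which is one plausible reading of why the abstract contrasts clipping with fundamental-tone measurement.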

Keywords

acoustic speech analysis, speech signal, vocal tract, linear prediction model, voice excitation

About the authors

V. Savchenko

National Research University Higher School of Economics

Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0003-3045-3337

L. Savchenko

National Research University Higher School of Economics

Email: vvsavchenko@yandex.ru
ORCID iD: 0000-0002-2776-5471

