Speaker-Specific Method of Spoofing Attack Detection Based on Anomaly Detection
- Authors: Evsyukov M.V.
- Affiliations: Kuban State Technological University
- Issue: Vol 24, No 1 (2025)
- Pages: 193-228
- Section: Information security
- URL: https://journal-vniispk.ru/2713-3192/article/view/278227
- DOI: https://doi.org/10.15622/ia.24.1.8
- ID: 278227
About the authors
M. V. Evsyukov
Kuban State Technological University
Email: michael.evsyukov@gmail.com
Moskovskaya St. 2
