Basic Algorithm for Automatic Spelling Correction of Russian Texts: Development, Evaluation and Prospects
- Authors: Isaeva E.V. [1], Safarbekov B.Z. [2]
- Affiliations:
  - [1] Perm State University
  - [2] National University of Science and Technology "MISIS"
- Issue: No 1 (68) (2025)
- Pages: 91-108
- Section: Computer science
- URL: https://journal-vniispk.ru/1993-0550/article/view/326435
- DOI: https://doi.org/10.17072/1993-0550-2025-1-91-108
- EDN: https://elibrary.ru/ccjape
- ID: 326435
About the authors
E. V. Isaeva
Perm State University
Email: ekaterinaisae@psu.ru
Scopus Author ID: 57204498718
ResearcherId: O-6777-2015
Perm
B. Z. Safarbekov
National University of Science and Technology "MISIS"
Email: behruzsafarbekov3@gmail.com
Moscow
