Comparison of Text Classification Models and Methods
- Authors: Zakharova A.V., Vishnyakova A.Y., Detkov A.A.
- Affiliation: Ural Federal University named after the first President of Russia B.N. Yeltsin
- Issue: Vol 26, No 3 (2025)
- Pages: 298-309
- Section: Articles
- URL: https://journal-vniispk.ru/2312-8143/article/view/350897
- DOI: https://doi.org/10.22363/2312-8143-2025-26-3-298-309
- EDN: https://elibrary.ru/ZZNVXS
- ID: 350897
Abstract
The study examines the process of automatic text classification and its components. The topic is relevant because of the rapid growth of data volumes and the development of machine learning technologies. The purpose of the study is to determine the most effective methods and models for automatic text classification. Scientific articles published over the past four years that best match the topic were selected as material for analysis. The analysis showed that effective preprocessing of text data should consist of normalization, tokenization, stop-word removal, and stemming or lemmatization. The BERT model is recommended for text representation; however, the conditions of the specific task should be considered first, since alternative approaches may be preferable in some cases. The most effective methods for the classification step itself are logistic regression, convolutional neural networks, and RoBERTa. The choice of a particular model depends on the intended application and the available technological capabilities.
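The four preprocessing stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the suffix list, and all function names are hypothetical, and the suffix-stripping stemmer is a toy stand-in for a real stemmer or lemmatizer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list (assumption, not from the article).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Hypothetical suffix list for a toy stemmer; this is not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def normalize(text: str) -> str:
    """Lowercase the text and replace every non-alphanumeric run with a space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def tokenize(text: str) -> List[str]:
    """Split normalized text on whitespace."""
    return text.split()


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Run normalization, tokenization, stop-word removal, and stemming in order."""
    tokens = remove_stop_words(tokenize(normalize(text)))
    return [stem(t) for t in tokens]
```

Calling `preprocess("The models are classifying texts")` walks a sentence through all four stages. In practice the resulting tokens would be passed to a text representation step before classification, and English-only rules like these would need to be replaced for Russian-language corpora.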
About the authors
Angelina V. Zakharova
Ural Federal University named after the first President of Russia B.N. Yeltsin
Author for correspondence.
Email: zakharova.linusha@mail.ru
ORCID iD: 0009-0007-9651-4530
SPIN-code: 6278-8518
Master’s student of the Department of Systems Analysis and Decision-making
19 Mira St, Yekaterinburg, 620062, Russian Federation
Alina Yu. Vishnyakova
Ural Federal University named after the first President of Russia B.N. Yeltsin
Email: alina.vishniakova@urfu.ru
ORCID iD: 0000-0003-1649-4167
SPIN-code: 5641-6945
Senior Lecturer, Postgraduate Student of the Department of Systems Analysis and Decision-making
19 Mira St, Yekaterinburg, 620062, Russian Federation
Alexander A. Detkov
Ural Federal University named after the first President of Russia B.N. Yeltsin
Email: a.a.detkov@urfu.ru
ORCID iD: 0009-0003-3958-3549
SPIN-code: 5310-3027
PhD in Economics, Associate Professor of the Department of Systems Analysis and Decision-Making
19 Mira St, Yekaterinburg, 620062, Russian Federation