Machine monitoring of text chats and detection of anomalies

Elena Sergeevna Mozaidze; Мозаидзе Елена Сергеевна; Sergei Valentinovich Zuev; Зуев Сергей Валентинович

doi:10.25728/ubs.2024.109.4

Authors: Mozaidze E.S.¹, Zuev S.V.²
Affiliations:
1. Belgorod State Technological University named after V.G. Shukhov
2. V.I. Vernadsky Crimean Federal University
Issue: No. 109 (2024)
Pages: 67-88
Section: Information technologies in control
URL: https://journal-vniispk.ru/1819-2440/article/view/284365
DOI: https://doi.org/10.25728/ubs.2024.109.4
ID: 284365

Abstract

The aim of this work is to develop a new method for detecting anomalies in text chats that does not rely on text corpora. The tasks are: to briefly present the statistical description of the recurrence of anomalies developed in the authors' previous works; to introduce the method of paired (generalized) N-grams; to synthesize these methods into a new method for detecting anomalies in short-message exchange systems; and to test the method. A new method for detecting anomalies in a flow of text messages is proposed; it requires no text corpus for training and, in addition, supports online learning. The material for the work consisted of Telegram chats, groups, and channels to which one of the authors is subscribed. The text material amounted to about 50 MB, or roughly 2 million words, collected over 5 years. The method combines a statistical distribution of the recurrence of anomalous events with a topic-modeling method based on the statistics of noun-verb pairs. Both components were proposed in the authors' earlier works. The experiment showed that the results predicted by the proposed method correspond to the anomalies actually registered. The proposed method can be useful for studying and analyzing the emergence of anomalies in complex social systems whose interactions are reflected in communication through social networks and messengers. Such tasks are relevant to both government agencies and businesses, and solving them can help mitigate acute social and industrial problems. The method seems especially useful for journalism because it makes it possible to estimate the most likely time at which significant social phenomena will appear.
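To make the idea of corpus-free, online detection over noun-verb pairs more concrete, the following is a minimal sketch and not the authors' method. It assumes the (noun, verb) pairs for each time window have already been extracted by some part-of-speech tagger, and it substitutes a simple novelty score (the share of pairs never seen before) with a running mean-plus-k-sigma threshold for the recurrence statistics used in the paper. The class name, parameters, and sample pairs are all hypothetical.

```python
from collections import Counter
from math import sqrt


class OnlinePairNoveltyDetector:
    """Illustrative only: flags windows whose noun-verb pairs are unusually new.

    No pre-built corpus is used; all statistics come from the stream itself.
    """

    def __init__(self, threshold=3.0, warmup=5, min_std=0.05):
        self.pair_counts = Counter()  # every pair seen in earlier windows
        self.threshold = threshold    # how many std deviations count as anomalous
        self.warmup = warmup          # scored windows required before flagging
        self.min_std = min_std        # floor so a flat history is not over-sensitive
        self.n = 0                    # number of scored windows (Welford state)
        self.mean = 0.0
        self.m2 = 0.0

    def novelty(self, pairs):
        """Fraction of pairs in this window that never occurred before."""
        if not pairs:
            return 0.0
        unseen = sum(1 for p in pairs if self.pair_counts[p] == 0)
        return unseen / len(pairs)

    def update(self, pairs):
        """Score one window, then fold it into the history (online learning)."""
        if not self.pair_counts:
            # The very first window has no history to compare against.
            self.pair_counts.update(pairs)
            return 0.0, False

        score = self.novelty(pairs)
        is_anomaly = False
        if self.n >= max(self.warmup, 2):
            std = max(sqrt(self.m2 / (self.n - 1)), self.min_std)
            is_anomaly = score > self.mean + self.threshold * std

        # Welford's running mean and variance update.
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (score - self.mean)
        self.pair_counts.update(pairs)
        return score, is_anomaly


if __name__ == "__main__":
    routine = [("server", "restart"), ("team", "deploy"), ("user", "report")]
    unusual = [("bridge", "collapse"), ("crowd", "gather"), ("server", "restart")]
    detector = OnlinePairNoveltyDetector()
    for window in [routine] * 7 + [unusual]:
        print(detector.update(window))
```

Because the history is updated after every window, a topic that keeps recurring stops being flagged once its pairs become familiar. This is only one possible reading of "online learning" in this setting.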

Keywords

anomaly detection, topic modeling, probabilities of rare events, repeatability of rare events, anomalies in text chats

About the authors

Elena Mozaidze

Belgorod State Technological University named after V.G. Shukhov

Email: mozaidze95@mail.ru
Belgorod

Sergei Zuev

V.I. Vernadsky Crimean Federal University

Email: sergey.zuev@bk.ru
Simferopol

References

КУЗОВЛЕВ В.И., ОРЛОВ А.О. Выявление аномалий при прогнозном анализе данных // Вестник МГТУ им. Н.Э. Баумана. Сер. Приборостроение. – 2016. – № 5. – C. 75–85.
МИКОВА С.Ю., ОЛАДЬКО В.С. Сетевые аномалии и причины их возникновения в экономических информационных системах // Издательский центр «ИУСЭР». – Экономика и социум. – 2015. – №3 (16). – С. 76–81.
САВЕНКОВ П.А., ИВУТИН А.Н. Методы анализа естественного языка в задачах детектирования поведенческих аномалий // Известия ТулГУ. – Технические науки. – 2022. – №3. – С. 358–366.
САВИЦКИЙ Д.Е., ДУНАЕВ М.Е., ЗАЙЦЕВ К.С. Выявление аномалий при обработке потоковых данных в реальном времени // Int. Journal of Open Information Technologies. – 2022. – №6. – С. 70–76.
ЧАСТИКОВА В.А., КОЗАЧЁК К.В., ГУЛЯЙ В.Г. Методы обработки естественного языка в решении задач обнаружения атак социальной инженерии // Вестник Адыгейского государственного университета. – Сер. 4: Естественно-математические и технические науки. – 2021. – №4 (291). – С. 95–108.
https://hadoop.apache.org/ (accessed 1 October 2021).
BENMAHDI D., RASOLOFONDRAIBE L., CHIEMENTIN X. et al. RT-OPTICS: real-time classification based on OPTICS method to monitor bearings faults // Journal of Intelligent Manufacturing. – June 2019. – Vol. 30, Iss. 5. – P. 2157–2170.
BORJ P.R., RAJA K., BOURS P. Online grooming detection: A comprehensive survey of child exploitation in chat logs // Knowledge-Based Systems. – 2023. Vol. 259. 110039. – DOI: https://doi.org/10.1016/j.knosys.2022.110039 (accessed 1 July 2023).
GUPTA A., MATTA P., PANT B. Identification of Cyber-criminals in Social Media using Machine Learning // Int. Conf. on Smart Generation Computing, Communication and Networking (SMART GENCON). – Bangalore, India. – 2022. – P. 1–6. – doi: 10.1109/SMARTGENCON56628.2022.10084119.
LEMAIRE V., ALAOUI ISMAILI O., CORNUÉJOLS A. et al. Predictive k-means with local models // In: Workshop LDRC–2020 (Workshop on Learning Data Representation for Clustering) in PAKDD–2020 (The 24th Pacific-Asia Conf. on Knowledge Discovery and Data Mining). – May 2020. – Singapore. – P. 11–16.
MD TAHMID RAHMAN LASKAR, JIMMY XIANGJI HUANG, SMETANA V. et al. Extending Isolation Forest for Anomaly Detection in Big Data via K-Means // ACM Trans. Cyber-Phys. Syst. – 2021. – Vol. 5, No. 4, Article 41. – 26 p. – DOI: https://doi.org/10.1145/3460976.
SARVANI A., VENUGOPAL B., DEVARAKONDA N. Anomaly Detection Using K-means Approach and Outliers Detection Technique // In: Soft Computing: Theories and Applications. Advances in Intelligent Systems and Computing, Springer, Singapore. – 2019. – P. 742.
SHERIFF M.Z., NOUNOU M.N. Improved Fault Detection and Process Safety Using Multiscale Shewhart Charts // Chem. Eng. Process Technol. – 2017. – Vol. 8(2). – P. 1–16. – doi: 10.4172/2157-7048.100032.
TALEB N.N. Black Swan and Domains of Statistics // The American Statistician. – 2007. – Vol. 61, No. 3.
TSIGKRITIS T., GROUMAS G., SCHNEIDER M. On the Use of k-NN in Anomaly Detection // Journal of Information Security. – 2018. – Vol. 9. – P. 70–84.
https://spark.apache.org/ (accessed 1 October 2021).
VANNEL Z., DONGHYUN K., DAEHEE S., AHYOUNG L. An unsupervised anomaly detection framework for detecting anomalies in real time through network system’s log files analysis // High-Confidence Computing. – 2021. – Vol. 1, Iss. 2.
WANG Z., ZHOU Y.H., LI G.M. Anomaly Detection by Using Streaming K-Means and Batch K-Means // 5th IEEE Int. Conf. on Big Data Analytics (IEEE ICBDA 2020). – Xiamen, China, 8–11 May 2020. – P. 11–17.
ZIMEK A., SCHUBERT E. Outlier Detection // Encyclopedia of Database Systems. – Springer New York, 2017. – doi: 10.1007/978-1-4899-7993-3_80719-1.
ZUEV S., KABALYANTS P. On the black swan risk dynamical evaluation // Int. Journal of Risk Assessment and Management. – 2022. – Vol. 25, No. 1/2. – P. 56–66.
