Classification by compression: Application of information-theory methods for the identification of themes of scientific texts
- Authors: Selivanova I.V. [1], Ryabko B.Y. [2, 3], Guskov A.E. [2, 3]
Affiliations:
- [1] The State Public Scientific Technological Library, Siberian Branch
- [2] Novosibirsk State University
- [3] Institute of Computational Technologies, Siberian Branch
- Issue: Vol. 51, No. 3 (2017)
- Pages: 120–126
- Section: Information Analysis
- URL: https://journal-vniispk.ru/0005-1055/article/view/150170
- DOI: https://doi.org/10.3103/S0005105517030116
- ID: 150170
Abstract
A method for the automatic classification of scientific texts based on data compression is proposed. The method was implemented and evaluated on data from the arXiv.org archive of scientific texts and from the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the theme of a scientific text with a probability of 75–95%; its accuracy depends on the quality of the source data.
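The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to classify by compression, not the authors' implementation. It assumes one training corpus per theme and scores a new text by how many extra bytes a general-purpose compressor (here Python's `zlib`, an assumption) needs when the text is appended to that corpus; the theme with the smallest increase wins. The function names `compressed_size`, `extra_bytes`, and `classify` and the toy corpora are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs to store the UTF-8 encoding of `text`."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, document: str) -> int:
    """Extra compressed bytes needed when `document` is appended to `corpus`.

    If the document shares vocabulary and phrasing with the corpus, the
    compressor encodes much of it as back-references, so the increase is small.
    Note: zlib only looks back 32 KiB, so a larger corpus would have to be
    truncated or compressed with a tool that has a bigger window.
    """
    return compressed_size(corpus + "\n" + document) - compressed_size(corpus)


def classify(document: str, corpora: dict[str, str]) -> str:
    """Return the theme whose training corpus compresses `document` best."""
    return min(corpora, key=lambda theme: extra_bytes(corpora[theme], document))


# Hypothetical toy corpora, one per theme; real ones would be built from
# labelled texts (for example, articles from a digital library).
EXAMPLE_CORPORA = {
    "physics": "quantum field theory particle energy momentum spin boson fermion " * 20,
    "biology": "cell protein gene dna enzyme membrane organism tissue " * 20,
}

print(classify("gene and protein in the cell membrane", EXAMPLE_CORPORA))
print(classify("spin and energy of a boson and a fermion", EXAMPLE_CORPORA))
```

Any compressor can be plugged into `compressed_size`; the idea is that a compressor tuned on a theme's texts acts as a model of that theme, so a shorter encoding suggests a closer match.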
About the authors
I. Selivanova
The State Public Scientific Technological Library, Siberian Branch
Corresponding author.
Email: selivanova@ict.sbras.ru
Russia, Novosibirsk, 123298
B. Ryabko
Novosibirsk State University; Institute of Computational Technologies, Siberian Branch
Email: selivanova@ict.sbras.ru
Russia, Novosibirsk, 630090; Novosibirsk, 630090
A. Guskov
Novosibirsk State University; Institute of Computational Technologies, Siberian Branch
Email: selivanova@ict.sbras.ru
Russia, Novosibirsk, 630090; Novosibirsk, 630090