Classification of Scientific Texts Based on the Compression of Annotations to Publications
- Authors: Selivanova I.V.1, Kosyakov D.V.1, Guskov A.E.1
- Affiliations:
  1. State Public Library of Scientific and Technical Information, Siberian Branch, Russian Academy of Sciences
- Issue: Vol 53, No 6 (2019)
- Pages: 329-342
- Section: Automation of Text Processing
- URL: https://journal-vniispk.ru/0005-1055/article/view/150349
- DOI: https://doi.org/10.3103/S0005105519060062
- ID: 150349
Abstract
This paper examines whether the semantic proximity of scientific texts can be established by automatically classifying them based on the compression of their annotations. The method rests on the observation that compression algorithms such as PPM (prediction by partial matching) compress terminologically similar texts much better than dissimilar ones. If a kernel of publications (an analogue of a training set) is formed for each topic, then the kernel that yields the best compression ratio for a classified text indicates the topic to which that text belongs. Thirty thematic categories were defined; for each, annotations of approximately 500 publications were retrieved from the Scopus database, from which 100 annotations for the kernel and 20 annotations for testing were selected in different ways. Building the kernel from highly cited publications gave an error rate of up to 12%, compared with 32% for random sampling. Classification quality also depends on the initial number of categories: the fewer categories take part in the classification and the greater the terminological differences between them, the higher the quality.
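The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "best compression" is measured as the number of extra compressed bytes a text adds when appended to a topic kernel, and it uses `zlib` (DEFLATE) from the Python standard library as a stand-in for PPM, which the standard library does not provide. The names `compressed_size`, `extra_bytes`, `classify`, and `DEMO_KERNELS` are hypothetical, and the two tiny kernels are made-up illustration data, not the Scopus annotations used in the paper.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (stand-in for PPM)."""
    return len(zlib.compress(data, 9))


def extra_bytes(kernel: str, text: str) -> int:
    """Compressed bytes that `text` adds on top of `kernel`.

    A text that shares terminology with the kernel reuses the kernel's
    substrings and therefore adds fewer bytes.
    """
    k = kernel.encode("utf-8")
    t = text.encode("utf-8")
    return compressed_size(k + b" " + t) - compressed_size(k)


def classify(text: str, kernels: Mapping[str, str]) -> str:
    """Return the topic whose kernel compresses `text` best."""
    return min(kernels, key=lambda topic: extra_bytes(kernels[topic], text))


# Hypothetical toy kernels; a real kernel would hold about 100 annotations.
DEMO_KERNELS = {
    "physics": (
        "quantum entanglement of photons in optical lattices; "
        "photon spin measurement and quantum decoherence in "
        "superconducting qubits"
    ),
    "biology": (
        "gene expression in bacterial cells; protein folding and "
        "enzyme activity in cell membrane transport"
    ),
}

if __name__ == "__main__":
    print(classify("quantum decoherence in superconducting qubits", DEMO_KERNELS))
```

With kernels this small the result is only indicative. DEFLATE also has a 32 KB window, so with large kernels a compressor with a longer context, such as PPM as named in the abstract, would be the more faithful choice.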
About the authors
I. Selivanova
State Public Library of Scientific and Technical Information, Siberian Branch, Russian Academy of Sciences
Corresponding author.
Email: selivanova@spsl.nsc.ru
Russia, Novosibirsk, 630102
D. Kosyakov
State Public Library of Scientific and Technical Information, Siberian Branch, Russian Academy of Sciences
Corresponding author.
Email: kosyakov@spsl.nsc.ru
Russia, Novosibirsk, 630102
A. Guskov
State Public Library of Scientific and Technical Information, Siberian Branch, Russian Academy of Sciences
Corresponding author.
Email: guskov@spsl.nsc.ru
Russia, Novosibirsk, 630102
