Information-Theoretic Method for Classification of Texts
- Authors: Ryabko B.Y.1,2, Gus’kov A.E.1,3, Selivanova I.V.2,3
- Affiliations:
- Institute of Computational Technologies
- Novosibirsk State University
- Russian National Public Library for Science and Technology
- Issue: Vol. 53, No. 3 (2017)
- Pages: 294-304
- Section: Source Coding
- URL: https://journal-vniispk.ru/0032-9460/article/view/166433
- DOI: https://doi.org/10.1134/S0032946017030115
- ID: 166433
Abstract
We consider a method for automatic (i.e., unmanned) text classification based on methods of universal source coding (or “data compression”). We show that under certain restrictions the proposed method is consistent, i.e., the classification error tends to zero with increasing text lengths. As an example of practical use of the method we consider the classification problem for scientific texts (research papers, books, etc.). The proposed method is experimentally shown to be highly efficient.
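The general idea of classifying text by data compression can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a common compression-based heuristic in which a text is assigned to the class whose reference corpus lets a compressor encode it with the fewest additional bytes. The compressor (`zlib`), the function names `compressed_size` and `classify`, and the toy corpora are all assumptions made for illustration; the abstract does not say which compressor or decision rule the paper uses.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose reference corpus best helps compress `text`.

    For each class, the cost is the increase in compressed size when `text`
    is appended to that class's reference corpus. The label with the smallest
    increase wins. This is an illustrative heuristic, not the paper's method.
    """
    sample = text.encode("utf-8")

    def extra_cost(label: str) -> int:
        reference = corpora[label].encode("utf-8")
        return compressed_size(reference + b" " + sample) - compressed_size(reference)

    return min(corpora, key=extra_cost)


# Hypothetical toy corpora; a real experiment would use long training texts.
corpora = {
    "physics": (
        "the electron has spin one half and the photon carries "
        "the electromagnetic force between charged particles"
    ),
    "biology": (
        "the cell membrane regulates transport while ribosomes translate "
        "messenger rna into proteins in the cytoplasm"
    ),
}

label = classify("the photon carries the electromagnetic force", corpora)
```

Note that `zlib` only finds repetitions within a 32 KB window, so for long reference corpora a compressor with a larger context would be a more reasonable choice for this sketch.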
About the authors
B. Ryabko
Institute of Computational Technologies; Novosibirsk State University
Corresponding author.
Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk
A. Gus’kov
Institute of Computational Technologies; Russian National Public Library for Science and Technology
Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk
I. Selivanova
Novosibirsk State University; Russian National Public Library for Science and Technology
Email: boris@ryabko.net
Russia, Novosibirsk; Novosibirsk
