Investigation of features for extraction of named entities from texts in Russian
- Authors: Mozharova V.A.1, Lukashevich N.V.2
- Affiliations:
- Department of Computational Mathematics and Cybernetics
- Scientific Research Computational Center
- Issue: Vol. 51, No. 3 (2017)
- Pages: 127-134
- Section: Text Processing Automation
- URL: https://journal-vniispk.ru/0005-1055/article/view/150171
- DOI: https://doi.org/10.3103/S0005105517030049
- ID: 150171
Abstract
This paper considers various features for extracting named entities from Russian texts within machine learning approaches: features of the token itself (lexeme), as well as vocabulary, contextual, cluster, and two-stage features. The contribution of each feature to the quality of named entity extraction is studied. A CRF classifier is used as the machine learning method in the experiments described in the paper. The contributions of the features are compared on two open collections using the F-measure.
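As a rough illustration of the feature families named in the abstract, the following Python sketch builds token-level, vocabulary, and contextual features for a single token and computes an entity-level F-measure. It is not the authors' feature set or code: the function names, the specific features, the one-token context window, and the toy gazetteer are all assumptions made for illustration, and cluster and two-stage features are left out. Feature dictionaries of this general shape are a common input format for CRF implementations.

```python
from typing import Dict, List, Set, Tuple


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Build a feature dict for tokens[i] (hypothetical feature set, for illustration)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        # Token-level features: properties of the token itself.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "suffix3": tok[-3:].lower(),
        # Vocabulary feature: membership in a word list (assumed lowercase).
        "in_gazetteer": tok.lower() in gazetteer,
    }
    # Contextual features: properties of the neighbours in a window of one token.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:lower"] = tokens[j].lower()
            feats[f"{offset:+d}:is_capitalized"] = tokens[j][:1].isupper()
        else:
            # Mark sentence boundaries explicitly.
            feats[f"{offset:+d}:pad"] = True
    return feats


Entity = Tuple[int, int, str]  # (start token, end token exclusive, type)


def f_measure(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Balanced F-measure over exactly matching entity spans."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = ["Иван", "Петров", "посетил", "Казань"]
    cities = {"казань"}  # toy gazetteer, assumed for the example
    print(token_features(sentence, 3, cities))
    print(f_measure({(0, 2, "PER"), (3, 4, "LOC")}, {(0, 2, "PER")}))
```

Measuring the contribution of one feature family in this setup would amount to training the classifier with and without it and comparing the resulting F-measure values.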
About the authors
V. Mozharova
Department of Computational Mathematics and Cybernetics
Corresponding author.
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
N. Lukashevich
Scientific Research Computational Center
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
