Investigation of features for extraction of named entities from texts in Russian
- Authors: Mozharova V.A.¹, Lukashevich N.V.²
- Affiliations:
- Department of Computational Mathematics and Cybernetics
- Scientific Research Computational Center
- Issue: Vol. 51, No. 3 (2017)
- Pages: 127-134
- Section: Text Processing Automation
- URL: https://journal-vniispk.ru/0005-1055/article/view/150171
- DOI: https://doi.org/10.3103/S0005105517030049
- ID: 150171
Abstract
This paper examines features used in machine-learning approaches to extracting named entities from Russian texts: features of the token itself (lexeme), as well as vocabulary, contextual, cluster, and two-stage features. The contribution of each feature type to the quality of named entity extraction is studied. A conditional random field (CRF) classifier serves as the machine learning method in the experiments described in this paper. The contributions of the features are compared on two open collections using the F-measure.
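To make the feature groups named in the abstract more concrete, the following is a minimal sketch of how token, vocabulary, cluster, and contextual features might be assembled for each token before being passed to a CRF. It is not the authors' implementation. The gazetteer entries, cluster IDs, feature names, and the functions `token_features` and `sentence_features` are all hypothetical, and the sketch simply lowercases tokens where a real system would likely lemmatize them. Two-stage features are omitted.

```python
from typing import Dict, List, Set

# Hypothetical resources for illustration only; the vocabularies and
# word clusters used in the paper are not reproduced here.
GAZETTEER: Dict[str, Set[str]] = {
    "person": {"иван", "анна"},
    "location": {"москва", "россия"},
}
CLUSTERS: Dict[str, str] = {"москва": "c17", "казань": "c17", "иван": "c42"}


def _local_features(token: str, prefix: str) -> Dict[str, object]:
    """Token, vocabulary, and cluster features for a single token."""
    low = token.lower()  # assumption: lowercasing stands in for lemmatization
    feats: Dict[str, object] = {
        f"{prefix}lower": low,
        f"{prefix}is_capitalized": token[:1].isupper(),
        f"{prefix}is_upper": token.isupper(),
        f"{prefix}is_digit": token.isdigit(),
        f"{prefix}cluster": CLUSTERS.get(low, "none"),
    }
    # Vocabulary features: one boolean per hypothetical gazetteer.
    for name, words in GAZETTEER.items():
        feats[f"{prefix}in_{name}"] = low in words
    return feats


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Features of tokens[i] plus contextual features from its neighbours."""
    feats = _local_features(tokens[i], "")
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        prefix = f"{offset:+d}:"
        if 0 <= j < len(tokens):
            feats.update(_local_features(tokens[j], prefix))
        else:
            feats[f"{prefix}pad"] = True  # neighbour falls outside the sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Иван", "посетил", "город", "Москва"]):
        print(feats)
```

A list of per-token feature dicts like this is the kind of input that CRF toolkits such as sklearn-crfsuite accept. Dropping one feature group at a time and retraining is one way the contribution of each group to the F-measure could be estimated.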
About the authors
V. Mozharova
Department of Computational Mathematics and Cybernetics
Corresponding author.
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
N. Lukashevich
Scientific Research Computational Center
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
