Investigation of features for extraction of named entities from texts in Russian
- Authors: Mozharova V.A.1, Lukashevich N.V.2
- Affiliations:
- Department of Computational Mathematics and Cybernetics
- Scientific Research Computational Center
- Issue: Volume 51, No. 3 (2017)
- Pages: 127-134
- Section: Text Processing Automation
- URL: https://journal-vniispk.ru/0005-1055/article/view/150171
- DOI: https://doi.org/10.3103/S0005105517030049
- ID: 150171
Abstract
This paper examines features used in machine-learning approaches to extracting named entities from Russian texts: features of the token itself (lexeme), as well as vocabulary, contextual, cluster, and two-stage features. The contribution of each feature to the quality of named entity extraction is studied. The experiments use a CRF (conditional random field) classifier as the machine-learning method. Feature contributions are compared on two open collections using the F-measure.
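As a rough illustration of the feature groups named in the abstract, the sketch below builds a per-token feature dictionary of the kind commonly fed to a CRF sequence labeler, and computes the F-measure from match counts. It is not the authors' feature set: the function names `token_features` and `f_measure`, the feature keys, the gazetteer, and the cluster map are all assumptions made for this example, and two-stage features are left out.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, gazetteer: Set[str],
                   clusters: Dict[str, str], window: int = 1) -> Dict[str, object]:
    """Build an illustrative feature dict for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        # Token-level features: the token itself and its surface shape.
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        # Vocabulary feature: membership in an (assumed) entity word list.
        "in_gazetteer": tok.lower() in gazetteer,
        # Cluster feature: id from an (assumed) precomputed word clustering.
        "cluster": clusters.get(tok.lower(), "UNK"),
    }
    # Contextual features: copy simple features of neighbours in the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:lower"] = tokens[j].lower()
            feats[f"{offset:+d}:is_capitalized"] = tokens[j][:1].isupper()
        else:
            # Mark positions that fall outside the sentence.
            feats[f"{offset:+d}:pad"] = True
    return feats


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = ["Иван", "живёт", "в", "Москве"]
    feats = token_features(sentence, 3, gazetteer={"москве"}, clusters={"иван": "c1"})
    print(feats)
    print(f_measure(tp=8, fp=2, fn=2))
```

One such dictionary per token, collected per sentence, is the usual input shape for CRF toolkits that accept dictionary features. Measuring the change in F-measure when a feature group is switched on or off is one simple way to compare contributions in the spirit of the abstract.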
About the authors
V. Mozharova
Department of Computational Mathematics and Cybernetics
Corresponding author
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
N. Lukashevich
Scientific Research Computational Center
Email: valerie.mozharova@gmail.com
Russia, Moscow, 119991
