A Thematic Coherence Study of a Bilingual Corpus of Articles on Oil and Gas Research
- Authors: Krasnov F.V.¹, Shvartsman M.E.²,³, Dimentov A.V.², Sen A.I.⁴
- Affiliations:
  1. Gazpromneft Research and Development Center
  2. National Electronic Information Consortium
  3. Russian State Library
  4. St. Petersburg State University
- Issue: Vol. 53, No. 3 (2019)
- Pages: 138-142
- Section: Automation of Text Processing
- URL: https://journal-vniispk.ru/0005-1055/article/view/150310
- DOI: https://doi.org/10.3103/S0005105519030075
- ID: 150310
Abstract
Structural differences between scientific articles that arise from their translation from Russian into English are studied using the modal topic modeling technique. Each document in the corpus is represented by two modes: English and Russian. Topic modeling yields the bimodal matrices Φ and Θ. Analysis of the Φ matrix showed that the topics differ in how closely their Russian and English terms correspond when each topic's words are ranked in descending order of probability. For 90% of the topics, the English words fully match the Russian ones. Analysis of the Θ matrix showed that for 99% of the documents there is a topic with a probability greater than 0.95. Thus, most documents are monotopical, regardless of the document language.
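The two matrix checks summarized in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not details from the paper: Φ is stored per mode as a mapping from each word to its per-topic probabilities p(w|t), Θ is a list of per-document topic distributions p(t|d), and term correspondence is measured with a hypothetical Russian-to-English dictionary (`ru_to_en`) over the top `k` words of a topic. The helper names are likewise hypothetical; only the 0.95 threshold comes from the abstract, and the authors' own matching procedure may differ.

```python
# Minimal sketch of the Phi and Theta checks; the data layout, helper names,
# and dictionary-based matching rule are illustrative assumptions.
from typing import Dict, List

# Phi for one mode (language): word -> [p(w|t) for each topic]
Phi = Dict[str, List[float]]


def top_words(phi: Phi, topic: int, k: int) -> List[str]:
    """Return the k words with the highest p(w|t) for one topic."""
    return sorted(phi, key=lambda w: phi[w][topic], reverse=True)[:k]


def topic_term_match(phi_en: Phi, phi_ru: Phi, ru_to_en: Dict[str, str],
                     topic: int, k: int = 10) -> float:
    """Share of the topic's top-k Russian words whose (hypothetical)
    dictionary translation is among the same topic's top-k English words."""
    en_top = set(top_words(phi_en, topic, k))
    ru_top = top_words(phi_ru, topic, k)
    hits = sum(1 for w in ru_top if ru_to_en.get(w) in en_top)
    return hits / len(ru_top)


def monotopical_share(theta: List[List[float]], threshold: float = 0.95) -> float:
    """Share of documents whose most probable topic has p(t|d) > threshold."""
    return sum(1 for doc in theta if max(doc) > threshold) / len(theta)


if __name__ == "__main__":
    # Toy two-topic model: each topic column sums to 1 within a mode.
    phi_en = {"oil": [0.6, 0.1], "well": [0.3, 0.1], "text": [0.1, 0.8]}
    phi_ru = {"нефть": [0.5, 0.1], "скважина": [0.4, 0.1], "текст": [0.1, 0.8]}
    ru_to_en = {"нефть": "oil", "скважина": "well", "текст": "text"}
    print(topic_term_match(phi_en, phi_ru, ru_to_en, topic=0, k=2))  # 1.0
    theta = [[0.97, 0.03], [0.40, 0.60]]
    print(monotopical_share(theta))  # 0.5
```

On this toy data, topic 0 scores 1.0 because both of its top Russian words translate into its top English words, and only one of the two documents has a topic above the 0.95 threshold. Under this sketch, a topic whose score is 1.0 would count as one where the English words fully match the Russian ones.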
About the authors
F. Krasnov
Gazpromneft Research and Development Center
Corresponding author.
Email: Krasnov.FV@gazpromneft-ntc.ru
Russia, St. Petersburg, 190000
M. Shvartsman
National Electronic Information Consortium; Russian State Library
Corresponding author.
Email: shvar@neicon.ru
Russia, Moscow, 115114; Moscow, 119019
A. Dimentov
National Electronic Information Consortium
Corresponding author.
Email: adimentov@neicon.ru
Russia, Moscow, 115114
A. Sen
St. Petersburg State University
Corresponding author.
Email: anastasiia.sen@apmath.spbgu.ru
Russia, St. Petersburg, 199034
