


Vol. 51, No. 3 (2017)
- Year: 2017
- Articles: 10
- URL: https://journal-vniispk.ru/0005-1055/issue/view/8968
General Section
Metascientific and philosophical reasons to define the status of computer science
Abstract
The basic terms of metascience and the philosophy of science are considered. In that context, we define science, a field of science, and two sciences adjacent to computer science: mathematics and technology. We characterize computer science as a formal field of science consisting of theories, each of which accomplishes the following constructive conceptual transition: the principles of algorithmic computations–programming paradigms–programs–running programs on computers.



Information Analysis
Analysis of the results of application of the VKF system: Successes and an open problem
Abstract
This paper describes a software implementation of the VKF method for data mining, which was successfully applied to two datasets from the UCI Machine Learning Repository. A simplified version of the VKF system that searches for a random subset ran impressively fast. A theoretical result explaining this observation is proved. The general case is not yet amenable to such an analysis; however, the system is still very fast in the general case.



The development tendencies of web analytics tools
Abstract
This paper studies the capabilities of website counters and methods of analyzing, with Google Trends, the development trends of various web-analytics tools. Promising approaches are presented for more detailed research into web-analytics trends and for a multi-aspect study of data from counters, services, and log analyzers, aimed at optimizing and promoting the websites of various organizations. An overview is given of sources for a comparative analysis of web-analytics tools, of ratings, and of data from the Ruward:Track laboratory on the number of counters added on the Russian Internet.






Classification by compression: Application of information-theory methods for the identification of themes of scientific texts
Abstract
A method for automatic classification of scientific texts based on data compression is proposed. The method was implemented and evaluated on data from the arXiv.org archive of scientific texts and from the CyberLeninka scientific electronic library (CyberLeninka.ru). In experiments, the method correctly identified the themes of scientific texts with a probability of 75–95%; its accuracy depends on the quality of the original data.
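The general idea of classification by compression can be illustrated with a minimal sketch. The abstract does not name the compressor or the decision rule, so everything here is an assumption: the sketch uses `zlib` and assigns a text to the theme whose reference corpus grows least, in compressed bytes, when the text is appended. The function names and the toy corpora are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(text: str, corpora: dict[str, str]) -> str:
    """Return the label whose corpus 'explains' the text best.

    Assumed decision rule (not taken from the paper): pick the label
    for which appending the text adds the fewest compressed bytes.
    """
    def extra_bytes(label: str) -> int:
        base = corpora[label]
        return compressed_size(base + " " + text) - compressed_size(base)

    return min(corpora, key=extra_bytes)


# Toy reference corpora; real ones would be large collections per theme.
corpora = {
    "physics": (
        "the spin of the electron in a magnetic field is quantized "
        "and the energy levels of the atom split"
    ),
    "linguistics": (
        "the morphology of the verb in a medieval manuscript shows "
        "many graphic and grammatical variants of the word"
    ),
}

print(classify("the energy levels of the atom", corpora))
```

A text that shares long substrings with a corpus is encoded mostly as back-references to that corpus, which is why the byte increase acts as a rough similarity measure.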



Text Processing Automation
Investigation of features for extraction of named entities from texts in Russian
Abstract
This paper considers features used in machine-learning approaches to extracting named entities from Russian texts: features of the token itself (lexeme), as well as vocabulary, contextual, cluster, and two-stage features. The contribution of each feature to the quality of named-entity extraction is studied. A CRF classifier serves as the machine-learning method in the experiments. Feature contributions are compared on two open collections using the F-measure.
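For reference, the F-measure used for such comparisons combines precision and recall. The sketch below assumes exact-match scoring over sets of `(start, end, type)` entity spans; the abstract does not state the matching criterion, so that choice and the function name are assumptions.

```python
def f_measure(predicted: set, gold: set, beta: float = 1.0) -> float:
    """F-beta score over sets of entity spans (exact match assumed)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical spans: (start token, end token, entity type).
gold = {(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")}
predicted = {(0, 2, "PER"), (5, 6, "ORG")}

# One of two predictions is correct (precision 1/2), and one of three
# gold entities is found (recall 1/3).
print(round(f_measure(predicted, gold), 3))
```

With `beta = 1` this is the harmonic mean of precision and recall, so a feature set only scores well if it improves both.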



Computer morphology for investigations of a variable text
Abstract
This paper describes the principles and results of developing a freely distributed computer morphology for the morphological analysis of medieval manuscript texts in Russian, together with the principles for developing computer morphology for a corpus of variable text. It presents examples of NooJ dictionaries and grammars for processing texts with many graphic and grammatical variants.



Identification of the usage of collocations in business texts
Abstract
This paper considers an algorithm for detecting the multiword combinations and derivative prepositions that are characteristic of Russian business and official speech. Constructing such an algorithm can be regarded as a necessary stage in studying business and official speech. The described mechanism can be useful in compiling textbooks and in training and testing services for both foreigners and native speakers of Russian. A word list in the form of an Excel file is built from the core of collocations already included in existing tutorials. Orders of public authorities published openly on the Internet serve as the texts that are searched.
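The search step can be pictured with a small sketch that counts how often each phrase from a word list occurs in a text. It is only an illustration under stated assumptions: it does plain case-insensitive matching on whole words, ignores inflection, and takes the phrases from an in-memory list rather than an Excel file. The function name and sample phrases are hypothetical and not taken from the paper.

```python
import re
from collections import Counter


def count_collocations(text: str, phrases: list[str]) -> Counter:
    """Count whole-word, case-insensitive occurrences of each phrase."""
    # Collapse runs of whitespace so line breaks do not split a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for phrase in phrases:
        # Lookarounds keep the match from starting or ending inside a word.
        pattern = r"(?<!\w)" + re.escape(phrase.lower()) + r"(?!\w)"
        found = len(re.findall(pattern, normalized))
        if found:
            counts[phrase] = found
    return counts


# Hypothetical word list of derivative prepositions.
phrases = ["в соответствии с", "в целях", "на основании"]
text = (
    "В соответствии с приказом и в целях исполнения, "
    "в соответствии с законом."
)
counts = count_collocations(text, phrases)
print(counts.most_common())
```

A practical version for Russian would likely need lemmatization, since many collocations inflect; this sketch only finds fixed forms.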



Distinctive features of the structure of linguistic ontology
Abstract
This paper describes a methodology for developing a linguistic ontology as a component of a system for automatic analysis of customer opinions about commercial products. The fundamental principles of building ontologies of this type are substantiated: the relationship between ontology and grammar; the distinction between parametric and evaluative terms in the ontology's structure, and the classification of evaluative terms into syntactic and semantic ones; the binary relationship between syntactic and semantic terms; and the gradation scale of evaluation intensity. Cases of homonymy and synonymy of evaluative terms are analyzed for the first time on Russian data.



Problems of machine translation of business texts from Russian into English
Abstract
Using business texts as an example, this article considers practical aspects of meaning distortion in translation by available machine translation (MT) systems, whose underlying approach is word-by-word translation. An integrated functional approach to translating business texts is proposed, based on analyzing the semantic and morphological features of actual text content as well as the axiological and epistemic semantic features that reveal subjective modality. The proposed technique is used to develop a business-text MT algorithm that resolves the word-by-word translation issue and conveys the meaning of short texts. Test cases for the technique and the derived algorithm are considered for the Russian–English language pair.


