No 4 (2024)
AI-enabled Systems
Ethical aspects of using artificial intelligence technologies: state of the art and prospects for regulation
Abstract
The paper discusses the ethical aspects of introducing artificial intelligence technologies into various spheres of human activity and provides examples of violations of generally accepted social norms in particular countries. It also introduces a formal definition of a common ethical violation, discrimination. The paper then presents the prospects for developing and applying artificial intelligence technologies, provided that the negative consequences of such violations are minimized. In particular, it considers methods for reducing discrimination in artificial intelligence technologies for natural language processing and synthesis. Finally, the paper proposes possible regulation of artificial intelligence technologies to ensure compliance with ethical standards.



Formalization of natural language description of structural synthesis of a technical system at conceptual design
Abstract
The paper considers methods for formalizing the description of the multi-level structural synthesis of a complex technical system in a restricted natural language at different stages of conceptual design. Logical-linguistic models and tools are proposed that make it possible to generate a textual description of the structure of a technical system and of the design process in Russian. An information model of a terminological dictionary for describing a technical solution is constructed. A linguistic processor that implements text generation is developed. Examples of natural-language descriptions of the stages of structural synthesis of a technical device are given.



Computational Intelligence
Fuzzy metrics based on generators of Archimedean triangular norms of the class of rational functions
Abstract
This paper presents an approach for constructing parametric fuzzy metrics based on additive generators of strict triangular norms from the class of rational functions. The fuzzy metrics were tested on the problem of fuzzy clustering, in which a degree of membership in each cluster is determined for every object, allowing a more flexible grouping of objects within a given set. The computational experiment demonstrates the superiority of the new fuzzy metrics over the Euclidean metric according to well-known and widely used clustering quality criteria. The fuzzy approach makes it possible to work with approximate distance values, which is important in the presence of uncertainty. It can therefore be viewed as an element of intelligent technologies that is advisable to use in developing information systems for various purposes.
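To illustrate what an additive generator from the class of rational functions looks like, the following minimal sketch uses the well-known Hamacher product, whose generator is t(x) = (1 - x) / x. This is a textbook example chosen for illustration only: the function names are hypothetical, and the sketch does not reproduce the parametric generators or the fuzzy metrics constructed in the paper.

```python
def hamacher_generator(x: float) -> float:
    """Additive generator t(x) = (1 - x) / x, a rational function on (0, 1]."""
    if not 0.0 < x <= 1.0:
        raise ValueError("x must lie in (0, 1]")
    return (1.0 - x) / x


def hamacher_generator_inverse(u: float) -> float:
    """Inverse of the generator: t^{-1}(u) = 1 / (1 + u) for u >= 0."""
    if u < 0.0:
        raise ValueError("u must be non-negative")
    return 1.0 / (1.0 + u)


def t_norm_from_generator(x: float, y: float) -> float:
    """Strict Archimedean t-norm T(x, y) = t^{-1}(t(x) + t(y)).

    For this generator the result equals the Hamacher product
    x * y / (x + y - x * y); T(x, 0) = T(0, y) = 0 by convention.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    return hamacher_generator_inverse(
        hamacher_generator(x) + hamacher_generator(y)
    )


if __name__ == "__main__":
    # Compare the generator-based value with the closed form.
    x, y = 0.5, 0.5
    print(t_norm_from_generator(x, y), x * y / (x + y - x * y))
```

The value 1 acts as the neutral element, since t(1) = 0, which is one of the defining properties of a triangular norm.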



Recognition of genome special regions by machine learning methods
Abstract
The article studies the recognition of special structural segments of genomes called promoters. For the first time, machine learning methods based on logical analysis and data classification were applied to the problem of promoter recognition. These methods search for informative fragments in the feature descriptions of precedents and are designed for processing low-valued integer data. The fragments found are easy to interpret and make it possible to distinguish promoters from other regions of the genome. However, searching for them is time-consuming. The results of experiments on a large imbalanced sample are presented, considering both the traditional method of feature formation using k-mers and the direct application of the logical classifier to the original data. It is shown that in the second case the quality of logical classification is significantly higher and reaches a ROC-AUC of 94.3% with the ensemble approach. The best result, a ROC-AUC of 95.1%, was shown by the CatBoost classifier applied directly to the original sample. With the traditional method of feature generation, the corresponding CatBoost result is 94.8%.
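The traditional k-mer feature formation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the DNA alphabet "ACGT" and the use of overlapping windows are choices made here, not details taken from the article.

```python
from collections import Counter
from itertools import product
from typing import Dict, List


def kmer_counts(sequence: str, k: int) -> Dict[str, int]:
    """Count all overlapping substrings of length k in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return dict(Counter(seq[i:i + k] for i in range(len(seq) - k + 1)))


def kmer_feature_vector(sequence: str, k: int, alphabet: str = "ACGT") -> List[int]:
    """Fixed-length integer feature vector: one count per possible k-mer.

    The k-mers are enumerated in lexicographic order of the alphabet,
    so every sequence maps to a vector of length len(alphabet) ** k.
    """
    counts = kmer_counts(sequence, k)
    return [counts.get("".join(p), 0) for p in product(alphabet, repeat=k)]


if __name__ == "__main__":
    print(kmer_counts("ACGTAC", 2))
    print(kmer_feature_vector("ACGTAC", 2))
```

Vectors of this kind consist of small non-negative integers, which is the type of data a classifier working on integer-valued features can consume directly.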



Analysis of Textual and Graphical Information
Semiotic model for representing cognitive structures of agglutinative languages
Abstract
The article describes a pragmatically oriented information technology for representing cognitive structures based on the semiotic universals of an agglutinative language, within the framework of creating explainable artificial intelligence. A decentralized semiotic model of agglutinative natural languages is considered as a means of organizing cognitive processes and interactions between intelligent agents. The software and algorithmic structure of a prototype support system for the semiotic model is presented, using the Tatar language as an example.



Development of a graph neural network for processing text data
Abstract
Currently, one of the main directions of information technology development is graph-based modeling of complex data structures and machine learning approaches based on graph representations. The article deals with graph modeling of text data using neural networks. The aim of the paper is to develop a graph neural network for classifying and clustering texts based on their semantic content. Texts are represented as graphs in which vertices are concepts and edges are links between them. Public text corpora in Russian and English were used. A new approach to analyzing text data is proposed, based on representing texts as directed weighted graphs and processing them with graph neural networks. The graphs were processed by a neural network with three graph convolution layers. The results show an accuracy of more than 90% for topic group classification and text clustering, outperforming RNN, CNN and doc2vec methods.
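A single graph convolution step over a directed weighted graph can be sketched in plain Python as follows. This is an illustrative sketch, not the architecture from the article: the row normalization with self-loops, the ReLU activation, the non-negative edge weights and all function names are assumptions made here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_normalize_with_self_loops(adj: Matrix) -> Matrix:
    """Add a self-loop of weight 1 to every vertex and scale each row to sum to 1.

    Assumes non-negative edge weights, so every row sum is at least 1.
    """
    n = len(adj)
    normalized = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized


def graph_conv_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: ReLU(A_hat @ X @ W), where A_hat is the normalized adjacency."""
    a_hat = row_normalize_with_self_loops(adj)
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in z]


if __name__ == "__main__":
    adj = [[0.0, 1.0], [0.0, 0.0]]       # one directed edge: vertex 0 -> vertex 1
    features = [[1.0, 0.0], [0.0, 1.0]]  # one-hot vertex features
    weights = [[1.0, 0.0], [0.0, 1.0]]   # identity weights for readability
    print(graph_conv_layer(adj, features, weights))
```

A network with three graph convolution layers would apply `graph_conv_layer` three times in sequence, each time with its own trainable weight matrix, before a classification or clustering head.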



Methods for Rhetorical Structure Parsing in Russian
Abstract
The paper examines methods for discourse parsing of the Russian language within the framework of rhetorical structure theory. The development of a new corpus for full-text parsing of Russian-language texts of various genres is described. The applicability of various pre-trained encoder language models to rhetorical analysis is analyzed on two Russian-language corpora. We propose a method for training neural network models for rhetorical parsing on a mix of expert-annotated data. This approach allows the models to parse texts effectively regardless of variations in the sets of rhetorical relations used in different corpora. The method is evaluated on two large multi-genre corpora with rhetorical annotation for the Russian language.



Substantive criteria for referring statements from texts to "events" and "factors"
Abstract
The purpose of this paper is to advance and automate language models for extracting statements related to events and factors from text documents using a purpose-built system of linguistic markers. The paper presents the results of testing text-mining models for event and factor extraction, using analytical research on human potential, the social sciences and the humanities as an example. The linguistic models are tested and evaluated by comparing the results obtained in automatic mode, in manual mode (with expert-analytical validation) and in semiautomatic mode (using the implemented system of linguistic markers). The proposed approaches improved performance in extracting statements containing events and factors.



Analysis of Signals, Audio and Video Information
Method of structural synthesis and parametric identification of machine vision system
Abstract
The paper presents research on the development of mathematical models of machine vision systems using the theory of modified descriptive image algebras. The basic definitions of the mathematical objects and the operations on them that are used in the structural synthesis of models are formulated. A general formulation of the parametric identification of a machine vision system model is given. Mathematical models of machine vision systems are described for three tasks of measuring the area of objects of different nature. Recommendations are given on the statistical estimation of the values of the model's variable parameters when processing a set of images.



A model for explainable malignancy assessment of pulmonary nodules on CT images
Abstract
To increase the transparency of modern computer-aided diagnosis (CAD) systems for assessing the malignancy of lung nodules, an interpretable model based on generalized additive models and concept-based learning is proposed. In addition to the final malignancy regression score, the model detects a set of clinically significant attributes and learns the association between the lung nodule attributes and the final diagnostic decision, as well as their contributions to that decision. The proposed concept-based learning framework provides human-readable explanations in terms of different concepts (numerical and categorical), their values, and their contributions to the final prediction. Numerical experiments with the LIDC-IDRI dataset demonstrate that the diagnostic results obtained using the proposed model, which explicitly explores internal relationships, are in line with patterns observed in clinical practice. Additionally, the proposed model shows competitive performance in classification and in nodule attribute scoring, highlighting its potential for effective decision-making in lung nodule diagnosis.


