Vol 21, No 5 (2022)
Artificial intelligence, knowledge and data engineering
Random Survival Forests Incorporated by the Nadaraya-Watson Regression
Abstract
An attention-based random survival forest (Att-RSF) is presented in the paper. The first main idea behind this model is to adapt the Nadaraya-Watson kernel regression to the random survival forest so that the regression weights, or kernels, can be regarded as trainable attention weights, under the important condition that predictions of the random survival forest are represented in the form of functions, for example, the survival function and the cumulative hazard function. Each trainable weight assigned to a tree and a training or testing example is defined by two factors: the ability of the corresponding tree to predict and the peculiarity of the example that falls into a leaf of the tree. The second main idea behind Att-RSF is to apply Huber's contamination model to represent the attention weights as a linear function of the trainable attention parameters. Harrell's C-index (concordance index), which measures the prediction quality of the random survival forest, is used to form the loss function for training the attention weights. The C-index, jointly with the contamination model, leads to a standard quadratic optimization problem for computing the weights, which can be solved by many simple algorithms. Numerical experiments with real survival datasets illustrate Att-RSF.
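To make the two ideas concrete, here is a minimal sketch, not the authors' implementation. It assumes a Gaussian-type kernel, so the Nadaraya-Watson part is a softmax over negative squared distances between an example x and the mean vector of the leaf it reaches in each tree. It also assumes Huber's contamination model takes the form alpha_k = (1 - eps) * softmax_k + eps * v_k, where v lies on the unit simplex, which makes the weights linear in the trainable parameters v. The names `contaminated_weights`, `aggregate_survival`, the temperature `tau`, and the value of `eps` are all hypothetical. Harrell's C-index is included in its standard pairwise form, with higher risk expected for earlier events.

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contaminated_weights(x, leaf_means, v, eps=0.3, tau=1.0):
    """Attention weights alpha_k = (1 - eps) * softmax_k + eps * v_k (assumed form).

    x          -- feature vector of the example
    leaf_means -- for each tree k, mean of the training vectors in the leaf x falls into
    v          -- trainable parameters on the unit simplex (v_k >= 0, sum v_k = 1)
    eps, tau   -- hypothetical contamination probability and kernel temperature
    """
    scores = [-sum((xi - ai) ** 2 for xi, ai in zip(x, a)) / tau for a in leaf_means]
    s = softmax(scores)
    # Linear in v, which is what keeps the training problem quadratic.
    return [(1 - eps) * sk + eps * vk for sk, vk in zip(s, v)]


def aggregate_survival(weights, tree_survival):
    """Weighted average of per-tree survival functions given on a common time grid."""
    n_times = len(tree_survival[0])
    return [sum(w * S[j] for w, S in zip(weights, tree_survival)) for j in range(n_times)]


def harrell_c_index(times, events, risk):
    """Share of comparable pairs (event at i and T_i < T_j) ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored examples cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties count as half-concordant
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    x = [0.0, 1.0]
    leaf_means = [[0.0, 1.0], [2.0, 2.0], [1.0, 0.0]]
    w = contaminated_weights(x, leaf_means, [1 / 3, 1 / 3, 1 / 3])
    print(w, aggregate_survival(w, [[1.0, 0.8, 0.5], [1.0, 0.6, 0.3], [1.0, 0.9, 0.7]]))
```

Because the weights are a convex mixture of two probability vectors, they always sum to one, so the aggregated curve remains a valid survival function whenever the per-tree curves are.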



Approach to Software Integration of Heterogeneous Sources of Medical Data Based on Microservice Architecture
Abstract



Opening the Black Box: Finding Osgood’s Semantic Factors in Word2vec Space
Abstract
State-of-the-art models of artificial intelligence are developed in the black-box paradigm, in which sensitive information is limited to input-output interfaces, while internal representations are not interpretable. The resulting algorithms lack the explainability and transparency required for responsible application. This paper addresses the problem with a method for finding Osgood's dimensions of affective meaning in the multidimensional space of a pre-trained word2vec model of natural language. Three affective dimensions are found based on eight semantic prototypes composed of individual words. The evaluation axis is found in the 300-dimensional word2vec space as the difference between the positive and negative prototypes. The potency and activity axes are defined from six process-semantic prototypes (perception, analysis, planning, action, progress, and evaluation), which represent phases of a generalized circular process in the potency-activity plane. All dimensions are found in a simple analytical form that requires no additional training. The dimensions are nearly orthogonal, as expected for independent semantic factors. The Osgood semantics of any word2vec object is then retrieved by a simple projection of the corresponding vector onto the identified dimensions. The developed approach makes it possible to interpret the internals of black-box algorithms in natural affective-semantic categories and provides insights into the foundational principles of distributional vector models of natural language. In the reverse direction, the established mapping opens up machine-learning models as rich sources of data for cognitive-behavioral research and technology.
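The evaluation axis and the projection step can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: each prototype is assumed to be the mean of its word vectors, the axis is normalized to unit length, and the tiny three-dimensional vectors stand in for real 300-dimensional word2vec embeddings. The helper names are hypothetical, and the construction of the potency and activity axes from the six process prototypes is not reproduced here. The `cosine` helper shows how near-orthogonality between two axes could be checked.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors (assumed prototype)."""
    n = len(vectors)
    return [sum(c) / n for c in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def evaluation_axis(positive_vecs, negative_vecs):
    """Unit axis pointing from the negative prototype to the positive prototype."""
    pos = mean_vector(positive_vecs)
    neg = mean_vector(negative_vecs)
    diff = [p - n for p, n in zip(pos, neg)]
    length = norm(diff)
    return [d / length for d in diff]


def project(vec, axis):
    """Coordinate of a word vector along a unit-length semantic axis."""
    return dot(vec, axis)


def cosine(a, b):
    """Cosine similarity; values near 0 indicate nearly orthogonal axes."""
    return dot(a, b) / (norm(a) * norm(b))


if __name__ == "__main__":
    # Toy vectors only; real word2vec vectors would be loaded from a trained model.
    positive = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.1]]
    negative = [[-1.0, 0.1, 0.0], [-0.9, -0.1, 0.1]]
    axis = evaluation_axis(positive, negative)
    print(project([0.5, 0.0, 0.3], axis), project([-0.7, 0.0, 0.0], axis))
```

Since the axis is computed in closed form from averaged vectors, nothing here is trained, which matches the abstract's point that the dimensions require no additional training.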



Verification of Marine Oil Spills Using Aerial Images Based on Deep Learning Methods
Abstract



Deep Transfer Learning of Satellite Imagery for Land Use and Land Cover Classification
Abstract
Deep learning has been instrumental in solving difficult problems by automatically learning, from sample data, the rules (algorithms) that map an input to its respective output. Purpose: To perform land use and land cover (LULC) classification using satellite imagery training data for the Moscow region and to compare the accuracy achieved by different models. Methods: The accuracy of LULC classification with deep learning algorithms and satellite imagery depends on both the model and the training dataset used. We have used state-of-the-art deep learning models and transfer learning, together with a dataset appropriate for these models. Different methods were applied to fine-tune the models with different parameters and to prepare the right training dataset, including data augmentation. Results: Four deep learning models from the Residual Network (ResNet) and Visual Geometry Group (VGG) families, namely ResNet50, ResNet152, VGG16, and VGG19, were used with transfer learning. The models were further trained on data collected from Sentinel-2 for the Moscow region, and ResNet50 was found to give the highest LULC classification accuracy for this region. Practical relevance: We have developed code that trains the four models and classifies input image patches into one of 10 classes (Annual Crop, Forest, Herbaceous Vegetation, Highway, Industrial, Pasture, Permanent Crop, Residential, River, and Sea&Lake).
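The abstract mentions data augmentation without specifying it. As a hedged illustration only, the sketch below assumes a common choice for overhead imagery: the eight rotations and reflections of a square patch, each keeping its original class label. Patches are plain lists of rows, so each pixel may be a single value or a tuple of Sentinel-2 band values. The function names are hypothetical; a real pipeline would normally use the augmentation utilities of its deep learning framework.

```python
# The 10 LULC classes listed in the abstract.
CLASSES = [
    "Annual Crop", "Forest", "Herbaceous Vegetation", "Highway", "Industrial",
    "Pasture", "Permanent Crop", "Residential", "River", "Sea&Lake",
]


def rotate90(patch):
    """Rotate a 2-D patch (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def hflip(patch):
    """Mirror a 2-D patch left to right."""
    return [row[::-1] for row in patch]


def dihedral_augment(patch):
    """Return the 8 rotations/reflections of a square patch (assumed augmentation)."""
    variants = []
    current = patch
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rotate90(current)
    return variants


def augment_dataset(samples):
    """Expand (patch, label) pairs; land cover class does not change under rotation."""
    out = []
    for patch, label in samples:
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label}")
        out.extend((variant, label) for variant in dihedral_augment(patch))
    return out


if __name__ == "__main__":
    data = augment_dataset([([[1, 2], [3, 4]], "Forest")])
    print(len(data), [variant for variant, _ in data])
```

Rotations and reflections are a reasonable default for satellite patches because, unlike in ground-level photos, there is no canonical "up" direction, so the class is unchanged while the model sees more varied inputs.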



Digital information telecommunication technologies
Analysis of the Correlation Properties of the Wavelet Transform Coefficients of Typical Images
Abstract



Discrete Time Sequence Reconstruction of a Signal Based on Local Approximation Using a Fourier Series by an Orthogonal System of Trigonometric Functions
Abstract



The Statistical Analysis of the Security for a Wireless Communication System with a Beaulieu-Xie Shadowed Fading Model Channel
Abstract


