


Vol 44, No 5 (2018)
- Year: 2018
- Articles: 8
- URL: https://journal-vniispk.ru/0361-7688/issue/view/10854
Article
Approximation of Color Images Based on the Clusterization of the Color Palette and Smoothing Boundaries by Splines and Arcs
Abstract
The relevance of the results of search, classification, and description of the shapes of objects in images largely depends on the quality of vectorization, i.e., on determining regions that are uniform in color and texture and on constructing their boundaries. A color image segmentation algorithm is proposed that clusters the color palette of the image by constructing a three-dimensional histogram in the HSV color space. A feature of this algorithm is that it searches for local maxima in the histogram by scanning the color space with a three-dimensional neighborhood analysis operator. Furthermore, an algorithm for approximating region boundaries by line segments and circular arcs that recursively augments the approximated chains is proposed. These algorithms are designed to automate the extraction of informative features from images for use in computer vision systems, content-based image retrieval systems, geographic information systems, and other decision support systems based on graphical information.
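The histogram step described in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the authors' implementation: it builds a three-dimensional histogram in HSV space and finds its local maxima with a 3x3x3 neighborhood operator; the bin counts and the peak threshold are assumptions made for the example.

```python
# Minimal sketch of palette clustering via a 3-D HSV histogram and a
# 3x3x3 local-maximum search.  Bin sizes and min_count are illustrative.
import numpy as np
from scipy.ndimage import maximum_filter
from matplotlib.colors import rgb_to_hsv

def palette_peaks(rgb_image, bins=(30, 16, 16), min_count=50):
    """Return HSV histogram peaks that serve as cluster centers."""
    hsv = rgb_to_hsv(rgb_image.astype(np.float64) / 255.0)   # H, S, V in [0, 1]
    pixels = hsv.reshape(-1, 3)
    hist, edges = np.histogramdd(pixels, bins=bins, range=((0, 1),) * 3)

    # A bin is a local maximum if it equals the maximum of its 3x3x3 neighborhood.
    neighborhood_max = maximum_filter(hist, size=3, mode="constant")
    peak_mask = (hist == neighborhood_max) & (hist >= min_count)

    # Convert peak bin indices back to HSV coordinates (bin centers).
    centers = [(e[:-1] + e[1:]) / 2 for e in edges]
    idx = np.argwhere(peak_mask)
    return np.array([[centers[d][i[d]] for d in range(3)] for i in idx])
```

Pixels can then be assigned to the nearest detected peak to obtain color-uniform regions, whose boundaries the paper approximates by line segments and circular arcs.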



Decentralized Data Storages: Technologies of Construction
Abstract
A comparative overview of decentralized data storages of different types is presented. It is shown that, although they share some properties typical of all peer-to-peer (P2P) networks, the tasks they solve and, hence, the technologies used to construct storages of different types differ significantly.



Directed Dynamic Symbolic Execution for Static Analysis Warnings Confirmation
Abstract
Currently, there is no doubt among experts in the field of program certification and quality assurance that automated program analysis methods should be used to find bugs that lead to program security vulnerabilities. The national standard for secure software development requires the use of source code static analysis tools as one of the measures of software quality assurance at the development stage and the application of dynamic analysis and fuzz testing of the source code at the qualification testing stage. Fundamental limitations of automated program analysis and testing methods make it impossible to carry out simultaneously exhaustive and precise analysis of programs for errors. Therefore, current research aims at reducing the effect of these fundamental limitations on the quality and productivity of automated software error detection methods. This paper discusses an approach that combines source code static analysis and dynamic symbolic execution in order to increase the program error detection efficiency.
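As a toy illustration of how a static analysis warning can be confirmed by symbolic reasoning (not the tooling used in the paper), the path condition leading to the warned statement can be encoded as constraints and handed to an SMT solver; the code fragment, the warning, and the use of the Z3 solver are assumptions made for the example.

```python
# Toy confirmation of a static analysis warning with constraint solving.
# Requires the z3-solver package; the analyzed snippet is a made-up example.
from z3 import Int, Solver, And, sat

# Warned program fragment (hypothetical):
#   def f(a, b):
#       if a > 10 and b < a:
#           return 100 // (a - b - 10)   # analyzer: possible division by zero
a, b = Int("a"), Int("b")
path_condition = And(a > 10, b < a)     # constraints to reach the statement
defect_condition = a - b - 10 == 0      # condition under which the defect fires

solver = Solver()
solver.add(path_condition, defect_condition)
if solver.check() == sat:
    model = solver.model()
    print("warning confirmed, e.g. a =", model[a], "b =", model[b])
else:
    print("no input reaches the defect: likely a false positive")
```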



Adaptation of General Concepts of Software Testing to Neural Networks
Abstract
The problem of testing and debugging learning neural network systems is discussed. Differences between these systems and conventional program implementations of algorithms from the point of view of testing are noted. Requirements for testing systems are identified. Specific features of various neural network models that affect the choice of the testing technique and the determination of tested parameters are analyzed. Ways to eliminate the noted drawbacks of the systems under study are discussed. The discussion is illustrated by an example.
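One way to make the adaptation concrete (a sketch, not the example used in the paper) is to replace exact output checks with tolerance and metamorphic assertions; the toy model, tolerance, and noise level below are illustrative assumptions.

```python
# Sketch of testing ideas adapted to a trained network: tolerance checks
# instead of exact equality, plus a metamorphic stability check.
import numpy as np

# Hand-picked stand-in for learned weights of a toy 4-input, 3-class network.
W = np.array([[ 1.0,  0.0, -1.0],
              [ 0.5,  1.0,  0.0],
              [-1.0,  0.2,  1.0],
              [ 0.0, -0.5,  0.5]])

def predict(x):
    """Toy network: one linear layer followed by softmax."""
    z = x @ W
    e = np.exp(z - z.max())
    return e / e.sum()

def test_output_is_probability_distribution():
    # Tolerance-based check instead of exact equality.
    p = predict(np.array([0.2, -1.0, 0.5, 0.3]))
    assert np.all(p >= 0.0) and abs(p.sum() - 1.0) < 1e-9

def test_prediction_stable_under_small_noise():
    # Metamorphic check: small input perturbations must not change the class.
    rng = np.random.default_rng(0)
    x = np.array([0.2, -1.0, 0.5, 0.3])
    baseline = int(predict(x).argmax())
    for _ in range(20):
        noisy = x + rng.normal(scale=1e-3, size=x.shape)
        assert int(predict(noisy).argmax()) == baseline

if __name__ == "__main__":
    test_output_is_probability_distribution()
    test_prediction_stable_under_small_noise()
    print("all checks passed")
```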



Detecting Near Duplicates in Software Documentation
Abstract
Contemporary software documentation is as complicated as the software itself. During its lifecycle, the documentation accumulates a lot of “near duplicate” fragments, i.e., chunks of text that were copied from a single source and later modified in different ways. Such near duplicates decrease documentation quality and thus hamper its further utilization. At the same time, they are hard to detect manually due to their fuzzy nature. In this paper we give a formal definition of near duplicates and present an algorithm for their detection in software documents. This algorithm is based on the exact software clone detection approach: the software clone detection tool Clone Miner was adapted to detect exact duplicates in documents. Then, our algorithm uses these exact duplicates to construct near ones. We evaluate the proposed algorithm using the documentation of 19 open source and commercial projects. Our evaluation is comprehensive, covering various documentation types: design and requirement specifications, programming guides and API documentation, and user manuals. Overall, the evaluation shows that all kinds of software documentation contain a significant number of both exact and near duplicates. Next, we report on the manual analysis of the near duplicates detected in the Linux Kernel Documentation. We present both quantitative and qualitative results of this analysis, demonstrate the algorithm's strengths and weaknesses, and discuss the benefits of duplicate management in software documents.
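The construction of near duplicates from exact ones can be sketched as follows. This is a simplified stand-in for the Clone Miner based pipeline described in the abstract, with the gap and length thresholds chosen only for illustration.

```python
# Sketch: exact duplicate fragments between two documents are detected first,
# and neighboring exact matches separated only by short edited gaps are then
# merged into near-duplicate fragments.
from difflib import SequenceMatcher

def near_duplicates(doc_a: str, doc_b: str, max_gap: int = 20, min_len: int = 40):
    """Return (span_in_a, span_in_b) pairs covering near-duplicate fragments."""
    blocks = [m for m in SequenceMatcher(None, doc_a, doc_b).get_matching_blocks()
              if m.size]

    merged = []  # each item: [a_start, a_end, b_start, b_end]
    for m in blocks:
        if merged and m.a - merged[-1][1] <= max_gap and m.b - merged[-1][3] <= max_gap:
            # Extend the previous near duplicate over the small edited gap.
            merged[-1][1] = m.a + m.size
            merged[-1][3] = m.b + m.size
        else:
            merged.append([m.a, m.a + m.size, m.b, m.b + m.size])

    return [((a0, a1), (b0, b1)) for a0, a1, b0, b1 in merged if a1 - a0 >= min_len]
```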



Extraction of Data from Mass Media Web Sites
Abstract
To understand the current state and dynamics of the development of the Internet information space, fast tools are needed for extracting data from mass media sites with a large degree of coverage. However, far from all sites provide data syndication in the RSS format, and the development of specialized tools for extracting data from each Web site is a costly procedure. In this paper, methods for the automatic extraction of news texts from arbitrary mass media sites are proposed. Classification of Web page types and the subsequent grouping of their URLs improve the quality of news text extraction. A strategy for traversing a site and detecting the pages containing hyperlinks to news pages is proposed. This strategy decreases the number of requests and reduces the load on the site.
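A schematic sketch of the traversal idea follows; it is not the paper's classifier. Page types are approximated by a URL pattern, hub pages are expanded while article pages are only collected, and the date-based pattern, page limit, and fetching code are assumptions made for the example.

```python
# Sketch of a site traversal that expands hub pages and collects article URLs,
# reducing the number of requests.  A real system would learn page types.
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

ARTICLE_URL = re.compile(r"/20\d{2}/\d{2}/\d{2}/")   # e.g. /2018/05/14/some-news

class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, max_pages=50):
    """Breadth-first traversal: expand hub pages, collect article URLs."""
    seen, articles = {start_url}, []
    queue = deque([start_url])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if ARTICLE_URL.search(url):
            articles.append(url)     # article page: keep for text extraction,
            continue                 # do not expand it further
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for link in (urljoin(url, h) for h in parser.links):
            if link.startswith(start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return articles
```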



Applying Time Series for Background User Identification Based on Their Text Data Analysis
Abstract
An approach to user identification based on deviations in their topic trends when working with text information is presented. The approach relies on topic analysis of the user’s past behavior when working with text content of various (including confidential) categories and on forecasting their future behavior. The topic analysis determines the principal topics of the user’s text content and calculates their respective weights at given instants of time. Deviations of the user’s observed behavior from the forecast are used to identify the user. Within this approach, an original time series forecasting method based on orthogonal non-negative matrix factorization (ONMF) is proposed; ONMF has not been used to solve time series forecasting problems before. Experiments on real-world corporate email formed from the Enron data set showed that the proposed user identification approach is applicable.
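A schematic sketch of the identification pipeline is given below. It substitutes scikit-learn's standard NMF for the authors' orthogonal NMF and a naive last-window forecast for their ONMF-based forecaster; the topic count, windowing, and deviation score are illustrative assumptions.

```python
# Sketch: extract per-window topic weights from the user's text, then score
# how far observed behavior deviates from a (naive) forecast.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

def topic_trends(windows, n_topics=5):
    """windows: list of strings, each the user's text for one time window.
    Returns a (n_windows, n_topics) matrix of topic weights over time.
    Assumes at least n_topics windows are available."""
    tfidf = TfidfVectorizer(max_features=2000).fit_transform(windows)
    weights = NMF(n_components=n_topics, max_iter=400).fit_transform(tfidf)
    # Normalize each window so weights describe the topic mixture, not volume.
    return weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)

def deviation_from_forecast(trends):
    """Naive forecast: the next mixture equals the previous one.  The mean
    deviation decides whether observed behavior still matches the profile."""
    forecast, observed = trends[:-1], trends[1:]
    return float(np.abs(observed - forecast).mean())
```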



Fine-Grained Address Space Layout Randomization on Program Load
Abstract
Software vulnerabilities are a serious security threat. It is important to develop protection mechanisms preventing their exploitation, especially given the rapid increase in ROP attacks. State-of-the-art protection mechanisms have some drawbacks that can be used by attackers. In this paper, we propose fine-grained address space layout randomization on program load that is able to protect against such attacks. During the static linking stage, the executable and library files are supplemented with information about function boundaries and relocations. The system dynamic linker/loader uses this information to perform a permutation of functions. The proposed method was implemented for 64-bit programs on the CentOS 7 operating system. The implemented method has shown good resistance to ROP attacks evaluated by two metrics: the number of surviving gadgets and the exploitability estimation of ROP chain examples. The implementation presented in this article is applicable across the entire operating system and has no compatibility problems affecting program performance. The working capacity of the proposed approach was demonstrated on real programs. Further research can cover randomization at fork time and granularity finer than the function level. It also makes sense to randomize the placement of short functions taking into account the relationships between them; placing functions that often call each other close together can improve the performance of individual programs.
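The permutation idea can be modeled in a few lines (a toy model, not the modified linker/loader): given per-function boundaries and the call-site relocations recorded at static link time, the loader shuffles function placement and rewrites each relocation to the callee's new address. The function names, sizes, and relocation layout below are made up for the example.

```python
# Toy model of load-time function permutation with relocation patching.
import random

functions = {"parse": 0x120, "check": 0x80, "emit": 0x200}          # name -> size
relocations = [("parse", 0x30, "check"), ("emit", 0x10, "parse")]   # (caller, offset, callee)

def randomize_layout(base=0x400000, align=16):
    order = list(functions)
    random.shuffle(order)                       # fine-grained permutation
    addr, layout = base, {}
    for name in order:
        layout[name] = addr
        addr += (functions[name] + align - 1) // align * align
    # Rewrite relocations so every call site points at the callee's new address.
    patched = [(layout[caller] + off, layout[callee])
               for caller, off, callee in relocations]
    return layout, patched

layout, patched = randomize_layout()
for name in sorted(layout, key=layout.get):
    print(f"{name:5s} @ {layout[name]:#x}")
for site, target in patched:
    print(f"patch call at {site:#x} -> {target:#x}")
```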


