Trend Detection Using NLP as a Mechanism of Decision Support
- Authors: Lobanova P.A. (1), Kuzminov I.F. (1), Karatetskaya E.Y. (1), Sabidaeva E.A. (1), Anpilogov V.V. (2)
- Affiliations:
  (1) HSE University
  (2) Sberbank
- Issue: No 4 (2022)
- Pages: 88-98
- Section: Analysis of Textual and Graphical Information
- URL: https://journal-vniispk.ru/2071-8594/article/view/270493
- DOI: https://doi.org/10.14357/20718594220409
- ID: 270493
Abstract
This article presents the principles of an algorithm for identifying trends through the analysis of big text data and for presenting the results in formats convenient for decision makers. The algorithm is implemented in the iFORA Big Data Mining System. The paper reviews existing text analytics algorithms; outlines the mathematical basis for identifying terms that signify trends, which has been proposed and tested in dozens of completed projects; describes approaches to clustering terms based on their vectors in the Word2vec space; and gives examples of two key visualizations, semantic maps and trend maps, which outline the range of topics and trends characterizing a particular field of study and thereby adapt the results of the analysis to the tasks of decision makers. The limitations and advantages of the proposed approach for decision support are discussed, and directions for future research are suggested.
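The clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm the system uses, so the k-means-style procedure with cosine similarity below is an assumption, as are the function names and the toy three-dimensional vectors, which stand in for real Word2vec embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster_terms(vectors, k, iterations=10):
    """Assign each term to the most similar of k centroids (k-means style)."""
    terms = sorted(vectors)
    # Deterministic start: the first k terms (alphabetically) seed the centroids.
    centroids = [list(vectors[t]) for t in terms[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: pick the centroid with the highest cosine similarity.
        labels = {
            t: max(range(k), key=lambda c: cosine(vectors[t], centroids[c]))
            for t in terms
        }
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[t] for t in terms if labels[t] == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical term vectors; a real pipeline would take them from a trained model.
toy_vectors = {
    "battery": (0.9, 0.1, 0.0),
    "lithium": (0.8, 0.2, 0.1),
    "genome": (0.1, 0.9, 0.2),
    "crispr": (0.0, 0.8, 0.3),
}

labels = cluster_terms(toy_vectors, k=2)
for term, label in sorted(labels.items()):
    print(term, label)
```

In this toy setup the energy-related terms and the biology-related terms fall into separate clusters. With real embeddings the number of clusters and the initialization would need to be chosen with more care.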
About the authors
Polina A. Lobanova
HSE University
Author for correspondence.
Email: plobanova@hse.ru
Deputy Head of Department, Institute for Statistical Studies and Economics of Knowledge
Russian Federation, Moscow
Ilya F. Kuzminov
HSE University
Email: ikuzminov@hse.ru
Candidate of Geographical Sciences, Director of Centre, Institute for Statistical Studies and Economics of Knowledge
Russian Federation, Moscow
Efrosinia Yu. Karatetskaya
HSE University
Email: ekarateczkaya@hse.ru
Junior Researcher, Institute for Statistical Studies and Economics of Knowledge
Russian Federation, Moscow
Elizaveta A. Sabidaeva
HSE University
Email: esabidaeva@hse.ru
Research Assistant, Institute for Statistical Studies and Economics of Knowledge
Russian Federation, Moscow
Vadim V. Anpilogov
Sberbank
Email: anpilogov.v.v@sberbank.ru
Head of Validation Unit, Centre for Validation of Corporate-Investment Business Models
Russian Federation, Moscow
