Trend Detection Using NLP as a Mechanism of Decision Support

Polina A. Lobanova; Лобанова Полина Александровна; Ilya F. Kuzminov; Кузьминов Илья Филиппович; Efrosinia Yu. Karatetskaya; Каратецкая Ефросиния Юрьевна; Elizaveta A. Sabidaeva; Сабидаева Елизавета Алексеевна; Vadim V. Anpilogov; Анпилогов Вадим Васильевич

doi:10.14357/20718594220409

Authors: Lobanova P.A.¹, Kuzminov I.F.¹, Karatetskaya E.Y.¹, Sabidaeva E.A.¹, Anpilogov V.V.²
Affiliations:
1. HSE University
2. Sberbank
Issue: No. 4 (2022)
Pages: 88-98
Section: Analysis of Textual and Graphical Information
URL: https://journal-vniispk.ru/2071-8594/article/view/270493
DOI: https://doi.org/10.14357/20718594220409
ID: 270493

Abstract

This article presents the principles of an algorithm for identifying trends from large text corpora and presenting the results in formats convenient for decision makers. The algorithm is implemented in the iFORA Big Data Mining System. The paper reviews existing text analytics algorithms; outlines the mathematical basis for identifying terms that signal trends, which has been proposed and tested in dozens of completed projects; and describes approaches to clustering terms by their vectors in the Word2vec space. It also gives examples of two key visualizations, semantic maps and trend maps, which outline the topics and trends that characterize a given field of study and adapt the analysis results to the needs of decision makers. The limitations and advantages of using the proposed approach for decision support are discussed, and directions for future research are suggested.
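
As a rough illustration of what clustering terms by their embedding vectors can look like, the sketch below groups a handful of terms with plain k-means. Everything in it is an assumption made for illustration: the abstract does not say which clustering algorithm iFORA uses, the function name `kmeans_labels` is hypothetical, and the three-dimensional vectors are hand-made toy values standing in for real Word2vec embeddings.

```python
import numpy as np

def kmeans_labels(vectors, k, n_iter=50, seed=0):
    """Lloyd's k-means on L2-normalized rows; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    # Normalize so that Euclidean distance reflects the angle between vectors.
    x = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Start from k distinct rows chosen at random.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every term to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy stand-ins for term embeddings (NOT taken from a trained Word2vec model).
terms = ["neural network", "deep learning", "machine learning",
         "gene editing", "crispr", "genome sequencing"]
vectors = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.10],
    [0.00, 0.80, 0.20],
    [0.05, 0.85, 0.15],
])

labels = kmeans_labels(vectors, k=2)

# Collect the terms that ended up in each cluster.
clusters = {}
for term, label in zip(terms, labels):
    clusters.setdefault(int(label), []).append(term)
for label, members in sorted(clusters.items()):
    print(label, members)
```

On this toy input the three machine-learning terms fall into one cluster and the three genomics terms into the other. With real embeddings the number of clusters and the distance measure would need to be chosen for the corpus at hand.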

Keywords

iFORA, NLP, text analytics, decision making

Russian Federation, Moscow

Supplementary files

1. JATS XML
2. Fig. 1. Semantic map for the direction ‘Scientific and technological trends in world science’, built on the array of scientific articles in the MAG database for 2019-2020
3. Fig. 2. Trend map in the direction ‘Scientific and technological trends in world science’, built on the array of scientific articles in the MAG database for 2019-2020
