Natural Language Processing and Fiction Text: Basis for Corpus Research
- 作者: Gorozhanov A.I.1, Guseynova I.A.1, Stepanova D.V.2
-
隶属关系:
- Moscow State Linguistic University
- Minsk State Linguistic University
- 期: 卷 15, 编号 1 (2024)
- 页面: 195-210
- 栏目: DISCOURSE & CORPUS STUDIES
- URL: https://journal-vniispk.ru/2313-2299/article/view/323648
- DOI: https://doi.org/10.22363/2313-2299-2024-15-1-195-210
- EDN: https://elibrary.ru/FKVAOI
- ID: 323648
如何引用文章
全文:
详细
The study deals with NLP procedures on the material of the fiction texts in German and in English, which are considered as strong cultural texts. The aim of the study is to develop a model of such a technical device to process, analyze and interpret a fiction text, which would reveal the full potential of popular NLP tools within the corpus approach. The general methods used in the study are analysis and synthesis. Special methods are additionally used to solve certain specific issues: descriptive method, modelling and qualitative and quantitative analysis. The scientific novelty lies in the fact that the authors apply the crucial principles of the classical theories of text interpretation according to the latest methods and tools of the applied linguistics. As a practical result, special software has been developed, which is able to process SQL based linguistic corpora, automatically built with spaCy NLP library and Python programming language. This software can be used for a fiction text interpretation, as well as for compiling learning materials in Home Reading. It is assumed that the development of special software for strong cultural texts stimulates the search for scientific solutions and at the same time allows one to understand the essential differences that exist between natural and artificial intelligence.
作者简介
Alexey Gorozhanov
Moscow State Linguistic University
编辑信件的主要联系方式.
Email: a_gorozhanov@mail.ru
ORCID iD: 0000-0003-2280-1282
SPIN 代码: 1753-4920
Dr.Sc. in Philology, Associate Professor, Professor of the Department of Grammar and History of the German Language, Faculty of German Language
38, b. 1, Ostozhenka str., Moscow, Russian Federation, 119034Innara Guseynova
Moscow State Linguistic University
Email: guseynova@linguanet.ru
ORCID iD: 0000-0002-6544-699X
SPIN 代码: 1635-5260
Dr.Sc. in Philology, Associate Professor, Vice-Rector
38, b. 1, Ostozhenka str., Moscow, Russian Federation, 119034Darya Stepanova
Minsk State Linguistic University
Email: daryastepanova79@gmail.com
ORCID iD: 0000-0002-2857-4386
SPIN 代码: 5291-8660
PhD in Philology, Associate Professor
21, Zakharova, Minsk, Belarus, 220034参考
- Tsujii, J. (2021). Natural language processing and computational linguistics. Computational Linguistics, 47(4), 707-727. https://doi.org/10.1162/COLI_a_00420
- O’Neill, H., Welsh, A., Smith, D.A., Roe, G. & Terras, M. (2021). Text mining mill: Computationally detecting influence in the writings of John Stuart Mill from library records. Digital Scholarship in the Humanities, 36(4), 1013-1029. https://doi.org/10.1093/llc/fqab010
- Fonseca, C.A., Guelpeli, M.V.C. & De Souza Netto, R.S. (2021). Representation of structured data of the text genre as a technique for automatic text processing. Texto Livre, 15. https://doi.org/10.35699/1983-3652.2022.35445
- Szabó, M.K., Ring, O., Nagy, B., Kiss, L., Koltai, J., Berend, G. & Kmetty, Z. (2020). Exploring the dynamic changes of key concepts of the Hungarian socialist era with natural language processing methods. Historical Methods, 54(1), 1-13. https://doi.org/10.1080/0161 5440.2020.1823289
- Malyuga, E.N. & McCarthy, M. (2021). “No” and “net” as response tokens in English and Russian business discourse: In search of a functional equivalence. Russian Journal of Linguistics, 25(2), 391-416. https://doi.org/10.22363/2687-0088-2021-25-2-391-416
- Gorozhanov, A.I. & Guseynova, I.A. (2020). Corpus analysis of the grammatical categories’ constituents in fiction texts considering the linguo-regional component. Journal of Siberian Federal University. Humanities & Social Sciences, 13(12), 2035-2048. https://doi.org/10.17516/1997-1370-0702. (In German).
- Denisova, G.V. (2020). Intertekst v sovremennoj sociokul’turnoj real’nosti Rossii i Italii. Moscow: Kanon+. P. 272. (In Russ.).
- Milne, P.W. (2022). Praescriptum: Kafka’s two bodies. Philosophy Today, 66(3), 587-603. https://doi.org/10.5840/philtoday2022324451
- Itkin, A. (2021). Kafka’s worlds. German Quarterly, 94(4), 493-508. https://doi.org/10.1111/gequ.12241
- Roca, J.B. & Rius, N.I. (2020). Kafka and disease. between reality and writing [Kafka y la enfermedad. Entre la realidad y la escritura] Revista Chilena De Literatura, 102, 233-247. https://doi.org/10.4067/S0718-22952020000200223
- Logue, M. (2022). Patrick MacGill: A path to socialism shared with Jack London. [Patrick MacGill: el Camino hacia el Socialismo junto a Jack London]. Estudios Irlandeses, 17, 54-64. https://doi.org/10.24162/EI2022-10645
- Hernandez, A. (2021). Jack London’s poetic animality and the problem of domestication. Journal of Modern Literature, 45(1), 40-55. https://doi.org/10.2979/jmodelite.45.1.03
- López, J.I.G. (2020). Jack London, the socialist dream of a young poet. Revista De Estudios Norteamericanos, 24, 9-112. https://doi.org/10.12795/REN.2020.I24.05
- Li, J., Lian, Z., Wu, Z., Zeng, L., Mu, L., yuan, y. & ye, J. (2023). Artificial intelligence- based method for the rapid detection of fish parasites (ichthyophthirius multifiliis, gyrodactylus kobayashii, and argulus japonicus). Aquaculture, 563. https://doi.org/10.1016/j.aquaculture.2022.738790
- Hachemi, A. & Zeroual, A. (2022). Computer-assisted program for water calco-carbonic equilibrium computation. Earth Science Informatics, 15(1), 68-704. https://doi.org/10.1007/ s12145-021-00703-5
- Li, W., Pu, H., & Wang, R. (2021). Sign language recognition based on computer vision. In: Priceeding of 2021 IEEE International Conference on Artificial Intelligence and Computer Applications, ICAICA 2021. pp. 919-922. https://doi.org/10.1109/ICAICA52286.2021.9498024
- Schmitt, X., Kubler, S., Robert, J., Papadakis, M. & Letraon, y. (2019). A replicable comparison study of NER software: StanfordNLP, NLTK, OpenNLP, SpaCy, gate. In: Priceeding of 2019 6th International Conference on Social Networks Analysis, Management and Security, SNAMS 2019. pp. 338-343. https://doi.org/10.1109/SNAMS.2019.8931850
- Ajani, D.T. (2019). Grammatico-Semantic Content of Primitives in the Major Themes of News Watch’s Reports on Nigerian Politics. The international journal of humanities & social studies, 7(12), 327-337. https://doi.org/10.24940/theijhss/2019/v7/i12/HS1912-066
- Kraeva, I.A. et al. (2022). Germanistika i lingvodidaktika v Moskovskom i Minskom gosudarstvennykh lingvisticheskikh universitetakh: Istoki, razvitie, perspektivy. (In Russ.).
- Potapova, R.K. (2012). Diskursivnaya sostavlyayushchaya sovremennoi korpusnoi lingvistiki (primenitel’no k ustno-rechevym bazam dannykh). Bulletin of Moscow State Linguistic University, 639, 157-167. (In Russ.).
- Zubov, A.V. (2006). Korpusnaya lingvistika: vozmozhnosti i perspektivy. In: Proceedings of Conference “Russkii yazyk: Sistema i funktsionirovanie”, Minsk. pp. 22-27. (In Russ.).
- Kim, C., Choi, S., Jeong, J. & Lee, E. (2022). Automatic risks detection and comparison techniques for general conditions of technical documents in purchasing order. In: Proceedings of ACM International Conference Proceeding Series. pp. 236-241. https://doi.org/10.1145/3543712.3543721
- Fantechi, A., Gnesi, S., Livi, S. & Semini, L. (2021). A spaCy-based tool for extracting variability from NL requirements. In: Priceeding of ACM International Conference Proceeding Series, Part F171625-B. pp. 32-35. https://doi.org/10.1145/3461002.3473074
- Eyre, H., Chapman, A.B., Peterson, K.S., Shi, J., Alba, P.R., Jones, M.M. & Patterson, O.V. (2021). Launching into clinical space with medspaCy: A new clinical text processing toolkit in Python. In: Proceedings AMIA … Annual Symposium Proceedings. AMIA Symposium, 2021. pp. 438-447.
- Partalidou, E., Spyromitros-Xioufis, E., Doropoulos, S., Vologiannidis, S. & Diamantaras, K.I. (2019). Design and implementation of an open source Greek POS tagger and entity recognizer using spaCy. In: Proceedings 2019 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2019. pp. 337-341. https://doi.org/10.1145/3350546.3352543
- Jugran, S., Kumar, A., Tyagi, B.S. & Anand, V. (2021). Extractive automatic text summarization using SpaCy in Python NLP. In: Proceedings of 2021 International Conference on Advance Computing and Innovative Technologies in Engineering, ICACITE 2021. pp. 582-585. https://doi.org/10.1109/ICACITE51222.2021.9404712
- Channabasamma, Suresh, y. & Manusha Reddy, A. (2021). A contextual model for information extraction in resume analytics using NLP’s spaCy. Inventive computation and information technologies. Springer. pp. 395-404. https://doi. org/10.1007/978-981-33-4305-4_30
- Harahus, M., Juhar, J. & Hladek, D. (2022). Morphological annotation of the Slovak language in the spaCy library with the pretraining. In: Proceedings of 32nd International Conference Radioelektronika, Radioelektronika 2022. https://doi.org/10.1109/RADIOELEKTRONI KA54537.2022.9764935
- Kumar, D., Choudhari, K., Patel, P., Pandey, S., Hajare, A. & Jante, S. (2022). STAT simple text annotation tool (STAT): Web-based tool for creating training data for spaCy models. In: ICT Analysis and Applications. Singapore: Springer Nature. https://doi.org/10.1007/978-981-16-5655-2_29
- Soni, P.K. & Rambola, R. (2021). Deep learning, WordNet, and spaCy based hybrid method for detection of implicit aspects for sentiment analysis. In: Proceedings of 2021 International Conference on Intelligent Technologies, CONIT 2021. https://doi.org/10.1109/CONIT51480.2021.9498372
- Chantrapornchai, C. & Tunsakul, A. (2021). Information extraction on tourism domain using spaCy and BERT. ECTI Transactions on Computer and Information Technology, 15(1), 108- 122. https://doi.org/10.37936/ecti-cit.2021151.228621
- Singh, N. & Hussain, A. (2022). Rapid application development in cloud computing with IoT. In: IoT and AI technologies for sustainable living: A practical handbook. pp. 1-28. https://doi.org/10.1201/9781003051022-1
- Gorozhanov, A.I., Guseynova, I.A. & Stepanova, D.V. (2022). Instrumentarii avtomatizirovannogo analiza perevoda khudozhestvennogo proizvedeniya. Issues of Applied Linguistics, 45, 62-89. https://doi.org/10.25076/vpl.45.03 (In Russ.).
- Gorozhanov, A.I. (2021). Metod komparativnogo analiza gruppy tekstov (na materiale nemetskoyazychnykh nauchnykh statei. Bulletin of Moscow State Linguistic University, 5(847), 48-59. https://doi.org/10.52070/2542-2197_2021_5_847_48 (In Russ.).
- Singh, N., Kumar, M., Singh, B. & Singh, J. (2022). DeepSpacy-NER: An efficient deep learning model for named entity recognition for Punjabi language. Evolving Systems, 14, 673-683. https://doi.org/10.1007/s12530-022-09453-1
补充文件
