Morphological Guesser as a Tool for Analyzing Field Data: Experiences with the Naukan Yupik Language
- Authors: Budyanskaya E.M. (1), Buzanov A.O. (1, 2), Zhornik D.O. (1), Pikhtin A.A. (1, 2)
- Affiliations:
- (1) Institute of Linguistics of the RAS
- (2) Higher School of Economics
- Issue: No 2 (2025)
- Pages: 9-19
- Section: LINGUISTICS
- URL: https://journal-vniispk.ru/2307-6119/article/view/296273
- DOI: https://doi.org/10.23951/2307-6119-2025-2-9-19
- ID: 296273
Abstract
The paper presents the development and evaluation of two automated morphological analysis tools for Naukan Yupik (Yupik group, Eskimo branch of the Eskimo-Aleut family): a dictionary-based morphological analyzer and a dictionary-free morphological guesser. Both tools are implemented using a two-stage approach to morphological modeling based on finite-state automata. The study examines in detail the morphological features of Naukan Yupik that affect the development of automated analysis tools, including rich inflection and derivation, homonymy of morphological markers, and complex morphophonological processes. The effectiveness of both tools is evaluated on a corpus of oral texts from 2022–2023. Particular attention is paid to overgeneration in the output of the morphological guesser and to ways of reducing it by separating analyses according to part of speech. The results show that, when working with field data, the guesser can be more effective than the dictionary-based analyzer despite its known limitations.
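The contrast between a dictionary-based analyzer and a dictionary-free guesser, and the idea of separating guesser output by part of speech, can be illustrated with a small Python sketch. It rests on loud assumptions: the suffix table and the word form `kumat` are invented and are not Naukan Yupik data; `guess`, `analyze`, and `split_by_pos` are hypothetical names; and plain string matching stands in for the finite-state, two-stage implementation the paper describes, so morphophonological alternations are ignored entirely. Grouping analyses by POS is only one plausible reading of "part-of-speech-based analysis separation". This is not the authors' code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple

# HYPOTHETICAL toy suffix inventory: (surface string, part of speech, gloss).
# These strings are invented for illustration and are NOT Naukan Yupik morphemes.
SUFFIXES = [
    ("t", "N", "PL"),
    ("t", "V", "3PL"),   # homonymous marker: same string, different POS
    ("ka", "N", "POSS"),
    ("a", "V", "3SG"),
    ("", "N", "SG"),     # zero ending
]


class Analysis(NamedTuple):
    stem: str
    pos: str
    gloss: str


def guess(word: str, min_stem: int = 2) -> list[Analysis]:
    """Dictionary-free guessing: every suffix matching the end of the word
    yields an analysis, and whatever precedes it is accepted as a stem."""
    analyses = []
    for suffix, pos, gloss in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= min_stem:  # crude guard against tiny stems
                analyses.append(Analysis(stem, pos, gloss))
    return analyses


def analyze(word: str, lexicon: set[tuple[str, str]]) -> list[Analysis]:
    """Dictionary-based analysis: keep only guesses whose (stem, POS) pair
    is attested in the lexicon."""
    return [a for a in guess(word) if (a.stem, a.pos) in lexicon]


def split_by_pos(analyses: list[Analysis]) -> dict[str, list[Analysis]]:
    """Group analyses by part of speech so they can be reviewed per POS."""
    groups: dict[str, list[Analysis]] = defaultdict(list)
    for a in analyses:
        groups[a.pos].append(a)
    return dict(groups)


if __name__ == "__main__":
    word = "kumat"  # invented word form
    guesses = guess(word)
    print(len(guesses))                    # 3 competing analyses
    print(analyze(word, {("kuma", "N")}))  # only the lexicon-backed one survives
    for pos, group in split_by_pos(guesses).items():
        print(pos, [f"{a.stem}-{a.gloss}" for a in group])
```

Because the guesser accepts any leftover string as a stem, a homonymous marker such as the toy `-t` multiplies the candidates. Grouping by part of speech does not delete any of them, but it lets an annotator who already knows the word class discard a whole group at once, while the dictionary-based filter removes candidates only when the stem is actually listed.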
About the authors
Elena Mikhailovna Budyanskaya
Institute of Linguistics of the RAS
Author for correspondence.
Email: budyanskaya.lena@gmail.com
Moscow, Russia
Anton Olegovich Buzanov
Institute of Linguistics of the RAS; Higher School of Economics
Email: anton.buzanov.00@gmail.com
Moscow, Russia
Daria Olegovna Zhornik
Institute of Linguistics of the RAS
Email: daria.zhornik@yandex.ru
Moscow, Russia
Andrey Andreevich Pikhtin
Institute of Linguistics of the RAS; Higher School of Economics
Email: p_nafanyka@gmail.com
Moscow, Russia
References
- Menovschikov G.A. Yazyk naukanskikh eskimosov [The language of the Naukan Eskimos]. Leningrad, Nauka, 1975. 512 p. (in Russian).
- Golovko E.V., Dobrieva E.A., Jacobson S., Krauss M. Slovar’ yazyka naukanskikh eskimosov [Naukan Yupik Eskimo dictionary]. Fairbanks, Alaska Native Language Center, 2004. 369 p. (in Russian).
- Vakhtin N.B. Morfologiya glagol’nogo slovoizmeneniya v yupikskikh (eskimosskikh) yazykakh [Verbal inflectional morphology in Yupik (Eskimo) languages]. Rossiyskaya akademiya nauk, Institut lingvisticheskikh issledovaniy. Saint Petersburg, Nestor, 2007. 123 p. (in Russian).
- Kanuparthi N., Inumella A., Sharma D.M. Hindi Derivational Morphological Analyzer. In: Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology. Montreal, Association for Computational Linguistics, 2012. Pp. 10–16.
- Kessikbayeva G., Cicekli I. Rule Based Morphological Analyzer of Kazakh Language. In: Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM. Baltimore, Association for Computational Linguistics, 2014. Pp. 46–54.
- Khalifa S., Hassan S., Habash N. A Morphological Analyzer for Gulf Arabic Verbs. In: Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia, Association for Computational Linguistics, 2017. Pp. 35–45.
- Forbes C., Nicolai G., Silfverberg M. An FST morphological analyzer for the Gitksan language. In: Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology. Online, Association for Computational Linguistics, 2021. Pp. 188–197.
- Merzhevich T., Ferraz Gerardi F. Introducing YakuToolkit. Yakut Treebank and Morphological Analyzer. In: Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages. Marseille, European Language Resources Association, 2022. Pp. 185–188.
- Koskenniemi K. Two-level Morphology. A General Computational Model for Word-Form Recognition and Production. Helsinki, University of Helsinki, Department of General Linguistics, 1983.
- Karttunen L. KIMMO: A General Morphological Processor. Texas Linguistics Forum. 1983. Vol. 22. Pp. 217–228.
- Antworth E.L. PC-KIMMO: a two-level processor for morphological analysis. Dallas, Summer Institute of Linguistics, 1990.
- Ritchie G.D., Russell G.J., Black A.W., Pulman S.G. Computational Morphology. Practical Mechanisms for the English Lexicon. Cambridge, The MIT Press, 1991.
- Swanson D., Howell N. Lexd: A finite-state lexicon compiler for non-suffixational morphologies. Multilingual Facilitation. 2021. Pp. 133–146.
- Karttunen L., Beesley K.R. Two-level rule compiler. Palo Alto, Xerox Corporation, Palo Alto Research Center, 1992.
- Lindén K., Axelson E., Hardwick S., Pirinen T.A., Silfverberg M. HFST—framework for compiling and applying morphologies. In: Systems and Frameworks for Computational Morphology: Second International Workshop, SFCM 2011. Berlin, Springer, 2011. Pp. 67–85.
- Chen E., Schwartz L. A morphological analyzer for St. Lawrence Island / Central Siberian Yupik. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, European Language Resources Association, 2018.