Foreword to the Special Issue on Uralic Languages


Tommi A Pirinen
Trond Trosterud
Francis M. Tyers
Veronika Vincze
Eszter Simon
Jack Rueter


In this introduction we present a concise history of language technology for the Uralic languages up to the present day, together with some desiderata that motivated us to organise this special issue. A short introduction like this cannot cover everything that has happened. We cover the beginnings of the (Uralic) language-technology scene in the 1980s insofar as they are relevant to much of the current work, including the articles presented in this issue. We also go through the Uralic area by its main languages, surveying existing resources and forming a systematic overview of what is missing. Finally, we discuss some possible future directions for managing language technology at the pan-Uralic level.


Author Biographies

Tommi A Pirinen, Hamburger Zentrum für Sprachkorpora, Universität Hamburg

Trond Trosterud, HSL-fakultehta, UiT Norgga árktalaš universitehta



Veronika Vincze, MTA-SZTE, Szegedi Tudomány Egyetem

