Stagger: an Open-Source Part of Speech Tagger for Swedish

Robert Östling


This work presents Stagger, a new open-source part-of-speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the most accurate single part-of-speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts to assess its out-of-domain performance.
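The abstract names the averaged perceptron as the learning algorithm behind Stagger and SALDO as a lexical resource. The following is a minimal sketch of that idea under simplifying assumptions, not Stagger's own code: it trains a token-level, greedy left-to-right multiclass averaged perceptron rather than the sentence-level structured perceptron of Collins (2002), and it omits embedding features. The feature templates, the functions `features`, `train` and `tag`, the toy tag set, and the small `LEXICON` dictionary (a hypothetical stand-in for lookups in a lexicon such as SALDO) are all illustrative assumptions.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass averaged perceptron over sparse string features (illustrative sketch)."""

    def __init__(self, tags):
        self.tags = sorted(tags)
        self.w = defaultdict(float)       # (feature, tag) -> current weight
        self.totals = defaultdict(float)  # running sum of each weight over past steps
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0                     # number of training examples seen

    def score(self, feats, tag):
        return sum(self.w.get((f, tag), 0.0) for f in feats)

    def predict(self, feats):
        # Highest-scoring tag; ties go to the alphabetically first tag.
        return max(self.tags, key=lambda t: self.score(feats, t))

    def _bump(self, key, delta):
        # Lazy averaging: credit the old value for every step it was unchanged.
        self.totals[key] += (self.step - self.stamps[key]) * self.w[key]
        self.stamps[key] = self.step
        self.w[key] += delta

    def update(self, feats, gold, guess):
        # Standard perceptron update: reward gold features, penalize the wrong guess.
        if guess != gold:
            for f in feats:
                self._bump((f, gold), 1.0)
                self._bump((f, guess), -1.0)
        self.step += 1

    def average(self):
        # Replace each weight by its mean value over all training steps.
        if self.step == 0:
            return
        for key, weight in list(self.w.items()):
            total = self.totals[key] + (self.step - self.stamps[key]) * weight
            self.w[key] = total / self.step


def features(words, i, prev_tag, lexicon):
    """Hypothetical feature templates: word form, 3-letter suffix, previous tag, lexicon tags."""
    w = words[i].lower()
    feats = ["bias", "w=" + w, "suf3=" + w[-3:], "prev=" + prev_tag]
    # Tags the lexicon allows for this word (stand-in for SALDO lookups).
    feats.extend("lex=" + t for t in lexicon.get(w, ()))
    return feats


def train(sentences, lexicon, epochs=5):
    model = AveragedPerceptron({t for sent in sentences for _, t in sent})
    for _ in range(epochs):
        for sent in sentences:
            words = [w for w, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev, lexicon)
                model.update(feats, gold, model.predict(feats))
                prev = gold  # condition on the gold history during training
    model.average()
    return model


def tag(model, words, lexicon):
    # Greedy decoding: each prediction becomes the previous tag for the next word.
    out, prev = [], "<s>"
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev, lexicon))
        out.append(prev)
    return out


# Toy data with simplified tags (real SUC tags are far more fine-grained).
TRAIN = [
    [("hunden", "NN"), ("springer", "VB")],
    [("katten", "NN"), ("sover", "VB")],
]
LEXICON = {"hunden": ["NN"], "katten": ["NN"], "springer": ["VB"], "sover": ["VB"]}

model = train(TRAIN, LEXICON)
print(tag(model, ["katten", "springer"], LEXICON))  # ['NN', 'VB']
```

In this sketch the previous-tag feature uses the gold tag during training but the model's own prediction at tagging time; a sequence-level tagger would instead search over whole tag sequences (for example with Viterbi or beam search). Averaging the weights over all training steps is what separates the averaged perceptron from the plain perceptron, and it typically makes the final model less sensitive to the last few updates.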

References

Baum, Leonard E. 1972. An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities 3:1–8.

Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research 3:1137–1155.

Bengio, Yoshua, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In 26th Annual International Conference on Machine Learning, ICML 2009, pages 41–48. Montreal, Canada.

Berger, Adam L., Vincent J. Della Pietra, and Stephen A. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22:39–71.

Borin, Lars and Markus Forsberg. 2009. All in the family: A comparison of SALDO and WordNet. In Nodalida 2009 Workshop on WordNets and other Lexical Semantic Resources – between Lexical Semantics, Lexicography, Terminology and Formal Ontologies, pages 7–12. Odense, Denmark.

Brants, Thorsten. 2000. TnT – A Statistical Part-of-Speech Tagger. In 6th Applied Natural Language Processing Conference, pages 224–231. Seattle, WA, USA.

Brill, Eric. 1995. Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21:543–565.

Brown, Peter F., Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics 18:467–479.

Carlberger, Johan and Viggo Kann. 1999. Implementing an efficient part-of-speech tagger. Software–Practice and Experience 29:815–832.

Collins, Michael. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Conference on Empirical Methods in Natural Language Processing, EMNLP 2002, pages 1–8. Philadelphia, PA, USA.

Collobert, Ronan and Jason Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In 25th International Conference on Machine Learning, ICML 2008, pages 160–167. Helsinki, Finland.

Collobert, Ronan, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.

Daelemans, Walter, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2001. TiMBL: Tilburg Memory-Based Learner, version 4.0, reference guide. Tech. rep., ILK.

Ejerhed, E., G. Källgren, O. Wennstedt, and M. Åström. 1992. The linguistic annotation system of the Stockholm-Umeå Project. Tech. rep., Department of Linguistics, University of Umeå.

Elworthy, David. 1994. Does Baum-Welch re-estimation help taggers? In Fourth Conference on Applied Natural Language Processing, ANLC 1994, pages 53–58. Stuttgart, Germany.

Forsbom, Eva. 2008. Good tag hunting: Tagability of Granska tags. In Joakim Nivre, Mats Dahllöf, and Beata Megyesi, eds., Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein, pages 77–85. Acta Universitatis Upsaliensis.

Forsbom, Eva and Kenneth Wilhelmsson. 2010. Revision of part-of-speech tagging in Stockholm Umeå Corpus 2.0. In Swedish Language Technology Conference, SLTC 2010.

Giménez, Jesús and Lluís Màrquez. 2003. Fast and accurate part-of-speech tagging: The SVM approach revisited. In Recent Advances in Natural Language Processing, RANLP 2003, pages 153–163. Borovets, Bulgaria.

Huang, Fei and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence-labeling. In Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing, ACL 2009, pages 495–503. Singapore.

Källgren, Gunnel. 1996. Linguistic indeterminacy as a source of errors in tagging. In Proceedings of the 16th Conference on Computational Linguistics, COLING 1996, pages 676–680. Copenhagen, Denmark.

Källgren, Gunnel. 2006. Documentation of the Stockholm Umeå Corpus. In S. Gustafson-Capková and B. Hartmann, eds., Manual of the Stockholm Umeå Corpus version 2.0, pages 5–85. Department of Linguistics, Stockholm University.

Lafferty, John D., Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Eighteenth International Conference on Machine Learning, ICML 2001, pages 282–289. San Francisco, CA, USA.

Lavergne, Thomas, Olivier Cappé, and François Yvon. 2010. Practical very large scale CRFs. In 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pages 504–513. Uppsala, Sweden.

Loftsson, Hrafn and Robert Östling. 2013. Tagging a morphologically complex language using an averaged perceptron tagger: The case of Icelandic. In 19th Nordic Conference on Computational Linguistics, NoDaLiDa 2013, pages 105–119. Oslo, Norway.

Megyesi, Beata. 2001. Comparing data-driven learning algorithms for POS tagging of Swedish. In Conference on Empirical Methods in Natural Language Processing, EMNLP 2001, pages 151–158. Carnegie Mellon University, Pittsburgh, PA, USA.

Mnih, Andriy and Geoffrey Hinton. 2007. Three new graphical models for statistical language modelling. In 24th International Conference on Machine Learning, ICML 2007, pages 641–648. Corvallis, OR, USA.

Ngai, G. and R. Florian. 2001. Transformation-based learning in the fast lane. In Second Meeting of the North American Chapter of the Association for Computational Linguistics, NAACL 2001, pages 40–47. Pittsburgh, PA, USA.

Östling, Robert. 2012. Stagger: A modern POS tagger for Swedish. In Fourth Swedish Language Technology Conference, SLTC 2012, pages 83–84. Lund, Sweden.

Ratnaparkhi, Adwait. 1996. A maximum entropy model for part-of-speech tagging. In Conference on Empirical Methods in Natural Language Processing, EMNLP 1996, pages 133–142. Philadelphia, PA, USA.

Schmid, Helmut. 1994. Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, pages 44–49. Manchester, UK.

Shen, Libin, Giorgio Satta, and Aravind Joshi. 2007. Guided learning for bidirectional sequence classification. In 45th Annual Meeting of the Association of Computational Linguistics, ACL 2007, pages 760–767. Prague, Czech Republic.

Sjöbergh, Jonas. 2003a. Combining POS-taggers for improved accuracy on Swedish text. In 14th Nordic Conference of Computational Linguistics, NoDaLiDa 2003. Reykjavik, Iceland.

Sjöbergh, Jonas. 2003b. Stomp, a POS-tagger with a different view. In Recent Advances in Natural Language Processing Conference, RANLP 2003, pages 54–60.

Søgaard, Anders. 2011. Semisupervised condensed nearest neighbor for part-of-speech tagging. In 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, pages 48–52. Portland, OR, USA.

Spoustová, Drahomíra, Jan Hajič, Jan Raab, and Miroslav Spousta. 2009. Semisupervised training for the averaged perceptron POS tagger. In 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2009, pages 763–771. Athens, Greece.

Subramanya, Amarnag, Slav Petrov, and Fernando Pereira. 2010. Efficient graph-based semi-supervised learning of structured tagging models. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, pages 167–176. Cambridge, MA, USA.

Suzuki, Jun and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In Annual Meeting of the Association for Computational Linguistics: Human Language Technology, ACL-HLT 2008, pages 665–673. Columbus, OH, USA.

Toutanova, Kristina, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL 2003, pages 173–180. Edmonton, Canada.

Tsuruoka, Yoshimasa, Yusuke Miyao, and Jun'ichi Kazama. 2011. Learning with lookahead: can history-based models rival globally optimized models? In Fifteenth Conference on Computational Natural Language Learning, CoNLL 2011, pages 238–246. Portland, OR, USA.

Turian, Joseph, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pages 384–394. Uppsala, Sweden.