Investigations of Synonym Replacement for Swedish
Abstract
We present results from an investigation of automatic synonym replacement for Swedish. Three methods for choosing a replacement synonym were evaluated, based on (1) word frequency, (2) word length, and (3) level of synonymy. The strategies were evaluated using standardized readability metrics for Swedish, average word length, the proportion of long words, and the ratio of erroneous substitutions to the total number of replacements. The results show an improvement in readability for most strategies, but they also show that erroneous substitutions are frequent.
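The three selection strategies and the evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the study. The `Candidate` record, its `frequency` and `level` fields, the function names, the rule that a synonym is only chosen when it beats the original word on the given criterion, and the regex tokenisation are all hypothetical. Readability is illustrated with LIX, a standard Swedish readability index computed as words per sentence plus the percentage of words longer than six characters. It is used here only as a representative example of a standardized Swedish metric. Whether a substitution is erroneous is assumed to come from manual judgement, so `error_ratio` simply divides the two counts.

```python
import re
from dataclasses import dataclass


# Hypothetical synonym record (assumption, not the study's data format):
# the synonym itself, its corpus frequency, and a synonymy level where
# a higher value means a closer synonym.
@dataclass(frozen=True)
class Candidate:
    word: str
    frequency: int
    level: int


def choose_replacement(word, word_freq, candidates, strategy):
    """Pick a synonym for `word` with one of three strategies, or keep `word`."""
    if not candidates:
        return word
    if strategy == "frequency":
        # Most frequent synonym, used only if it is more frequent than the original.
        best = max(candidates, key=lambda c: c.frequency)
        return best.word if best.frequency > word_freq else word
    if strategy == "length":
        # Shortest synonym, used only if it is shorter than the original.
        best = min(candidates, key=lambda c: len(c.word))
        return best.word if len(best.word) < len(word) else word
    if strategy == "level":
        # Synonym with the highest synonymy level, regardless of frequency or length.
        return max(candidates, key=lambda c: c.level).word
    raise ValueError(f"unknown strategy: {strategy!r}")


WORD_RE = re.compile(r"\w+")         # Unicode-aware, so å, ä and ö count as letters
SENTENCE_RE = re.compile(r"[.!?]+")  # crude sentence boundary: runs of . ! ?


def surface_metrics(text):
    """LIX, average word length and proportion of long words (> 6 characters)."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(SENTENCE_RE.findall(text)))
    long_words = sum(1 for w in words if len(w) > 6)
    n = len(words)
    return {
        # LIX = words per sentence + percentage of long words
        "lix": n / sentences + 100 * long_words / n,
        "avg_word_length": sum(len(w) for w in words) / n,
        "long_word_ratio": long_words / n,
    }


def error_ratio(erroneous, replacements):
    """Share of replacements judged erroneous (judgements assumed to be manual)."""
    return erroneous / replacements if replacements else 0.0


if __name__ == "__main__":
    cands = [Candidate("få", 900, 3), Candidate("mottaga", 60, 5)]
    for s in ("frequency", "length", "level"):
        print(s, choose_replacement("erhålla", 50, cands, s))
    print(surface_metrics("Vi undersöker automatiskt synonymutbyte."))
```

With the made-up candidates above, the frequency and length strategies both pick "få", while the level strategy picks "mottaga", which shows how the strategies can diverge for the same word. The example sentence has four words, three of them longer than six characters, giving a LIX of 4 + 75 = 79. Note that the level strategy as sketched ignores frequency and length entirely, which is one plausible reason such a strategy could lower readability even when it preserves meaning.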