Benchmark for Evaluation of Danish Clinical Word Embeddings

Martin Sundahl Laursen; Jannik Skyttegaard Pedersen; Pernille Just Vinholt; Rasmus Søgaard Hansen; Thiusius Rajeeth Savarimuthu

doi:10.3384/nejlt.2000-1533.2023.4132

Authors

Martin Sundahl Laursen, University of Southern Denmark, https://orcid.org/0000-0001-5684-1325
Jannik Skyttegaard Pedersen, University of Southern Denmark, https://orcid.org/0000-0002-7066-1563
Pernille Just Vinholt, Odense University Hospital
Rasmus Søgaard Hansen, Odense University Hospital
Thiusius Rajeeth Savarimuthu, University of Southern Denmark

Abstract

In natural language processing, benchmarks are used to track progress and identify useful models. Currently, no benchmark for Danish clinical word embeddings exists. This paper describes the development of a Danish benchmark for clinical word embeddings. The benchmark consists of ten datasets: eight intrinsic and two extrinsic. Moreover, we use the benchmark to evaluate word embeddings trained on text from the clinical, general practitioner, and general domains. All intrinsic tasks of the benchmark are publicly available.