Tangherlini, Timothy
0000-0002-1775-2052
Crist, Sean
Broadwell, Peter M.
Gabriel, David
Urban, Kryztof
Vijunas, Aurelijus
Crawford, Jackson
IceMorph morphological analysis data files
University of California, Los Angeles
2014
Old Icelandic
Morphosyntactic tagging
POS-tagging
Old Icelandic dictionaries
Old Icelandic training data
These data are covered by a Creative Commons CC0 license.
This dataset consists of four main resources: (1) a concatenated dictionary of Old Icelandic, parsed for word class and inflectional detail; (2a) a corpus of Old Icelandic sagas in plain text, chunked by chapter; (2b) a tagged version of the same text, produced by the IceMorph system; and (3a, 3b) two training corpora, labeled "Expert" and "Gold", for training and testing a machine learning module.
Datasets (1), the dictionary, and (2a), the saga texts, were generated using OCR. Dataset (2b) is the output of the IceMorph tagging system. Datasets (3a) and (3b) were generated by hand-tagging.