Sha256: 863419eaba185d45107ea8019d446bbcec748433f3476f4c91cbeb1baaae0482

Contents?: true

Size: 1.75 KB

Versions: 24

Compression:

Stored size: 1.75 KB

Contents

lemma key

    collection:  10 random PubMed documents with ASCII text split into
                 sentences and tokens by the MedPost tokenizer
                 
                 Original source sentence.xml

    source:  PubMed

    date:  yyyymmdd. Date documents downloaded from PubMed

    document:  Title and possibly abstract from a PubMed reference

    id:  PubMed id

    passage:  Either title or abstract

    infon["type"]:  "title" or "abstract"

    offset: The original Unicode byte offsets were not updated after
            the ASCII conversion.

            PubMed is extracted from an XML file, so literal offsets
            would not be useful. Title has an offset of zero, while
            the abstract is assumed to begin after the title and one
            space. These offsets at least sequence the abstract after
            the title.

    sentence:  One sentence of the passage as determined by the
               MedPost sentence splitter

    offset: A document offset to where the sentence begins in the
            passage. Sum of the passage offset and the local offset
            within the passage.

    annotation:  tokens in the sentence with their part-of-speech.
                 the annotations are of "type" "token"

    infon["POS"]:  The Penn Treebank part of speech tag as determined
                   by the MedPost biomedical part-of-speech tagger

    infon["lemma"]:  The tokens lemma as determined by BioLemmatizer

    location:  offset: A document offset to where the annotated text
                       begins in the sentence. Sum of the sentence
                       offset and the local offset within the
                       sentence.

               length: The length of the token.

    text:  ASCII text of the token.

Version data entries

24 entries across 24 versions & 1 rubygems

Version Path
simple_bioc-0.0.24 xml/lemma.key
simple_bioc-0.0.23 xml/lemma.key
simple_bioc-0.0.22 xml/lemma.key
simple_bioc-0.0.21 xml/lemma.key
simple_bioc-0.0.20 xml/lemma.key
simple_bioc-0.0.19 xml/lemma.key
simple_bioc-0.0.18 xml/lemma.key
simple_bioc-0.0.17 xml/lemma.key
simple_bioc-0.0.16 xml/lemma.key
simple_bioc-0.0.15 xml/lemma.key
simple_bioc-0.0.14 xml/lemma.key
simple_bioc-0.0.13 xml/lemma.key
simple_bioc-0.0.12 xml/lemma.key
simple_bioc-0.0.11 xml/lemma.key
simple_bioc-0.0.10 xml/lemma.key
simple_bioc-0.0.9 xml/lemma.key
simple_bioc-0.0.8 xml/lemma.key
simple_bioc-0.0.7 xml/lemma.key
simple_bioc-0.0.6 xml/lemma.key
simple_bioc-0.0.5 xml/lemma.key