Sha256: 501e971f97d1aebe08bdd2f51c728591e4a4f210a0bf9b94af0adc2d138df577

Contents?: true

Size: 926 Bytes

Versions: 24

Compression:

Stored size: 926 Bytes

Contents

ASCII key

    collection:  10 random PubMed documents, with all text converted
                 to ASCII
                 
                 Original source: collection.xml

    source:  PubMed

    date:  yyyymmdd. Date the documents were downloaded from PubMed

    document:  Title and possibly abstract from a PubMed reference

    id:  PubMed id

    passage:  Either title or abstract

    infon["type"]:  "title" or "abstract"

    offset: The original Unicode byte offsets were not updated after
            the ASCII conversion.

            PubMed records are extracted from an XML file, so literal
            offsets would not be useful. The title has an offset of
            zero, while the abstract is assumed to begin after the
            title and one space. These offsets at least sequence the
            abstract after the title.

    text: The original Unicode text converted to ASCII using the NCBI
          IRET indexing Unicode to ASCII conversion table
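
The fields above can be read back with a few lines of code. The following is a minimal sketch in Python, assuming the usual BioC XML layout in which each key entry is an element of the same name and the passage type is stored as `<infon key="type">`. The sample document, its id, and the helper names `read_passages` and `expected_offsets` are hypothetical and exist only for illustration. Because the stored offsets were not updated after the ASCII conversion, offsets recomputed from the ASCII text may differ from those in a real collection file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the key describes; not taken from collection.xml.
SAMPLE = """<collection>
  <source>PubMed</source>
  <date>20120101</date>
  <key>ascii.key</key>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>A sample title.</text>
    </passage>
    <passage>
      <infon key="type">abstract</infon>
      <offset>16</offset>
      <text>A sample abstract.</text>
    </passage>
  </document>
</collection>"""


def read_passages(xml_text):
    """Return (document id, passage type, offset, text) for every passage."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            # Collect infon elements into a dict, e.g. {"type": "title"}.
            infons = {i.get("key"): i.text for i in passage.findall("infon")}
            offset = int(passage.findtext("offset"))
            rows.append((doc_id, infons.get("type"), offset, passage.findtext("text")))
    return rows


def expected_offsets(texts):
    """Title starts at 0; each later passage starts after the previous text plus one space."""
    offsets = []
    position = 0
    for text in texts:
        offsets.append(position)
        position += len(text) + 1
    return offsets


if __name__ == "__main__":
    rows = read_passages(SAMPLE)
    for row in rows:
        print(row)
    print(expected_offsets([text for _, _, _, text in rows]))
```

In the sample, the title is 15 characters long, so the abstract offset is 15 plus one space. Note that `len` counts characters in the ASCII text, which is why the result is only a sequencing aid and not a literal position in the source XML.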

Version data entries

24 entries across 24 versions & 1 rubygems

Version            Path
simple_bioc-0.0.4  xml/ascii.key
simple_bioc-0.0.3  xml/ascii.key
simple_bioc-0.0.2  xml/ascii.key
simple_bioc-0.0.1  xml/ascii.key