Sha256: eecc000369737983ed792886c32a4145072495ca9816741071d9759b71656ab0
Contents?: true
Size: 1.14 KB
Versions: 24
Compression:
Stored size: 1.14 KB
Contents
sentence key

collection: 10 random PubMed documents with ASCII text, split into sentences by the MedPost sentence splitter. Original source ascii.xml

source: PubMed

date: yyyymmdd. Date the documents were downloaded from PubMed.

document: Title and possibly abstract from a PubMed reference

  id: PubMed id

  passage: Either title or abstract

    type: "title" or "abstract"

    offset: The original Unicode byte offsets were not updated after the ASCII conversion. PubMed is extracted from an XML file, so literal offsets would not be useful. The title has an offset of zero, while the abstract is assumed to begin after the title and one space. These offsets at least sequence the abstract after the title.

    sentence: One sentence of the passage, as determined by the MedPost sentence splitter

      offset: A document offset to where the sentence begins in the passage: the sum of the passage offset and the local offset within the passage.

      text: ASCII text of the sentence.
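The offset rules in the key can be illustrated with a minimal Python sketch. It is not part of the collection or of any tool named above: the function names `passage_offsets` and `sentence_offsets` are hypothetical, the sample strings are made up, and the sketch assumes each sentence appears verbatim and in order inside its passage. Offsets are counted in characters, which equals bytes for pure ASCII text.

```python
def passage_offsets(title, abstract=None):
    """Return (type, offset, text) tuples for a document's passages.

    The title starts at offset 0; the abstract is assumed to begin
    after the title and one space, as the key describes.
    """
    passages = [("title", 0, title)]
    if abstract is not None:
        passages.append(("abstract", len(title) + 1, abstract))
    return passages


def sentence_offsets(passage_offset, passage_text, sentences):
    """Return (document_offset, sentence) pairs for one passage.

    Document offset = passage offset + local offset within the passage.
    Assumes the sentences occur in order in passage_text (hypothetical
    input; a real splitter's output may need normalization first).
    """
    result = []
    cursor = 0
    for sentence in sentences:
        local = passage_text.index(sentence, cursor)  # raises ValueError if absent
        result.append((passage_offset + local, sentence))
        cursor = local + len(sentence)
    return result


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    title = "A short title."
    abstract = "First sentence. Second one."
    for kind, offset, text in passage_offsets(title, abstract):
        print(kind, offset)
    print(sentence_offsets(len(title) + 1, abstract,
                           ["First sentence.", "Second one."]))
```

Because the passage offsets are synthetic, the resulting sentence offsets only order the text consistently within a document; they do not point back into the original PubMed XML.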
Version data entries
24 entries across 24 versions & 1 rubygems