ASCII key

collection: 10 random PubMed documents with all text ASCII
  Original source: collection.xml

  source: PubMed

  date: yyyymmdd. Date documents downloaded from PubMed

  document: Title and possibly abstract from a PubMed reference

    id: PubMed id

    passage: Either title or abstract

      infon["type"]: "title" or "abstract"

      offset: The original Unicode byte offsets were not updated after
              the ASCII conversion. PubMed is extracted from an XML
              file, so literal offsets would not be useful. Title has
              an offset of zero, while the abstract is assumed to
              begin after the title and one space. These offsets at
              least sequence the abstract after the title.

      text: The original Unicode text converted to ASCII using the NCBI
            IRET indexing Unicode to ASCII conversion table
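
The offset rule described in this key (title at zero, abstract after the title plus one space) can be sketched as below. This is only an illustration under stated assumptions: the function name passage_offsets and the dictionary layout are hypothetical, and the key does not say which Unicode encoding the original byte offsets used, so UTF-8 is assumed here.

```python
def passage_offsets(original_title, original_abstract=None):
    """Return passage records with offsets following the key's convention.

    Assumption: offsets count bytes of the ORIGINAL Unicode text, encoded
    as UTF-8. The key only says "Unicode byte offsets", so the encoding
    is a guess made for illustration.
    """
    # The title always starts at offset zero.
    passages = [{"type": "title", "offset": 0, "text": original_title}]

    if original_abstract is not None:
        # The abstract is assumed to begin after the title and one space.
        title_bytes = len(original_title.encode("utf-8"))
        passages.append(
            {
                "type": "abstract",
                "offset": title_bytes + 1,
                "text": original_abstract,
            }
        )
    return passages


# A title with one non-ASCII character: 10 characters, 11 UTF-8 bytes.
records = passage_offsets("Caf\u00e9 study", "Abstract text.")
for record in records:
    print(record["type"], record["offset"])
```

Because the offsets were not recomputed after the ASCII conversion, the abstract offset for a title containing non-ASCII characters may not equal the length of the converted ASCII title plus one. The offsets still order the abstract after the title, which is all the key promises.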