README.rdoc in bio-gff3-0.8.3 vs README.rdoc in bio-gff3-0.8.4

- old
+ new

@@ -56,5 +56,93 @@
 == Copyright
 Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
+
+  Fetch and assemble GFF3 types (e.g. ORF, mRNA, CDS) + print in FASTA format.
+
+    gff3-fetch [--low-mem] [--validate] type [filename.fa] filename.gff3
+
+  Where (NYI == Not Yet Implemented):
+
+    --translate     : output as amino acid sequence
+    --validate      : validate GFF3 file by translating
+    --fix           : check 3-frame translation and fix, if possible
+    --fix-wormbase  : fix 3-frame translation on ORFs named 'gene1'
+    --no-assemble   : output each record as a sequence -- NYI
+    --add-phase     : output records using phase (useful w. no-assemble CDS to AA) --NYI
+
+  type is any valid type in the GFF3 definition. For example:
+
+    mRNA            : assemble mRNA
+    CDS             : assemble CDS
+    exon            : list all exons
+    gene|ORF        : list gene ORFs
+    other           : use any type from GFF3 definition, e.g. 'Terminate' -- NYI
+
+  and the following performance options:
+
+    --cache full    : load all in RAM (fast)
+    --cache none    : do not load anything in memory (slow)
+    --low-mem       : use LRU cache (limit RAM use, fast) -- NYI
+    --max-cpus num  : use num threads -- NYI
+    --emboss        : use EMBOSS translation (fast) -- NYI
+
+  Multiple GFF3 files can be used. With external FASTA files, always the last
+  one before the GFF3 filename is matched.
+
+  Note that above switches are only partially implemented at this stage. Full
+  feature support is projected Feb. 2011.
+
+  Examples:
+
+  Assemble mRNA and CDS information from test.gff3 (which includes sequence information)
+
+    gff3-fetch mRNA test/data/gff/test.gff3
+    gff3-fetch CDS test/data/gff/test.gff3
+
+  Find CDS records from external FASTA file, adding phase and translate to protein sequence
+
+    gff3-fetch --no-assemble --add-phase --translate CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3
+
+  Find mRNA from external FASTA file, without loading everything in RAM
+
+    gff3-fetch --cache none mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+    gff3-fetch --cache none mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+
+  Validate GFF3 file using EMBOSS translation and validation
+
+    gff3-fetch --cache none --validate --emboss mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+
+  Find GENEID predicted terminal exons
+
+    gff3-fetch terminal chromosome1.fa geneid.gff3
+
+== Performance
+
+time gff3-fetch cds m_hapla.WS217.dna.fa m_hapla.WS217.gff3 > test.fa
+
+    Cache    real       user       sys
+    ----------------------------------------------------
+    full     12m41s     12m28s     0m09s   (0.8.0 Jan. 2011)
+    none     504m39s    477m49s    26m50s  (0.8.0 Jan. 2011)
+    ----------------------------------------------------
+
+where
+
+    52M   m_hapla.WS217.dna.fa
+    456M  m_hapla.WS217.gff3
+
+ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-linux]
+on an 8 CPU, 2.6 GHz (6MB cache), 16 GB RAM machine.
+
+== Cite
+
+  If you use this software, please cite
+
+    http://dx.doi.org/10.1093/bioinformatics/btq475
+
+== Copyright
+
+Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
+
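
The README text in this diff describes gff3-fetch as fetching and assembling GFF3 types such as CDS and printing them in FASTA format. As a rough illustration of what "assemble" could mean, here is a minimal plain-Ruby sketch. It is not the bio-gff3 implementation. The method names (parse_gff3_line, reverse_complement, assemble, to_fasta), the toy sequence ctg1 and the mrna1 identifier are all hypothetical, and grouping records by their Parent attribute is an assumption about how the pieces are keyed.

```ruby
# Minimal sketch of CDS assembly from GFF3 records plus FASTA output.
# All names below are hypothetical; this is NOT the bio-gff3 API.

COMPLEMENT = { 'A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A' }.freeze

# Parse one tab-separated GFF3 feature line; returns nil for comments and blanks.
def parse_gff3_line(line)
  return nil if line.start_with?('#') || line.strip.empty?

  seqid, _source, type, start, stop, _score, strand, phase, attrs =
    line.chomp.split("\t")
  attributes = attrs.to_s.split(';').each_with_object({}) do |pair, h|
    key, value = pair.split('=', 2)
    h[key] = value if value
  end
  { seqid: seqid, type: type, start: start.to_i, stop: stop.to_i,
    strand: strand, phase: phase, attributes: attributes }
end

# Reverse-complement a DNA string; unknown bases become 'N'.
def reverse_complement(seq)
  seq.reverse.each_char.map { |base| COMPLEMENT.fetch(base.upcase, 'N') }.join
end

# Group features of the given type by Parent (assumption), order them by
# start coordinate, and splice the pieces out of the reference sequence.
# GFF3 coordinates are 1-based and inclusive.
def assemble(features, sequences, type)
  grouped = features.select { |f| f[:type] == type }
                    .group_by { |f| f[:attributes]['Parent'] }
  grouped.each_with_object({}) do |(parent, parts), result|
    spliced = parts.sort_by { |f| f[:start] }
                   .map { |f| sequences.fetch(f[:seqid])[(f[:start] - 1)..(f[:stop] - 1)] }
                   .join
    result[parent] = parts.first[:strand] == '-' ? reverse_complement(spliced) : spliced
  end
end

# Format one record as FASTA.
def to_fasta(id, seq)
  ">#{id}\n#{seq}\n"
end

# Toy input: a 19 bp reference and two CDS records listed out of order.
sequences = { 'ctg1' => 'CCATGGCCTTTAAATGACC' }
gff3 = <<~GFF
  ##gff-version 3
  ctg1\t.\tCDS\t12\t17\t.\t+\t0\tID=cds1;Parent=mrna1
  ctg1\t.\tCDS\t3\t8\t.\t+\t0\tID=cds1;Parent=mrna1
GFF

features = gff3.each_line.map { |line| parse_gff3_line(line) }.compact
assembled = assemble(features, sequences, 'CDS')
assembled.each { |id, seq| puts to_fasta(id, seq) }
# prints:
# >mrna1
# ATGGCCAAATGA
```

The sketch keeps every sequence in a Hash, which loosely corresponds to the `--cache full` behaviour described above. A `--cache none` style variant would have to read each region from the FASTA file on demand, which is consistent with the much longer run time reported in the Performance table.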