README.rdoc in bio-gff3-0.8.3 vs README.rdoc in bio-gff3-0.8.4
- old
+ new
@@ -56,5 +56,93 @@
== Copyright
Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
+
+ Fetch and assemble GFF3 types (e.g. ORF, mRNA, CDS) and print them in FASTA format.
+
+ gff3-fetch [--low-mem] [--validate] type [filename.fa] filename.gff3
+
+ Where (NYI == Not Yet Implemented):
+
+ --translate : output as amino acid sequence
+ --validate : validate GFF3 file by translating
+ --fix : check 3-frame translation and fix, if possible
+ --fix-wormbase : fix 3-frame translation on ORFs named 'gene1'
+ --no-assemble : output each record as a sequence -- NYI
+ --add-phase : output records using phase (useful with --no-assemble when translating CDS to amino acids) -- NYI
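To illustrate what a 3-frame translation check involves, here is a minimal pure-Ruby sketch. It is not the gem's implementation: the method names translate and frame_without_internal_stop are hypothetical, and treating "no internal stop codon" as the criterion for a usable frame is an assumption about what --fix looks for. The codon table is the standard genetic code (NCBI table 1).

```ruby
# Standard genetic code (NCBI table 1) in the compact TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Translate a DNA string starting at the given frame offset (0, 1 or 2).
# Trailing bases that do not fill a codon are dropped; codons containing
# anything other than A, C, G or T are reported as 'X'.
def translate(dna, frame = 0)
  dna.upcase[frame..-1].to_s.scan(/.{3}/).map do |codon|
    idx = codon.chars.map { |base| BASES.index(base) }
    idx.include?(nil) ? "X" : AMINO_ACIDS[16 * idx[0] + 4 * idx[1] + idx[2]]
  end.join
end

# Hypothetical helper (an assumption, not the gem's logic): return the first
# frame whose translation has no stop codon ('*') before the final position,
# or nil if all three frames contain an internal stop.
def frame_without_internal_stop(dna)
  (0..2).find { |frame| !translate(dna, frame).chomp("*").include?("*") }
end
```

For example, translate("ATGGCCTAA") yields the three residues M, A and a terminal stop, and frame_without_internal_stop returns the offset that a fix step could then apply.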
+
+ type is any valid type in the GFF3 definition. For example:
+
+ mRNA : assemble mRNA
+ CDS : assemble CDS
+ exon : list all exons
+ gene|ORF : list gene ORFs
+ other : use any type from GFF3 definition, e.g. 'Terminate' -- NYI
+
+ and the following performance options:
+
+ --cache full : load all in RAM (fast)
+ --cache none : do not load anything in memory (slow)
+ --low-mem : use LRU cache (limit RAM use, fast) -- NYI
+ --max-cpus num : use num threads -- NYI
+ --emboss : use EMBOSS translation (fast) -- NYI
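The --low-mem option is listed as not yet implemented, so the following is only a sketch of the general LRU (least recently used) caching technique it names, not code from the gem. The class name LruCache and its interface are hypothetical. It relies on Ruby hashes preserving insertion order (Ruby 1.9 and later) and assumes a max_size of at least 1.

```ruby
# Minimal LRU cache built on Ruby's insertion-ordered Hash (Ruby >= 1.9).
# Hypothetical illustration; not part of bio-gff3.
class LruCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}
  end

  # Return the cached value for key, or compute it with the block and cache it.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key)                # re-inserted below as most recent
    else
      value = yield(key)
      @store.shift if @store.size >= @max_size  # evict the least recently used
    end
    @store[key] = value
  end

  # Keys ordered from least to most recently used.
  def keys
    @store.keys
  end
end
```

A caller would wrap the expensive lookup in the block, for example cache.fetch(seq_id) { |id| read_sequence_from_disk(id) }, where read_sequence_from_disk stands in for whatever loader is used.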
+
+ Multiple GFF3 files can be used. When external FASTA files are given, each
+ GFF3 file is matched with the last FASTA file listed before it.
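The matching rule can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the gem's argument parser: the method name pair_inputs is hypothetical, FASTA files are recognised by a .fa or .fasta extension, and a FASTA file is assumed to stay in effect for every GFF3 file that follows until another FASTA file is named.

```ruby
# Pair each GFF3 file with the most recent FASTA file listed before it.
# Returns an array of [gff3, fasta_or_nil] pairs; nil means no external FASTA
# preceded the GFF3 file, so the sequence must be embedded in the GFF3 itself.
def pair_inputs(filenames)
  last_fasta = nil
  filenames.each_with_object([]) do |name, pairs|
    if name =~ /\.(fa|fasta)\z/i
      last_fasta = name              # remember the latest FASTA seen so far
    else
      pairs << [name, last_fasta]    # anything else is treated as a GFF3 file
    end
  end
end
```

With the arguments a.fa b.fa x.gff3, this pairs x.gff3 with b.fa, since b.fa is the last FASTA file before it.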
+
+ Note that the switches above are only partially implemented at this stage. Full
+ feature support is projected for Feb. 2011.
+
+ Examples:
+
+ Assemble mRNA and CDS information from test.gff3 (which includes sequence information)
+
+ gff3-fetch mRNA test/data/gff/test.gff3
+ gff3-fetch CDS test/data/gff/test.gff3
+
+ Find CDS records using an external FASTA file, adding phase and translating to protein sequence
+
+ gff3-fetch --no-assemble --add-phase --translate CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3
+
+ Find mRNA using an external FASTA file, without loading everything into RAM
+
+ gff3-fetch --cache none mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+
+ Validate GFF3 file using EMBOSS translation and validation
+
+ gff3-fetch --cache none --validate --emboss mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+
+ Find GENEID predicted terminal exons
+
+ gff3-fetch terminal chromosome1.fa geneid.gff3
+
+== Performance
+
+time gff3-fetch cds m_hapla.WS217.dna.fa m_hapla.WS217.gff3 > test.fa
+
+ Cache    real       user       sys
+ ----------------------------------------------------
+ full     12m41s     12m28s     0m09s    (0.8.0 Jan. 2011)
+ none     504m39s    477m49s    26m50s   (0.8.0 Jan. 2011)
+ ----------------------------------------------------
+
+where the input file sizes are
+
+ 52M m_hapla.WS217.dna.fa
+ 456M m_hapla.WS217.gff3
+
+Measured with ruby 1.9.2p136 (2010-12-25 revision 30365) [x86_64-linux]
+on an 8-CPU, 2.6 GHz (6MB cache), 16 GB RAM machine.
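To put the two wall-clock ("real") timings in proportion, the short Ruby snippet below converts them to seconds and computes the ratio. The helper name seconds is introduced here for illustration only; the timing strings are copied from the table above.

```ruby
# Convert a "504m39s" style timing string to seconds.
def seconds(timing)
  minutes, secs = timing.match(/\A(\d+)m(\d+)s\z/).captures.map(&:to_i)
  minutes * 60 + secs
end

full = seconds("12m41s")    # 761 seconds
none = seconds("504m39s")   # 30279 seconds
puts format("--cache full is about %.1fx faster than --cache none", none.to_f / full)
```

On this data set, full caching comes out roughly 40 times faster in wall-clock time.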
+
+== Cite
+
+ If you use this software, please cite
+
+ http://dx.doi.org/10.1093/bioinformatics/btq475
+
+== Copyright
+
+Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
+