Identify problems with predicted genes
This is a GSoC 2013 project.
Details about the project's progress during the Coding period can be found here.
We also have a blog.
Please note that some features of this tool are still under development,
so stay tuned!
Authors
- GSoC student: Monica Dragan (email)
- Mentors: Anurag Priyam (email) and Yannick Wurm (email)
Abstract
The goal of GeneValidator is to identify problems with gene predictions and to provide useful information based on their similarity to genes in public databases. The validation results provide evidence on how curation might be carried out, and they can help improve existing gene prediction tools or test new approaches. The tool is aimed mainly at biologists who want to validate data obtained in their own laboratories.
Current Validations
- Length validation by clustering
- Length validation by ranking
- Reading frame validation
- Check gene merge
- Check duplications
- Main ORF validation (for nucleotides)
- Validation based on multiple alignment ~ under development
- Codon coverage ~ under development
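To give a feel for what a check such as the main ORF validation involves, here is a minimal sketch that finds the longest open reading frame in the three forward frames of a nucleotide sequence. It is an illustration only, not GeneValidator's implementation: the method name `longest_forward_orf` and the `STOP_CODONS` constant are hypothetical, and the sketch ignores the reverse strand and ambiguous bases.

```ruby
# Hypothetical helper for illustration; not part of GeneValidator's API.
STOP_CODONS = %w[TAA TAG TGA].freeze

# Returns a hash with the frame (0, 1 or 2) and the sequence of the longest
# ORF, running from an ATG up to (but excluding) the first in-frame stop
# codon. Returns nil when no complete ORF is found.
def longest_forward_orf(sequence)
  seq = sequence.upcase
  best = nil
  3.times do |frame|
    # Split the frame into complete codons; trailing bases are ignored.
    codons = (seq[frame..-1] || '').scan(/.{3}/)
    start = nil
    codons.each_with_index do |codon, i|
      if start.nil?
        start = i if codon == 'ATG'
      elsif STOP_CODONS.include?(codon)
        orf = codons[start...i].join
        if best.nil? || orf.length > best[:sequence].length
          best = { frame: frame, sequence: orf }
        end
        start = nil
      end
    end
  end
  best
end
```

For example, `longest_forward_orf('CCATGAAATTTTAGGG')` finds the ORF that starts at the `ATG` in the third forward frame and ends before the `TAG` stop codon.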
Requirements
- Ruby (>= 1.9.3)
- R (>= 2.14.2)
- RubyGems (>= 1.3.6)
- NCBI BLAST+ (>= 2.2.25+)
- MAFFT (download it from http://mafft.cbrc.jp/alignment/software/)
Linux and MacOS are officially supported!
Installation
Get the source code
$ git clone git@github.com:monicadragan/gene_prediction.git
Build the gem as root (using sudo)
$ sudo rake
Run GeneValidator
$ genevalidator [validations] [skip_blast] [start] [tabular] [mafft] [raw_seq] FILE
Example that exercises all the validations:
$ genevalidator -x data/all_validations_prot/all_validations_prot.xml data/all_validations_prot/all_validations_prot.fasta
Learn more:
$ genevalidator -h
Outputs
Running GeneValidator on your dataset produces numbers and plots. The output files are generated in the same directory as the input file. The results are available in three formats:
* console table output
* validation results in YAML format (the YAML file has the same name as the input file, plus a YAML extension)
* HTML output with plot visualization (the files are generated in the 'html' directory, in the same location as the input file)
Note: for the moment, view the HTML output with Firefox only.
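Since the YAML results can be read back from a script, a minimal sketch of a first inspection step might look like the following. The layout of the YAML file is not described here, so the sketch makes no assumption about its keys and only reports the top-level shape. The method name `describe_yaml` is hypothetical and not part of GeneValidator.

```ruby
require 'yaml'

# Hypothetical helper (not part of GeneValidator): loads a YAML file and
# describes its top-level shape without assuming any particular keys.
def describe_yaml(path)
  data = YAML.load_file(path)
  case data
  when Hash  then "Hash with keys: #{data.keys.join(', ')}"
  when Array then "Array with #{data.length} entries"
  else            "Scalar of type #{data.class}"
  end
end
```

Call it with the path of the YAML file written next to your input file, for example `puts describe_yaml(path_to_results)`. On recent Ruby versions `YAML.load_file` restricts which classes may be deserialized, so a file that contains serialized Ruby objects may need different loading options.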
Other things
Run unit tests
$ rake test
Generate documentation
$ rake doc