= snp-search

snp-search is a set of tools for managing SNP data. It supports importing, manipulating, editing and complex querying of SNP data.  It can be used to evaluate the utility of SNPs for assessing genetic diversity between haploid strains and for managing genotype and phenotype data.  Once a query has been performed, snp-search can convert the selected SNP data into FASTA sequences.  snp-search is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes.  Queries can also be made to answer genomic questions such as whether SNPs are associated with particular phenotypes.

== Obtaining and installing the code
snp-search is written in Ruby and runs in a Unix environment.  It is distributed as a gem. See the GitHub site (https://github.com/hpa-bioinformatics/snp-search) for more information.

To install snp-search, do
  gem install snp-search

== Requirements

You need only:

* Unix. Installing snp-search also installs all the gems it depends on from RubyGems. Note that RubyGems requires admin privileges; if you do not have them, we suggest you install RVM (http://beginrescueend.com/rvm/install/) and then run gem install snp-search.
* Ruby version 1.8.7 or above.

That's it!

== Running snp-search   

To run snp-search, you need two files:

1. A Variant Call Format (.vcf) file, which contains the SNP information.

2. The reference genome that was used to generate the .vcf file, in GenBank or EMBL format (the script detects the format automatically).

Once you have these files ready, you may run snp-search with the following options:

  -V  Enable verbose mode
  -n  Name of your database. Optional, default = snp_db.sqlite3
  -v  .vcf file. Required
  -r  Reference genome file (the same file that was used to generate the .vcf file), in GenBank or EMBL format. Required
  -c  SNP quality score cutoff. A Phred-scaled quality score; high scores indicate high-confidence calls. Optional, default = 90 (out of 100)
  -t  Genotype quality score cutoff. A Phred-scaled quality score that the genotype is true. Optional, default = 30
  -h  Help message

Usage:
  snp-search -n my_snp_db.sqlite3 -r my_ref.gbk -v my_vcf_file.vcf 
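
To illustrate what a Phred-scaled cutoff such as -c means, here is a minimal Ruby sketch. It is not snp-search's own code: the method names and the sample VCF lines are hypothetical, and treating a score equal to the cutoff as passing is an assumption. It relies only on two general facts: a Phred score Q corresponds to an error probability of 10^(-Q/10), and QUAL is the sixth tab-separated column of a VCF data line.

```ruby
# Illustrative only: helper names and sample data are hypothetical,
# not taken from the snp-search source.

# Convert a Phred-scaled score Q into the probability that the call is wrong:
# P = 10 ** (-Q / 10)
def phred_to_error_probability(q)
  10.0**(-q / 10.0)
end

# Parse the fixed leading columns of a tab-separated VCF data line.
def parse_vcf_line(line)
  chrom, pos, _id, ref, alt, qual = line.chomp.split("\t")
  { :chrom => chrom, :pos => pos.to_i, :ref => ref, :alt => alt, :qual => qual.to_f }
end

# Keep only records whose QUAL is at or above the cutoff.
# (Assumption: a score equal to the cutoff passes.)
def filter_by_quality(lines, cutoff)
  records = lines.reject { |l| l.start_with?("#") }.map { |l| parse_vcf_line(l) }
  records.select { |r| r[:qual] >= cutoff }
end

# Made-up VCF content: two header lines and two SNP records.
vcf_lines = [
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "ref1\t1045\t.\tA\tG\t98.0\t.\t.",
  "ref1\t2210\t.\tC\tT\t35.5\t.\t."
]

filter_by_quality(vcf_lines, 90).each do |r|
  puts "#{r[:chrom]}:#{r[:pos]} #{r[:ref]}>#{r[:alt]} QUAL=#{r[:qual]}"
end
```

With a cutoff of 90, only the first record in this made-up input is kept. For comparison, a Phred score of 30 (the default for -t) corresponds to an error probability of about 1 in 1000.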

== Output
The output is your database in sqlite3 format.  If you would like to view your tables and run queries, type
  sqlite3 snp_db.sqlite3

Alternatively, you may download a SQL tool to view your database (e.g. SQLite Sorcerer).

== Examples

We have included two example queries that you may find useful:

* Example 1: This script queries the database to select only those SNPs that are not found in phage-related genes. The selected SNPs are then used to make a concatenated SNP multiple alignment file (FASTA format).  This is a way of excluding a set of genes that are not needed for the SNP analysis. You may adapt this script to run other SQL queries that produce FASTA output.

Usage:

  ruby example1.rb -d your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta
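
As a rough picture of what a concatenated SNP alignment looks like, the sketch below builds one FASTA record per strain by joining the base called at each selected SNP position. It does not reproduce example1.rb: the method name, the data layout and the use of "N" for a missing call are all assumptions made for illustration.

```ruby
# Illustrative only: names and data are hypothetical, not taken from example1.rb.

# Build a FASTA string with one record per strain. Each sequence is the
# concatenation of that strain's base at every position, in the given order.
# Assumption: a strain with no call at a position gets "N".
def snp_alignment_fasta(positions, bases_by_strain)
  bases_by_strain.keys.sort.map do |strain|
    calls = bases_by_strain[strain]
    sequence = positions.map { |pos| calls.fetch(pos, "N") }.join
    ">#{strain}\n#{sequence}\n"
  end.join
end

# Made-up SNP positions and base calls for two strains.
positions = [1045, 2210, 5371]
bases_by_strain = {
  "strain_A" => { 1045 => "G", 2210 => "C", 5371 => "T" },
  "strain_B" => { 1045 => "A", 2210 => "T" }
}

puts snp_alignment_fasta(positions, bases_by_strain)
```

Because every sequence has one character per SNP position, all records have the same length, which is what tree-building tools expect from a multiple alignment.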

* Example 2: This script queries the database and counts the unique SNPs within the list of strains/samples provided.  The output is the number of unique SNPs.

Usage:

  ruby example2.rb -d your_db_name.sqlite3 -s list_of_your_species.txt 
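
The counting idea can be sketched in plain Ruby. This is one possible reading of "unique SNPs" and may not match what example2.rb computes: here a SNP counts as unique if it is present in every listed strain and in no other strain. The method name and data are hypothetical.

```ruby
# Illustrative only: one interpretation of "unique SNPs", with hypothetical
# names and data; example2.rb may define uniqueness differently.

# snps_by_strain maps a strain name to the SNP positions found in that strain.
# Returns the number of positions shared by all selected strains and absent
# from every strain that was not selected.
def unique_snp_count(snps_by_strain, selected)
  others = snps_by_strain.keys - selected
  shared = selected.map { |s| snps_by_strain.fetch(s) }.inject { |a, b| a & b } || []
  seen_elsewhere = others.map { |s| snps_by_strain.fetch(s) }.flatten.uniq
  (shared - seen_elsewhere).length
end

# Made-up SNP positions for three strains.
snps_by_strain = {
  "strain_A" => [1045, 2210, 5371],
  "strain_B" => [1045, 2210, 8000],
  "strain_C" => [2210, 9100]
}

puts unique_snp_count(snps_by_strain, ["strain_A", "strain_B"])
```

In this made-up data, positions 1045 and 2210 are shared by strain_A and strain_B, but 2210 also occurs in strain_C, so only 1045 is counted.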


== Contact

If you have any comments, questions or suggestions, please email
  ali.al-shahib@hpa.org.uk
or
  anthony.underwood@hpa.org.uk

Have fun snp-searching!

== Copyright

Copyright (c) 2011 Ali Al-Shahib. See LICENSE.txt for
further details.