= snp-search

snp-search is an easy-to-use tool for managing SNPs generated from haploid next-generation sequencing data. Given a VCF file, snp-search stores the SNPs produced by the variant calling algorithm in an SQLite database. snp-search can then be used to extract useful information from the database.

== Obtaining and installing the code
snp-search is written in Ruby and runs in a Unix environment.  It is made available as a gem. See the GitHub site for more information (https://github.com/hpa-bioinformatics/snp-search).

To install snp-search, run
  gem install snp-search

== Requirements

Not much. You just need:

* Unix. Installing snp-search also installs all the gems it needs from RubyGems. Note that installing gems system-wide usually requires admin privileges. If you do not have admin privileges, we suggest you install RVM (http://beginrescueend.com/rvm/install/) and then run gem install snp-search.

* Ruby version 1.8.7 or above.

* Optional: FastTree 2.  If you require a tree output in Newick format, you must install FastTree from http://www.microbesonline.org/fasttree/#Install.

That's it!

== Running snp-search   

1- The first thing you need to do is to create the database (snp-search -create)

  Two files are needed to create the SQLite3 database:

  1A- Variant Call Format (.vcf) file (which contains the SNP information)

  1B- The reference genome that you used to generate your .vcf file (in GenBank or EMBL format; the script detects the format automatically).

  You need the following parameters:

  -d  Name of your database (note that this is a required field in all commands).
  -v  .vcf file
  -r  Reference genome (the same file that was used to generate the .vcf file), in GenBank or EMBL format.

  Optional: -A  AD ratio cutoff (default 0.9)

  Usage:
    snp-search -create -d my_snp_db.sqlite3 -r my_ref.gbk -v my_vcf_file.vcf 

  Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
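
Because the strain names come from the VCF file, it can help to check them before creating the database. The following is a minimal Ruby sketch, not part of snp-search. It relies only on the VCF convention that the "#CHROM" header line has nine fixed columns followed by one column per sample. The method name vcf_sample_names and the example header are made up for illustration.

```ruby
# Return the sample (strain) names declared in a VCF header.
# The VCF format fixes the first nine columns of the "#CHROM" line
# (CHROM .. FORMAT); every column after them is a sample name.
def vcf_sample_names(lines)
  header = lines.find { |line| line.start_with?("#CHROM") }
  return [] if header.nil?
  header.chomp.split("\t").drop(9)
end

# Illustrative header only. With a real file, pass the lines of the file:
#   vcf_sample_names(File.foreach("my_vcf_file.vcf"))
example_lines = [
  "##fileformat=VCFv4.1\n",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tstrain_A\tstrain_B\n"
]
puts vcf_sample_names(example_lines)
```

If the names printed are not the ones you want to see in the database, rename the samples in the VCF file before running snp-search -create.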

2- Now that you have created the database (my_snp_db.sqlite3), you can use snp-search to query it and write the results to an output file.

  First, you need to tell snp-search what you want to output.  You have several options:
  - Querying the database for the unique SNPs within the list of strains/samples provided (list_of_my_strains.txt). The output is a text file listing the unique SNPs with information about each one (e.g. whether it is a synonymous or non-synonymous SNP).

    -output -unique_snps -d db.sqlite3 [options]
      -u, --unique_snps                      Query for unique snps in the database
      -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
      -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
      -s, --strain                           The strains/samples you would like to query (only used with -unique_snps flag)
      -o, --out                              Name of output file, Required
         
    Usage: 
    snp-search -O -u -d my_snp_db.sqlite3 -s list_of_my_strains.txt -o unique_snps.out

  - Querying the database to output all SNPs except those in specified features (e.g. phages).  This is a way of ignoring SNPs in genes (likely to be mobile element genes) that are not needed for SNP analysis.  The user has the option of generating a core SNP tree Newick file for SNP phylogeny (if the -F option was used to output a fasta file).

  -output -all_or_filtered_snps -d db.sqlite3 [options]
    -f, --all_or_filtered_snps             SNPs from specified features in the database (if you do not want to ignore any SNPs, just use this option with -d, -F or -T, and -o)
    -F, --fasta                            output fasta file format (default)
    -T, --tabular                          output tabular file format
    -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
    -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
    -R, --remove_non_informative_snps      Only output informative SNPs. Only used with -f option
    -e, --ignore_snps_in_range             A list of position ranges to ignore, e.g. 10..500,2000..2500. Only used with -f option
    -a, --ignore_strains                   A list of strains to ignore (separated by commas, e.g. S1,S4,S8). Only used with -f option
    -I, --ignore_snps_on_annotation        The name of the feature(s) to ignore.  Features should be separated by commas (e.g. phages,insertion,transposons)
    -o, --out                              Name of output file, Required
    -t, --tree                             Generate SNP phylogeny (only used with -fasta option)
    -p, --fasttree_path                    Full path to the FastTree tool (e.g. /usr/local/bin/FastTree. only used with -tree option)

  Usage:
  snp-search -O -F -f -d my_snp_db.sqlite3 -I phage,insertion,transposon -R -o snps_without_phages.fasta
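
The -e option takes position ranges written as start..end and separated by commas. To illustrate how such a list can be read, here is a small Ruby sketch. It is not the code snp-search uses internally: the method names are hypothetical, and treating both ends of each range as inclusive is an assumption.

```ruby
# Parse a string such as "10..500,2000..2500" into Ruby Range objects.
# Assumption: both the start and the end position are part of the range.
def parse_ignore_ranges(text)
  text.split(",").map do |part|
    first, last = part.strip.split("..").map { |n| Integer(n) }
    raise ArgumentError, "bad range: #{part}" if last.nil? || first > last
    (first..last)
  end
end

# True when the position falls inside any of the ranges to ignore.
def ignored_position?(position, ranges)
  ranges.any? { |range| range.include?(position) }
end

ranges = parse_ignore_ranges("10..500,2000..2500")
puts ignored_position?(250, ranges)   # 250 lies inside 10..500
puts ignored_position?(1000, ranges)  # 1000 lies between the two ranges
```

Under this reading, a SNP at position 250 would be dropped from the output and a SNP at position 1000 would be kept.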

  - Optionally, you can add the following options to generate a phylogenetic tree from the resulting fasta file:
      
  -t  Generate SNP phylogeny
  -p  Full path to the FastTree tool (e.g. /usr/local/bin/FastTree. only used with -tree option)
  Usage:
  snp-search -O -F -f -d my_snp_db.sqlite3 -I phage,insertion,transposon -R -t -p /usr/local/bin/FastTree -o snps_without_phages.fasta
 
  FastTree is used to generate the Newick (.nwk) file.  It can be downloaded from http://www.microbesonline.org/fasttree/#Install (see Requirements above).
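
If you prefer to build the tree yourself from the fasta output, FastTree can also be run by hand. Its documented usage for a nucleotide alignment is FastTree -nt alignment_file > tree_file. The Ruby sketch below only assembles that command line. The method name and the output file name snps_without_phages.nwk are made up for illustration, and the options snp-search itself passes to FastTree may differ.

```ruby
require "shellwords"

# Build a shell command that runs FastTree on a nucleotide alignment (-nt)
# and redirects the Newick tree it prints to a file.
def fasttree_command(fasttree_path, fasta_file, newick_file)
  [fasttree_path, "-nt", fasta_file].shelljoin + " > " + newick_file.shellescape
end

command = fasttree_command("/usr/local/bin/FastTree",
                           "snps_without_phages.fasta",
                           "snps_without_phages.nwk")
puts command
# To actually run it (requires FastTree to be installed): system(command)
```

Escaping each argument keeps the command safe when file names contain spaces.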

  - Output all SNPs with information.  Information for each SNP includes whether the SNP is synonymous or non-synonymous, gene function, whether it is in a pseudogene, and other useful information.  This information is tab-separated.

  -output -info -d db.sqlite3 [options]
    -i, --info                             Output various information about SNPs
    -c, --cuttoff_snp_qual                 SNP quality cutoff, (default = 90)
    -g, --cuttoff_genotype                 Genotype quality cutoff (default = 30)
    -o, --out                              Name of output file, Required
     
  Usage:
  snp-search -O -info -d my_snp_db.sqlite3 -o snps_all_with_info.txt

== View database in Unix or in a GUI 
Your database is in SQLite3 format.  If you would like to view your tables and run queries directly, type
  sqlite3 my_snp_db.sqlite3
At the sqlite3 prompt, .tables lists the tables and .schema shows their definitions.

Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).

== Contact

If you have any comments, questions or suggestions, please email
  ali.al-shahib@phe.gov.uk
or
  anthony.underwood@phe.gov.uk

Have fun snp-searching!

== Copyright

Copyright (c) 2012 Ali Al-Shahib. See LICENSE.txt for
further details.