= snp-search

SNPsearch is a tool that manages SNP data and supports importing, manipulating, editing and complex querying of that data. It can be used to evaluate the utility of SNPs for assessing genetic diversity between haploid strains and for managing genotype and phenotype data. Once a query has been performed, SNPsearch can convert the selected SNP data into FASTA sequences. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions, such as the association of SNPs with particular phenotypes.

== Obtaining and installing the code

SNPsearch is written in Ruby and operates in a Unix environment. It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).

To install snp-search, run

  gem install snp-search

== Requirements

Not much, you just need:

* Unix.
* Ruby version 1.8.7 or above.

That's it! When snp-search is installed, all the gems it needs to run are also installed from Rubygems. Note that installing gems with Rubygems may require admin privileges. If you do not have admin privileges, we suggest you install RVM (http://beginrescueend.com/rvm/install/) and then run gem install snp-search.

== Running snp-search

To run snp-search, you need two files:

1. A Variant Call Format (.vcf) file, which contains the SNP information.
2. The reference genome that you used to generate your .vcf file (in GenBank or EMBL format; the script will automatically detect the format).

Once you have these files ready, you may run snp-search with the following options:

  -V  Enable verbose mode
  -n  Name of your database
  -v  .vcf file. Required
  -d  Reference genome (the same file that was used to generate the .vcf file), in GenBank or EMBL format. Required
  -c  SNP quality score cutoff: a Phred-scaled quality score. High quality scores indicate high-confidence calls. Optional, default = 90 (out of 100)
  -t  Genotype quality score cutoff: Phred-scaled quality score that the genotype is true. Optional, default = 30
  -h  Help message

Usage:

  snp-search -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf

== Output

The output is your database in sqlite3 format. If you would like to view your table(s) and perform queries, you can type

  sqlite3 snp_db.sqlite3

Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).

Depending on the query, a concatenated SNP FASTA file may also be output (see the examples in the next section).

== Examples

We have included two example queries that you may find useful:

* Example1: This script queries the database to select only those SNPs not found in phage-related genes. These SNPs are then used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes that are not needed for the SNP analysis. You may adapt this script to run other SQL queries that result in a FASTA output.

  Usage:

    ruby example1.rb -D your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta

  Options:

    -V  Enable verbose mode
    -D  The name of the database you would like to query. Required
    -o  Output file, in FASTA format
    -s  The strains/samples you would like to query. Required
    -a  The gene you would like to remove from the analysis
    -h  Print this help message

* Example2: This script queries the database and counts the unique SNPs within the list of strains/samples provided. The output is the number of unique SNPs.

  Usage:

    ruby example2.rb -D your_db_name.sqlite3 -s list_of_your_species.txt

  Options:

    -V  Enable verbose mode
    -D  The name of the database you would like to query. Required
    -s  The strains/samples you would like to query. Required
    -h  Print this help message

== Contact

If you have any comments, questions or suggestions, please email ali.al-shahib@hpa.org.uk or anthony.underwood@hpa.org.uk

Have fun snp-searching!

== Copyright

Copyright (c) 2012 Ali Al-Shahib. See LICENSE.txt for further details.
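
== Appendix: illustrating the quality cutoffs

The -c and -t options described under "Running snp-search" are Phred-scaled cutoffs. The sketch below shows one way such cutoffs could be applied to VCF records. It is not taken from the snp-search source: the method name passing_records is hypothetical, and two behaviours are assumptions. First, a record is kept only if its QUAL column meets the SNP cutoff and every sample that reports a GQ value meets the genotype cutoff. Second, samples without a GQ value are not filtered. The column positions follow the VCF specification.

```ruby
# Hypothetical helper, NOT part of snp-search: returns the VCF data lines
# that pass both cutoffs. Defaults mirror the documented -c (90) and -t (30).
def passing_records(vcf_lines, qual_cutoff = 90, gq_cutoff = 30)
  vcf_lines.select do |line|
    # Skip meta-information lines, the header line and blank lines.
    next false if line.start_with?("#") || line.strip.empty?

    fields = line.chomp.split("\t")

    # Column 6 (index 5) is QUAL; "." means the value is missing.
    qual = fields[5]
    next false if qual.nil? || qual == "." || qual.to_f < qual_cutoff

    # Column 9 (index 8) is FORMAT; find where GQ sits in each sample field.
    gq_index = fields[8].to_s.split(":").index("GQ")
    next true if gq_index.nil? # assumption: no GQ reported, so nothing to filter

    # Columns 10 onwards hold one field per sample.
    samples = fields[9..-1] || []
    samples.all? do |sample|
      gq = sample.split(":")[gq_index]
      gq.nil? || gq == "." || gq.to_f >= gq_cutoff
    end
  end
end

# Made-up records for illustration only.
example = [
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tstrain1",
  "ref\t100\t.\tA\tG\t99\t.\t.\tGT:GQ\t1/1:45",
  "ref\t200\t.\tC\tT\t50\t.\t.\tGT:GQ\t1/1:45",
  "ref\t300\t.\tG\tA\t99\t.\t.\tGT:GQ\t1/1:10"
]

kept = passing_records(example)
puts kept.length
```

With the default cutoffs, only the record at position 100 passes: the record at position 200 fails the SNP quality cutoff and the record at position 300 fails the genotype quality cutoff.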