= snp-search

snp-search is a set of tools that manages SNP data and supports importing, manipulating, editing and complex querying of that data. It can be used to evaluate the utility of SNPs for assessing genetic diversity between haploid strains and for managing genotype and phenotype data. Once a query has been performed, snp-search can convert the selected SNP data into FASTA sequences. snp-search is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions, such as the association of SNPs with particular phenotypes.

== Obtaining and installing the code

snp-search is written in Ruby and operates in a Unix environment. It is made available as a gem. See the GitHub site for more information (https://github.com/hpa-bioinformatics/snp-search).

To install snp-search, do

  gem install snp-search

== Requirements

Not much, you just need:

* Unix. When installed, snp-search will install all the necessary gems for you from Rubygems (note that Rubygems requires admin privileges. If you do not have admin privileges, we suggest you install RVM (http://beginrescueend.com/rvm/install/) and then run gem install snp-search).
* Ruby version 1.8.7 or above.

That's it!

== Running snp-search

To run snp-search, you need two files:

1. A Variant Call Format (.vcf) file, which contains the SNP information.
2. The reference genome that you used to generate your .vcf file, in GenBank or EMBL format (the script detects the format automatically).

Once you have these files ready, you may run snp-search with the following options:

  -V    Enable verbose mode.
  -n    Name of your database.
  -v    .vcf file. Required.
  -d    Database reference genome (the same file that was used to generate
        the .vcf file). This should be in GenBank or EMBL format. Required.
  -c    SNP quality score cutoff. A Phred-scaled quality score; high quality
        scores indicate high confidence calls.
        Optional, default = 90 (out of 100).
  -t    Genotype quality score cutoff. Phred-scaled quality score that the
        genotype is true. Optional, default = 30.
  -h    Help message.

Usage:

  snp-search -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf

== Output

The output is your database in sqlite3 format. If you would like to view your table(s) and perform queries, you can type

  sqlite3 snp_db.sqlite3

Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).

== Examples

We have included two example queries that you may find useful:

* Example1: This script queries the database to select only those SNPs that are not found in phage-related genes. The selected SNPs are then used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes that are not needed for the SNP analysis. You may adapt this script to run other SQL queries that result in a FASTA output.

  Usage:

    ruby example1.rb -D your_db_name.sqlite3 -s list_of_your_species.txt -o output.fasta

* Example2: This script queries the database and counts the unique SNPs within the list of strains/samples provided. The output is the number of unique SNPs.

  Usage:

    ruby example2.rb -D your_db_name.sqlite3 -s list_of_your_species.txt

== Contact

If you have any comments, questions or suggestions, please email ali.al-shahib@hpa.org.uk or anthony.underwood@hpa.org.uk

Have fun snp-searching!

== Copyright

Copyright (c) 2011 Ali Al-Shahib. See LICENSE.txt for further details.
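
== Appendix: reading the Phred-scaled cutoffs

The -c and -t options described under "Running snp-search" are both Phred-scaled scores. As a rough guide to what a given cutoff means, a Phred score Q corresponds to an error probability of 10^(-Q/10). The short Ruby sketch below only illustrates that standard conversion. The method name phred_to_error_probability is hypothetical and is not part of snp-search, and the sketch makes no claim about how snp-search applies the cutoffs internally.

```ruby
# Illustrative only: this helper is NOT part of snp-search.
# It applies the standard Phred definition Q = -10 * log10(P),
# rearranged to P = 10 ** (-Q / 10).
def phred_to_error_probability(quality)
  10.0**(-quality / 10.0)
end

# Show the error probability implied by the default cutoffs
# documented above (-t 30 and -c 90).
[30, 90].each do |q|
  puts format("Q%d -> error probability %.0e", q, phred_to_error_probability(q))
end
```

Read this way, the default genotype cutoff of 30 accepts calls with roughly a 1 in 1,000 chance of being wrong, while the default SNP cutoff of 90 is far stricter.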