= snp-search

snp-search is a set of tools for managing SNP data. It supports importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for assessing genetic diversity between haploid strains, and to manage genotype and phenotype data. Once a query has been performed, snp-search can convert the selected SNP data into FASTA sequences. snp-search is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions, such as whether SNPs are associated with particular phenotypes.

== Obtaining and installing the code

snp-search is written in Ruby and operates in a Unix environment. It is made available as a gem. See the GitHub site for more information (https://github.com/hpa-bioinformatics/snp-search).

To install snp-search, run

  gem install snp-search

== Requirements

Nothing! You just need to run the install command in Unix and it will install all the necessary gems for you from RubyGems. Note that installing gems system-wide with RubyGems may require admin privileges. If you do not have admin privileges, we suggest you install RVM (http://beginrescueend.com/rvm/install/) and then run

  gem install snp-search

== Running snp-search

To run snp-search, you need 3 files:

1. A Variant Call Format (.vcf) file, which contains the SNP information.
2. The reference genome that you used to generate your .vcf file, in GenBank or EMBL format (the script detects the format automatically).
3. A text file with a list of your strain/sample names. These should be the same strains/samples used in generating the .vcf file.

In the text file, every strain/sample name should be on its own line, e.g.

  strain1
  strain2
  strain3
  strain4

and so on.
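
Before running snp-search it can be handy to confirm that the names in the strain list agree with the samples in the .vcf file. The following is a minimal sketch, not part of snp-search: the helper names are hypothetical, and it assumes the strain names are meant to match the sample columns of the VCF header line (the line starting with #CHROM, where sample names follow the ninth column, FORMAT).

```ruby
# Hypothetical helpers, not part of snp-search. They compare the names in
# the strain list file with the sample columns of the VCF header line.

# Returns the sample names found on the "#CHROM" header line of a VCF file.
# In VCF the first nine columns are fixed (CHROM .. FORMAT); any further
# columns are sample names.
def vcf_sample_names(vcf_path)
  File.foreach(vcf_path) do |line|
    next unless line.start_with?("#CHROM")
    return line.chomp.split("\t").drop(9)
  end
  [] # no header line found
end

# Returns the strain names from the list file (one per line), skipping
# blank lines and surrounding whitespace.
def listed_strain_names(list_path)
  File.readlines(list_path).map(&:strip).reject(&:empty?)
end

# Returns a hash with the names that appear in one file but not the other.
def strain_name_mismatches(vcf_path, list_path)
  in_vcf  = vcf_sample_names(vcf_path)
  in_list = listed_strain_names(list_path)
  { missing_from_vcf: in_list - in_vcf, missing_from_list: in_vcf - in_list }
end
```

Calling strain_name_mismatches("my_vcf_file.vcf", "my_list_of_strains.txt") returns two arrays of names; if both are empty, the list and the .vcf file name the same samples.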

Once you have these files ready, you may run snp-search with the following options:

  -V    Enable verbose mode
  -n    Name of your database
        Optional, default = snp_db.sqlite3
  -v    .vcf file
        Required
  -r    Reference genome file (the same file that was used in generating
        the .vcf file). This should be in GenBank or EMBL format.
        Required
  -s    Text file that contains a list of the strain/sample names (the same
        strains/samples used in generating the .vcf file)
        Required
  -c    SNP quality cutoff. A Phred-scaled quality score. High quality
        scores indicate high-confidence calls.
        Optional, default = 90
  -t    Genotype quality cutoff. A Phred-scaled score of the probability
        that the genotype call is wrong, given that the site is variant.
        Optional, default = 30
  -h    Help message

Usage:

  snp-search -n my_snp_db.sqlite3 -r my_ref.gbk -v my_vcf_file.vcf -s my_list_of_strains.txt

== Output

The output is your database in sqlite3 format. If you would like to view your table(s) and perform queries, you can type

  sqlite3 snp_db.sqlite3

Alternatively, you may download a SQL tool to browse your database in a GUI (e.g. SQLite sorcerer).

Have fun snp-searching!

== Copyright

Copyright (c) 2011 Ali Al-Shahib. See LICENSE.txt for further details.