README in snp-search-2.2.0 vs README in snp-search-2.3.0

- old
+ new

@@ -1,105 +1 @@
-= snp-search
-
-SNPsearch is a tool that manages SNP data and allows for data importing, manipulating, editing and complex querying of SNP data. It can be used to evaluate the utility of SNPs for the assessment of genetic diversity between haploid strains and the management of genotype and phenotype data. Once the database is created, the user is provided with several query and output options. SNPsearch is particularly useful in the analysis of phylogenetic trees that are based on SNP differences across whole core genomes. Queries can be made to answer critical genomic questions such as the association of SNPs with particular phenotypes.
-
-== Obtaining and installing the code
-SNPsearch is written in Ruby and operates in a Unix environment. It is made available as a gem. See the github site for more information (https://github.com/hpa-bioinformatics/snp-search).
-
-To install snp-search, do
-  gem install snp-search
-
-== Requirements
-
-Not much, you just need:
-
-* Unix. Once snp-search is installed, all the necessary gems to run snp-search will also be installed from Rubygems (note that Rubygems requires admin privileges. If you do not have admin privileges then we suggest you install RVM: (http://beginrescueend.com/rvm/install/) and then gem install snp-search).
-* ruby version 1.8.7 and above.
-
-* Optional: FastTree. If you require a tree output in Newick format, you must install FastTree from http://www.microbesonline.org/fasttree/#Install. You must specify the path of the executable in your .bashrc or .profile file as snp-search will run the command as just 'FastTree' and will not know where FastTree is if it is not specified in your .bashrc or .profile file.
-
-Thats it!
-
-== Running snp-search
-
-1- Creating the database (snp-search -create)
-
-  Two files are needed to create the SQLite3 database:
-
-  1- Variant Call Format (.vcf) file (which contains the SNP information)
-
-  2- Your database reference genome that you used to generate your .vcf file (in genbank or embl format, the script will automatically detect the format).
-
-You need the following parameters:
-
-  -n Name of your database
-  -v .vcf file
-  -d Database Reference genome (The same file that was used in generating the .vcf file). This should be in genbank or embl format.
-
-  Other options:
-  -c SNP quality score cutoff. A Phred-scaled quality score. High quality scores indicate high confidence calls. Optional, default = 90 (out of 100)
-  -g Genotype Quality score cutoff. Phred-scaled quality score that the genotype is true. Optional, default = 30
-  -h help message
-
-  Usage:
-  snp-search -create -n my_snp_db.sqlite3 -d my_ref.gbk -v my_vcf_file.vcf
-
-  Note: The strain names in your database will be taken from your vcf file so make sure they are named appropriately in your vcf file.
-
-2- Querying the Database (snp-search -query)
-
-  Two queries are currently scripted in SNPsearch:
-
-  1- unique_snps: This option queries the database and selects the number of unique SNPs within the list of the strains/samples provided. The output is the number of unique SNPs.
-
-  You need the following parameters:
-
-  -n Name of your database
-  -s The strains/samples you like to query
-
-  Usage:
-  snp-search -n my_snp_db.sqlite3 -s list_of_my_strains.txt
-
-  2- not_include_snps_from_gene: This option queries the database to select only those SNPs not found in a specified gene. These SNPs are used to make a concatenated SNP multiple alignment file (FASTA format). This is a way of removing a set of genes (likely to be mobile element genes) that are not needed for SNP analysis. The user has the option of generating a core SNP tree Newick file for SNP phylogeny.
-
-  You need the following parameters:
-
-  -n Name of your database
-  -a The gene you like to remove from analysis
-  -o Output file, in fasta format
-
-  options:
-  -t Generate SNP phylogeny
-  -w Output tree in Newick format
-
-  Usage (phage is used as the example gene):
-  snp-search -n my_snp_db.sqlite3 -a phage -o snps_sequences_without_phage.fasta -t -w snps_sequences_without_phage.nwk
-
-  The algorithm FastTree is used to generate the nwk file. FastTree can be downloaded from http://www.microbesonline.org/fasttree/#Install (see above)
-
-  3- Output database (snp-search -out_file)
-
-  You need the following parameters:
-
-  -n Name of your database
-  -o Output file containing the database in fasta format
-
-== View database in Unix or in a GUI
-Your database will be in sqlite3 format. If you like to view your table(s) and perform direct queries you can type
-  sqlite3 snp_db.sqlite3
-
-Alternatively, you may download a SQL tool to view your database (e.g. SQLite sorcerer).
-
-== Contact
-
-If you have any comments, questions or suggestions, please email
-  ali.al-shahib@hpa.org.uk
-or
-  anthony.underwood@hpa.org.uk
-
-Have fun snp-searching!
-
-== Copyright
-
-Copyright (c) 2012 Ali Al-Shahib. See LICENSE.txt for
-further details.
-
\ No newline at end of file
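
Note on the removed "View database" section: the old README says the database is a plain SQLite3 file that can be opened directly, but it does not document the table layout. The sketch below is one possible way to discover that layout without assuming any table or column names. It uses Python's standard `sqlite3` module purely for illustration (snp-search itself is Ruby); `list_tables` is a hypothetical helper name, and `snp_db.sqlite3` is the example file name taken from the README.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    Nothing here is specific to snp-search: the table names are read from
    SQLite's own catalogue (sqlite_master), so no schema is assumed.
    Note that sqlite3.connect() creates an empty file if db_path is missing.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is its name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()
```

Calling `list_tables("snp_db.sqlite3")` on a database produced by `snp-search -create` should give the table and column names needed to write direct queries by hand.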