Extends the methods of the Bio::DB::Vcf class in bio-samtools. A Vcf object represents the VCF format described at www.1000genomes.org/node/101 . The Bio::DB::Vcf object returns all information in the VCF line, but the implementation here acts as if there is only possibly one variant at each position and ignores positions at which there may be multiple variants. Vcf format is only used when the Bio::Util::Gngm object requests information about indels using SAMtools mpileup method.
List of alternate alleles at this locus, obtained by splitting the vcf.alt attribute string on commas
Example vcf.alt = “ACT,TCA”
vcf.alternatives = ["ACT", "TCA"]
vcf.alt = “T”
vcf.alternatives = ["T"]
# File lib/bio/util/bio-gngm.rb, line 101 def alternatives self.alt.split(",") rescue [] end
Returns true if only one variant allele is recorded. Loci with more than one allele are too complicated for now, so are discarded…
# File lib/bio/util/bio-gngm.rb, line 134 def has_just_one_variant? self.alternatives.length == 1 and self.variant? end
Returns true if the length of the alt column is less than that of the ref column in the Vcf and if #pass_quality? is true. Looks only at the positions that are predicted simple deletions, any positions where the alt alleles includes more than one deletion or a SNP or an insertion also is ignored.
# File lib/bio/util/bio-gngm.rb, line 153 def is_deletion?(options) case when (not self.has_just_one_variant?) then false when ( self.alt.length < self.ref.length and self.pass_quality?(options) ) then true else false end rescue ## if something goes wrong, skip the postion, false end
Returns true if either #is_insertion? or #is_deletion? is true
# File lib/bio/util/bio-gngm.rb, line 176 def is_indel?(opts) self.is_insertion?(opts) || self.is_deletion?(opts) end
Returns true if the length of the alt column is greater than that of the ref column in the Vcf and if #pass_quality? is true. Looks only at the positions that are predicted simple deletions, any positions where the alt alleles includes more than one deletion or a SNP or an insertion also is ignored.
# File lib/bio/util/bio-gngm.rb, line 165 def is_insertion?(options) case when (not self.has_just_one_variant?) then false when ( self.alt.length > self.ref.length and self.pass_quality?(options) ) then true else false end rescue ## if something goes wrong, skip the postion, false end
Returns the mean Mapping Quality from the reads over this position as defined by the Vcf MQ attribute.
# File lib/bio/util/bio-gngm.rb, line 119 def mq self.info["MQ"].to_f rescue 0.0 end
Returns the depth of reads containing the non reference allele. IE the sum of the last two figures in the DP4 attribute.
# File lib/bio/util/bio-gngm.rb, line 106 def non_ref_allele_count self.info["DP4"].split(",")[2..3].inject {|sum,n| sum.to_f + n.to_f } rescue 0.0 end
Returns the non-reference allele frequency based on depth of reads used for the genotype call,
IE
vcf.non_ref_allele_count / vcf.used_depth
# File lib/bio/util/bio-gngm.rb, line 114 def non_ref_allele_freq self.non_ref_allele_count / self.used_depth end
Returns true if the position passes criteria
Options and Defaults:
:min_depth => 2
:min_non_ref_count => 2
:mapping_quality => 10
Example
vcf.pass_quality?(:min_depth => 5, :min_non_ref_count => 2, :mapping_quality => 25)
# File lib/bio/util/bio-gngm.rb, line 147 def pass_quality?(options) (self.used_depth >= options[:min_depth] and self.mq >= options[:mapping_quality] and self.non_ref_allele_count >= options[:min_non_ref_count]) end
Return a short string representing chromosome, position, reference sequence, alt sequence(s) and the info string of the Vcf object.
# File lib/bio/util/bio-gngm.rb, line 85 def to_s "#{self.chrom} #{self.pos} #{self.ref} #{self.alt} #{self.info}" end
The depth of reads actually used in the genotype call by Vcftools. The sum of the DP4 attribute. Returns 0.0 if no value is calculated.
# File lib/bio/util/bio-gngm.rb, line 90 def used_depth self.info["DP4"].split(",").inject {|sum,n| sum.to_f + n.to_f} rescue 0.0 end
returns true if the alt
column of the Vcf is not .
Examples
vcf record = 20 14370 rs6054257 G A 29 PASS …
vcf.variant? #=> true
vcf record = 20 1230237 . T . 47 PASS …
vcf.variant? #=> false
# File lib/bio/util/bio-gngm.rb, line 80 def variant? not self.alt == "." rescue false end