class Bio::DB::Vcf

Extends the methods of the Bio::DB::Vcf class in bio-samtools. A Vcf object represents the VCF format described at www.1000genomes.org/node/101 . The Bio::DB::Vcf object returns all information in the VCF line, but the implementation here acts as if there is only possibly one variant at each position and ignores positions at which there may be multiple variants. Vcf format is only used when the Bio::Util::Gngm object requests information about indels using SAMtools mpileup method.

Public Instance Methods

alternatives() click to toggle source

List of alternate alleles at this locus, obtained by splitting the vcf.alt attribute string on commas

Example vcf.alt = “ACT,TCA”

vcf.alternatives = ["ACT", "TCA"]

vcf.alt = “T”

vcf.alternatives = ["T"]
# File lib/bio/util/bio-gngm.rb, line 101
def alternatives
  self.alt.split(",") rescue []
end
gq() click to toggle source

Returns the genotype quality score from the sample data (as defined by the Vcf GQ attribute) for the first sample in the Vcf only.

# File lib/bio/util/bio-gngm.rb, line 124
def gq 
  self.samples["1"]["GQ"].to_f rescue 0.0
end
has_just_one_variant?() click to toggle source

Returns true if only one variant allele is recorded. Loci with more than one allele are too complicated for now, so are discarded…

# File lib/bio/util/bio-gngm.rb, line 134
def has_just_one_variant?
  self.alternatives.length == 1 and self.variant?
end
is_deletion?(options) click to toggle source

Returns true if the length of the alt column is less than that of the ref column in the Vcf and if #pass_quality? is true. Looks only at the positions that are predicted simple deletions, any positions where the alt alleles includes more than one deletion or a SNP or an insertion also is ignored.

# File lib/bio/util/bio-gngm.rb, line 153
def is_deletion?(options)
  case
  when (not self.has_just_one_variant?) then false
  when  ( self.alt.length < self.ref.length  and self.pass_quality?(options) )  then true
  else false
  end
rescue ## if something goes wrong, skip the postion, 
  false
end
is_indel?(opts) click to toggle source

Returns true if either #is_insertion? or #is_deletion? is true

# File lib/bio/util/bio-gngm.rb, line 176
def is_indel?(opts)
  self.is_insertion?(opts) || self.is_deletion?(opts)
end
is_insertion?(options) click to toggle source

Returns true if the length of the alt column is greater than that of the ref column in the Vcf and if #pass_quality? is true. Looks only at the positions that are predicted simple deletions, any positions where the alt alleles includes more than one deletion or a SNP or an insertion also is ignored.

# File lib/bio/util/bio-gngm.rb, line 165
def is_insertion?(options)
    case
    when (not self.has_just_one_variant?) then false
    when ( self.alt.length > self.ref.length and self.pass_quality?(options)  ) then true 
    else false
    end
  rescue ## if something goes wrong, skip the postion, 
    false
end
mq() click to toggle source

Returns the mean Mapping Quality from the reads over this position as defined by the Vcf MQ attribute.

# File lib/bio/util/bio-gngm.rb, line 119
def mq
  self.info["MQ"].to_f rescue 0.0
end
non_ref_allele_count() click to toggle source

Returns the depth of reads containing the non reference allele. IE the sum of the last two figures in the DP4 attribute.

# File lib/bio/util/bio-gngm.rb, line 106
def non_ref_allele_count
  self.info["DP4"].split(",")[2..3].inject {|sum,n| sum.to_f + n.to_f } rescue 0.0
end
non_ref_allele_freq() click to toggle source

Returns the non-reference allele frequency based on depth of reads used for the genotype call,

IE

vcf.non_ref_allele_count / vcf.used_depth
# File lib/bio/util/bio-gngm.rb, line 114
def non_ref_allele_freq
  self.non_ref_allele_count / self.used_depth
end
pass_quality?(options) click to toggle source

Returns true if the position passes criteria

Options and Defaults:

  • :min_depth => 2

  • :min_non_ref_count => 2

  • :mapping_quality => 10

Example

vcf.pass_quality?(:min_depth => 5, :min_non_ref_count => 2, :mapping_quality => 25)
# File lib/bio/util/bio-gngm.rb, line 147
def pass_quality?(options)
  (self.used_depth >= options[:min_depth] and self.mq >= options[:mapping_quality] and self.non_ref_allele_count >= options[:min_non_ref_count])
end
pl() click to toggle source

Returns the phred scaled likelihood of the first non-reference allele (as defined by the Vcf PL attribute) for the first sample in the Vcf only.

# File lib/bio/util/bio-gngm.rb, line 129
def pl
  self.samples["1"]["PL"].split(",")[1].to_f rescue 0.0
end
to_s() click to toggle source

Return a short string representing chromosome, position, reference sequence, alt sequence(s) and the info string of the Vcf object.

# File lib/bio/util/bio-gngm.rb, line 85
def to_s
  "#{self.chrom} #{self.pos} #{self.ref} #{self.alt} #{self.info}"
end
used_depth() click to toggle source

The depth of reads actually used in the genotype call by Vcftools. The sum of the DP4 attribute. Returns 0.0 if no value is calculated.

# File lib/bio/util/bio-gngm.rb, line 90
def used_depth
  self.info["DP4"].split(",").inject {|sum,n| sum.to_f + n.to_f} rescue 0.0
end
variant?() click to toggle source

returns true if the alt column of the Vcf is not .

Examples

vcf record = 20 14370 rs6054257 G A 29 PASS …

vcf.variant? #=> true

vcf record = 20 1230237 . T . 47 PASS …

vcf.variant? #=> false
# File lib/bio/util/bio-gngm.rb, line 80
def variant?
  not self.alt == "." rescue false
end