Class: GeneMergeValidation

Inherits:
ValidationTest show all
Defined in:
lib/genevalidator/validation_gene_merge.rb

Overview

This class contains the methods necessary for checking whether there is evidence that the prediction is a merge of multiple genes

Instance Attribute Summary (collapse)

Attributes inherited from ValidationTest

#cli_name, #description, #header, #running_time, #short_header, #type, #validation_report

Instance Method Summary (collapse)

Constructor Details

- (GeneMergeValidation) initialize(type, prediction, hits, filename)

Initilizes the object Params: type: type of the predicted sequence (:nucleotide or :protein) prediction: a Sequence object representing the blast query hits: a vector of Sequence objects (usually representig the blast hits) filename: name of the input file, used when generatig the plot files



66
67
68
69
70
71
72
73
74
75
76
# File 'lib/genevalidator/validation_gene_merge.rb', line 66

def initialize(type, prediction, hits, filename)
  super
  @filename     = filename
  @short_header = "Gene_Merge(slope)"
  @header       = "Gene Merge"
  @description = "Check whether BLAST hits make evidence about a merge of two"<<
  " genes that match the predicted gene. Meaning of the output displayed:"<<
  " slope of the linear regression of the relationship between the start and"<<
  " stop offsets of the hsps (see the plot). Invalid slopes are around 45 degrees."
  @cli_name     = "merge"
end

Instance Attribute Details

- (Object) filename (readonly)

Returns the value of attribute filename



57
58
59
# File 'lib/genevalidator/validation_gene_merge.rb', line 57

def filename
  @filename
end

- (Object) hits (readonly)

Returns the value of attribute hits



55
56
57
# File 'lib/genevalidator/validation_gene_merge.rb', line 55

def hits
  @hits
end

- (Object) prediction (readonly)

Returns the value of attribute prediction



56
57
58
# File 'lib/genevalidator/validation_gene_merge.rb', line 56

def prediction
  @prediction
end

Instance Method Details

- (Object) plot_2d_start_from(slope, y_intercept, output = "#{filename}_match_2d.json", hits = @hits)

Generates a json file containing data used for plotting the start/end of the matched region offsets in the prediction Param slope: slope of the linear regression line y_intercept: the ecuation of the line is y= slope*x + y_intercept output: location where the plot will be saved in jped file format hits: array of Sequence objects



155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
# File 'lib/genevalidator/validation_gene_merge.rb', line 155

def plot_2d_start_from(slope, y_intercept, output = "#{filename}_match_2d.json", hits = @hits)    
  f = File.open(output , "w")
  f.write(hits.map{|hit| {"x"=>hit.hsp_list.map{|hsp| hsp.match_query_from}.min, 
                           "y"=>hit.hsp_list.map{|hsp| hsp.match_query_to}.max}}.to_json)
  f.close
  return Plot.new(output.scan(/\/([^\/]+)$/)[0][0],
                              :scatter,
                              "Start vs end hsp match",
                              "",
                              "from",
                              "to",
                               y_intercept,
                               slope)

end

- (Object) plot_matched_regions(output = "#{filename}_match.json", hits = @hits, prediction = @prediction)

Generates a json file containing data used for plotting the matched region of the prediction for each hit Param output: location where the plot will be saved in jped file format hits: array of Sequence objects prediction: Sequence objects



120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
# File 'lib/genevalidator/validation_gene_merge.rb', line 120

def plot_matched_regions(output = "#{filename}_match.json", hits = @hits, prediction = @prediction)

    colors   = ["yellow", "red"]
    f        = File.open(output , "w")
    no_lines = 100
    ratio    = (hits.length/no_lines).to_i

    if ratio == 0
      f.write((hits.each_with_index.map{|hit, i| {"y"=>i, "start"=>0, "stop"=>prediction.length_protein, "color"=>"gray"}} +
            hits.each_with_index.map{|hit, i| hit.hsp_list.map{|hsp| 
            {"y"=>i, "start"=>hsp.match_query_from, "stop"=>hsp.match_query_to, "color"=>"#{colors[i%2]}"}}}.flatten).to_json)
    else
      f.write((hits.select.each_with_index {|hit, i| i%ratio==0}.each_with_index.map{|hit, i| {"y"=>i, "start"=>0, "stop"=>prediction.length_protein, "color"=>"gray"}} +
            hits.select.each_with_index {|hit, i| i%ratio==0}.each_with_index.map{|hit, i| hit.hsp_list.map{|hsp| 
            {"y"=>i, "start"=>hsp.match_query_from, "stop"=>hsp.match_query_to, "color"=>"#{colors[i%2]}"}}}.flatten).to_json)
    end
    f.close
    return Plot.new(output.scan(/\/([^\/]+)$/)[0][0], 
                     :lines,  
                     "Prediction vs hit match", 
                     "prediction, gray; prediction high-scoring alignmet seq, red; prediction high-scoring alignmet seq, yellow", 
                     "length", 
                     "idx",
                     hits.length)

end

- (Object) run

Validation test for gene merge Output: GeneMergeValidationOutput object



82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
# File 'lib/genevalidator/validation_gene_merge.rb', line 82

def run
  begin
    raise NotEnoughHitsError unless hits.length >= 5
    raise Exception unless prediction.is_a? Sequence and hits[0].is_a? Sequence

    start = Time.now

    lm_slope = slope[1]
    y_intercept = slope[0]

    @validation_report = GeneMergeValidationOutput.new(lm_slope)

    plot1 = plot_2d_start_from(lm_slope, y_intercept)
    @validation_report.plot_files.push(plot1)
    plot2 = plot_matched_regions
    @validation_report.plot_files.push(plot2)
    @running_time = Time.now - start
    return @validation_report

  # Exception is raised when blast founds no hits
  rescue  NotEnoughHitsError => error
    @validation_report = ValidationReport.new("Not enough evidence", :warning)
    return @validation_report
  rescue Exception => error
    @validation_report.errors.push OtherError 
    @validation_report = ValidationReport.new("Unexpected error", :error)
    return @validation_report
  end

end

- (Object) slope(hits = @hits)

Caclulates the slope of the regression line give a set of 2d coordonates of the start/stop offests of the hits Param hits: array of Sequence objects Code inspired from: engineering.sharethrough.com/blog/2012/09/12/simple-linear-regression-using-ruby/ Output: The ecuation of the regression line: [y slope]



179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
# File 'lib/genevalidator/validation_gene_merge.rb', line 179

def slope(hits = @hits)
  
  pairs = @hits.map {|hit| Pair.new(hit.hsp_list.map{|hsp| hsp.match_query_from}.min, hit.hsp_list.map{|hsp| hsp.match_query_to}.max)}

  xx = pairs.map{|pair| pair.x}
  yy = pairs.map{|pair| pair.y}

  # calculate the slope
  x_mean = xx.reduce(0) { |sum, x| x + sum } / (xx.length + 0.0)
  y_mean = yy.reduce(0) { |sum, x| x + sum } / (yy.length + 0.0)
 
  numerator = (0...xx.length).reduce(0) do |sum, i|
    sum + ((xx[i] - x_mean) * (yy[i] - y_mean))
  end
 
  denominator = xx.reduce(0) do |sum, x|
    sum + ((x - x_mean) ** 2)
  end
 
  slope = numerator / (denominator + 0.0)
  y_intercept = y_mean - (slope * x_mean)

  return [y_intercept, slope]
  
end