Class: GeneMergeValidation
- Inherits:
-
ValidationTest
- Object
- ValidationTest
- GeneMergeValidation
- Defined in:
- lib/genevalidator/validation_gene_merge.rb
Overview
This class contains the methods necessary for checking whether there is evidence that the prediction is a merge of multiple genes
Instance Attribute Summary (collapse)
-
- (Object) filename
readonly
Returns the value of attribute filename.
-
- (Object) hits
readonly
Returns the value of attribute hits.
-
- (Object) prediction
readonly
Returns the value of attribute prediction.
Attributes inherited from ValidationTest
#cli_name, #description, #header, #running_time, #short_header, #type, #validation_report
Instance Method Summary (collapse)
-
- (GeneMergeValidation) initialize(type, prediction, hits, filename)
constructor
Initilizes the object Params: type: type of the predicted sequence (:nucleotide or :protein) prediction: a Sequence object representing the blast query hits: a vector of Sequence objects (usually representig the blast hits) filename: name of the input file, used when generatig the plot files.
-
- (Object) plot_2d_start_from(slope, y_intercept, output = "#{filename}_match_2d.json", hits = @hits)
Generates a json file containing data used for plotting the start/end of the matched region offsets in the prediction Param slope: slope of the linear regression line y_intercept: the ecuation of the line is y= slope*x + y_intercept output: location where the plot will be saved in jped file format hits: array of Sequence objects.
-
- (Object) plot_matched_regions(output = "#{filename}_match.json", hits = @hits, prediction = @prediction)
Generates a json file containing data used for plotting the matched region of the prediction for each hit Param output: location where the plot will be saved in jped file format hits: array of Sequence objects prediction: Sequence objects.
-
- (Object) run
Validation test for gene merge Output: GeneMergeValidationOutput object.
-
- (Object) slope(hits = @hits)
Caclulates the slope of the regression line give a set of 2d coordonates of the start/stop offests of the hits Param hits: array of Sequence objects Code inspired from: engineering.sharethrough.com/blog/2012/09/12/simple-linear-regression-using-ruby/ Output: The ecuation of the regression line: [y slope].
Constructor Details
- (GeneMergeValidation) initialize(type, prediction, hits, filename)
Initilizes the object Params: type: type of the predicted sequence (:nucleotide or :protein) prediction: a Sequence object representing the blast query hits: a vector of Sequence objects (usually representig the blast hits) filename: name of the input file, used when generatig the plot files
66 67 68 69 70 71 72 73 74 75 76 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 66 def initialize(type, prediction, hits, filename) super @filename = filename @short_header = "Gene_Merge(slope)" @header = "Gene Merge" @description = "Check whether BLAST hits make evidence about a merge of two"<< " genes that match the predicted gene. Meaning of the output displayed:"<< " slope of the linear regression of the relationship between the start and"<< " stop offsets of the hsps (see the plot). Invalid slopes are around 45 degrees." @cli_name = "merge" end |
Instance Attribute Details
- (Object) filename (readonly)
Returns the value of attribute filename
57 58 59 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 57 def filename @filename end |
- (Object) hits (readonly)
Returns the value of attribute hits
55 56 57 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 55 def hits @hits end |
- (Object) prediction (readonly)
Returns the value of attribute prediction
56 57 58 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 56 def prediction @prediction end |
Instance Method Details
- (Object) plot_2d_start_from(slope, y_intercept, output = "#{filename}_match_2d.json", hits = @hits)
Generates a json file containing data used for plotting the start/end of the matched region offsets in the prediction Param slope: slope of the linear regression line y_intercept: the ecuation of the line is y= slope*x + y_intercept output: location where the plot will be saved in jped file format hits: array of Sequence objects
155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 155 def plot_2d_start_from(slope, y_intercept, output = "#{filename}_match_2d.json", hits = @hits) f = File.open(output , "w") f.write(hits.map{|hit| {"x"=>hit.hsp_list.map{|hsp| hsp.match_query_from}.min, "y"=>hit.hsp_list.map{|hsp| hsp.match_query_to}.max}}.to_json) f.close return Plot.new(output.scan(/\/([^\/]+)$/)[0][0], :scatter, "Start vs end hsp match", "", "from", "to", y_intercept, slope) end |
- (Object) plot_matched_regions(output = "#{filename}_match.json", hits = @hits, prediction = @prediction)
Generates a json file containing data used for plotting the matched region of the prediction for each hit Param output: location where the plot will be saved in jped file format hits: array of Sequence objects prediction: Sequence objects
120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 120 def plot_matched_regions(output = "#{filename}_match.json", hits = @hits, prediction = @prediction) colors = ["yellow", "red"] f = File.open(output , "w") no_lines = 100 ratio = (hits.length/no_lines).to_i if ratio == 0 f.write((hits.each_with_index.map{|hit, i| {"y"=>i, "start"=>0, "stop"=>prediction.length_protein, "color"=>"gray"}} + hits.each_with_index.map{|hit, i| hit.hsp_list.map{|hsp| {"y"=>i, "start"=>hsp.match_query_from, "stop"=>hsp.match_query_to, "color"=>"#{colors[i%2]}"}}}.flatten).to_json) else f.write((hits.select.each_with_index {|hit, i| i%ratio==0}.each_with_index.map{|hit, i| {"y"=>i, "start"=>0, "stop"=>prediction.length_protein, "color"=>"gray"}} + hits.select.each_with_index {|hit, i| i%ratio==0}.each_with_index.map{|hit, i| hit.hsp_list.map{|hsp| {"y"=>i, "start"=>hsp.match_query_from, "stop"=>hsp.match_query_to, "color"=>"#{colors[i%2]}"}}}.flatten).to_json) end f.close return Plot.new(output.scan(/\/([^\/]+)$/)[0][0], :lines, "Prediction vs hit match", "prediction, gray; prediction high-scoring alignmet seq, red; prediction high-scoring alignmet seq, yellow", "length", "idx", hits.length) end |
- (Object) run
Validation test for gene merge Output: GeneMergeValidationOutput object
82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 82 def run begin raise NotEnoughHitsError unless hits.length >= 5 raise Exception unless prediction.is_a? Sequence and hits[0].is_a? Sequence start = Time.now lm_slope = slope[1] y_intercept = slope[0] @validation_report = GeneMergeValidationOutput.new(lm_slope) plot1 = plot_2d_start_from(lm_slope, y_intercept) @validation_report.plot_files.push(plot1) plot2 = plot_matched_regions @validation_report.plot_files.push(plot2) @running_time = Time.now - start return @validation_report # Exception is raised when blast founds no hits rescue NotEnoughHitsError => error @validation_report = ValidationReport.new("Not enough evidence", :warning) return @validation_report rescue Exception => error @validation_report.errors.push OtherError @validation_report = ValidationReport.new("Unexpected error", :error) return @validation_report end end |
- (Object) slope(hits = @hits)
Caclulates the slope of the regression line give a set of 2d coordonates of the start/stop offests of the hits Param hits: array of Sequence objects Code inspired from: engineering.sharethrough.com/blog/2012/09/12/simple-linear-regression-using-ruby/ Output: The ecuation of the regression line: [y slope]
179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 |
# File 'lib/genevalidator/validation_gene_merge.rb', line 179 def slope(hits = @hits) pairs = @hits.map {|hit| Pair.new(hit.hsp_list.map{|hsp| hsp.match_query_from}.min, hit.hsp_list.map{|hsp| hsp.match_query_to}.max)} xx = pairs.map{|pair| pair.x} yy = pairs.map{|pair| pair.y} # calculate the slope x_mean = xx.reduce(0) { |sum, x| x + sum } / (xx.length + 0.0) y_mean = yy.reduce(0) { |sum, x| x + sum } / (yy.length + 0.0) numerator = (0...xx.length).reduce(0) do |sum, i| sum + ((xx[i] - x_mean) * (yy[i] - y_mean)) end denominator = xx.reduce(0) do |sum, x| sum + ((x - x_mean) ** 2) end slope = numerator / (denominator + 0.0) y_intercept = y_mean - (slope * x_mean) return [y_intercept, slope] end |