Class: HierarchicalClusterization
- Inherits:
- Object
- Inheritance chain: Object > HierarchicalClusterization
- Defined in:
- lib/genevalidator/clusterization.rb
Instance Attribute Summary
- (Object) clusters
Returns the value of attribute clusters.
- (Object) values
Returns the value of attribute values.
Instance Method Summary
- (Object) hierarchical_clusterization(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
Performs a hierarchical clusterization, merging the closest adjacent clusters until a sufficiently dense cluster is obtained, the distance between the closest clusters is sufficiently large, or the desired number of clusters is reached. Params: vec: a vector of values (by default the values from initialization); no_clusters: stop test (number of clusters); distance_method: distance method (method 0 or method 1); debug: display debug information. Output: vector of Cluster objects.
- (Object) hierarchical_clusterization_2d(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
- (HierarchicalClusterization) initialize(values)
constructor
Object initialization. Params: values: vector of values.
- (Object) most_dense_cluster(clusters = @clusters)
Returns the cluster with the maximum density. Params: clusters: list of Cluster objects.
Constructor Details
- (HierarchicalClusterization) initialize(values)
Object initialization. Params: values: vector of values
# File 'lib/genevalidator/clusterization.rb', line 382
def initialize(values)
  @values = values
  @clusters = []
end
Instance Attribute Details
- (Object) clusters
Returns the value of attribute clusters
# File 'lib/genevalidator/clusterization.rb', line 376
def clusters
  @clusters
end
- (Object) values
Returns the value of attribute values
# File 'lib/genevalidator/clusterization.rb', line 375
def values
  @values
end
Instance Method Details
- (Object) hierarchical_clusterization(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
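Performs a hierarchical clusterization. Each distinct value starts as its own cluster, and the two closest adjacent clusters are merged on every iteration. When no_clusters is 0, merging stops once the means of the closest clusters differ by more than a quarter of the value range, or once the density of the merged cluster exceeds half the number of input values. When no_clusters is not 0, merging stops only when that number of clusters is reached.
Params:
vec: a vector of values (by default the values from initialization)
no_clusters: stop test (number of clusters)
distance_method: distance method (method 0 or method 1)
debug: display debug information
Output: vector of Cluster objects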
# File 'lib/genevalidator/clusterization.rb', line 503
def hierarchical_clusterization(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
  clusters = []
  vec = vec.sort

  # A single value forms a single cluster; return it right away
  if vec.length == 1
    hash = { vec[0] => 1 }
    cluster = Cluster.new(hash)
    clusters.push(cluster)
    return clusters
  end

  # Thresholds
  threshold_distance = (0.25 * (vec.max - vec.min))
  threshold_density = (0.5 * vec.length).to_i

  # make a histogram from the input vector
  histogram = Hash[vec.group_by { |x| x }.map { |k, vs| [k, vs.length] }]

  # clusters = array of clusters
  # initially each length belongs to a different cluster
  histogram.sort { |a, b| a[0] <=> b[0] }.each do |elem|
    puts "len #{elem[0]} appears #{elem[1]} times" if debug
    hash = { elem[0] => elem[1] }
    cluster = Cluster.new(hash)
    clusters.push(cluster)
  end

  clusters.each { |elem| elem.print } if debug

  return clusters if clusters.length == 1

  # each iteration merges the two closest adjacent clusters
  # the loop stops according to the stop conditions
  iteration = 0
  loop do
    # stop condition 1
    # the desired number of clusters is reached
    break if no_clusters != 0 && clusters.length == no_clusters

    iteration += 1
    puts "\nIteration #{iteration}" if debug

    min_distance = 100000000
    cluster = 0
    density = 0

    # find the closest pair of adjacent clusters;
    # ties are resolved in favour of the denser pair
    clusters[0..clusters.length - 2].each_with_index do |_item, i|
      dist = clusters[i].distance(clusters[i + 1], distance_method)
      puts "distance between clusters #{i} and #{i + 1} is #{dist}" if debug
      current_density = clusters[i].density + clusters[i + 1].density
      if dist < min_distance
        min_distance = dist
        cluster = i
        density = current_density
      elsif dist == min_distance && density < current_density
        cluster = i
        density = current_density
      end
    end

    # stop condition 2
    # the distance between the closest clusters exceeds the threshold
    if no_clusters == 0 && (clusters[cluster].mean - clusters[cluster + 1].mean).abs > threshold_distance
      break
    end

    # merge clusters 'cluster' and 'cluster' + 1
    puts "clusters to merge #{cluster} and #{cluster + 1}" if debug
    clusters[cluster].add(clusters[cluster + 1])
    clusters.delete_at(cluster + 1)

    if debug
      clusters.each_with_index do |elem, i|
        puts "cluster #{i}"
        elem.print
      end
    end

    # stop condition 3
    # the density of the merged cluster exceeds the threshold
    break if no_clusters == 0 && clusters[cluster].density > threshold_density
  end

  @clusters = clusters
  clusters
end
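The merge loop above can be illustrated with a much smaller sketch. The `SketchCluster` class and the `sketch_clusterization` method are hypothetical stand-ins, since the real `Cluster` class is not shown on this page. The distance between two clusters is assumed here to be the gap between their means, which is not necessarily what distance method 0 or 1 computes, and density is assumed to be the number of values in a cluster. The `no_clusters` stop test and the debug output are left out.

```ruby
# Hypothetical stand-in for Cluster: a histogram of value => count.
class SketchCluster
  attr_reader :lengths

  def initialize(lengths)
    @lengths = lengths
  end

  # Assumed meaning of density: how many values the cluster holds.
  def density
    @lengths.values.sum
  end

  # Mean of the values, weighted by their counts.
  def mean
    @lengths.sum { |value, count| value * count }.to_f / density
  end

  # Assumed distance: absolute gap between the two means.
  def distance(other)
    (mean - other.mean).abs
  end

  # Absorb the histogram of another cluster.
  def add(other)
    other.lengths.each do |value, count|
      @lengths[value] = @lengths.fetch(value, 0) + count
    end
  end
end

def sketch_clusterization(vec)
  threshold_distance = 0.25 * (vec.max - vec.min)
  threshold_density = (0.5 * vec.length).to_i

  # One cluster per distinct value, in ascending order.
  clusters = vec.sort.tally.map { |value, count| SketchCluster.new({ value => count }) }

  while clusters.length > 1
    # Closest adjacent pair; on equal distance prefer the denser pair.
    i = (0...clusters.length - 1).min_by do |k|
      [clusters[k].distance(clusters[k + 1]),
       -(clusters[k].density + clusters[k + 1].density)]
    end

    # Stop when even the closest pair is too far apart.
    break if clusters[i].distance(clusters[i + 1]) > threshold_distance

    clusters[i].add(clusters[i + 1])
    clusters.delete_at(i + 1)

    # Stop when the merged cluster holds more than half of the values.
    break if clusters[i].density > threshold_density
  end

  clusters
end

result = sketch_clusterization([1, 2, 2, 3, 10, 11, 30])
p result.map(&:density) # => [4, 2, 1]
```

With seven input values the density threshold is 3, so the loop ends as soon as the values 1, 2, 2 and 3 share one cluster. The values 10 and 11 were merged on an earlier iteration, and 30 stays on its own.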
- (Object) hierarchical_clusterization_2d(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
# File 'lib/genevalidator/clusterization.rb', line 387
def hierarchical_clusterization_2d(no_clusters = 0, distance_method = 0, vec = @values, debug = false)
  clusters = []

  # A single pair forms a single cluster; return it right away
  if vec.length == 1
    hash = { vec[0] => 1 }
    cluster = PairCluster.new(hash)
    clusters.push(cluster)
    return clusters
  end

  # Thresholds
  # threshold_distance = (0.25 * (vec.max-vec.min))
  threshold_density = (0.5 * vec.length).to_i

  # make a histogram from the input vector
  histogram = Hash[vec.group_by { |a| a }.map { |k, vs| [k, vs.length] }]

  # clusters = array of clusters
  # initially each pair belongs to a different cluster
  histogram.each do |elem|
    puts "pair (#{elem[0].x} #{elem[0].y}) appears #{elem[1]} times" if debug
    hash = { elem[0] => elem[1] }
    cluster = PairCluster.new(hash)
    clusters.push(cluster)
  end

  clusters.each { |elem| elem.print } if debug

  return clusters if clusters.length == 1

  # each iteration merges the two closest clusters
  # the loop stops according to the stop conditions
  iteration = 0
  loop do
    # stop condition 1
    # the desired number of clusters is reached
    break if no_clusters != 0 && clusters.length == no_clusters

    iteration += 1
    puts "\nIteration #{iteration}" if debug

    min_distance = 100000000
    cluster1 = 0
    cluster2 = 0
    density = 0

    # compare every pair of clusters (i < j);
    # ties are resolved in favour of the denser pair
    (0..clusters.length - 2).each do |i|
      ((i + 1)..(clusters.length - 1)).each do |j|
        dist = clusters[i].distance(clusters[j], distance_method)
        puts "distance between clusters #{i} and #{j} is #{dist}" if debug
        current_density = clusters[i].density + clusters[j].density
        if dist < min_distance
          min_distance = dist
          cluster1 = i
          cluster2 = j
          density = current_density
        elsif dist == min_distance && density < current_density
          cluster1 = i
          cluster2 = j
          density = current_density
        end
      end
    end

    # merge clusters 'cluster1' and 'cluster2'
    puts "clusters to merge #{cluster1} and #{cluster2}" if debug
    clusters[cluster1].add(clusters[cluster2])
    clusters.delete_at(cluster2)

    if debug
      clusters.each_with_index do |elem, i|
        puts "cluster #{i}"
        elem.print
      end
    end

    # stop condition 3
    # the density of the merged cluster exceeds the threshold
    # (the original indexed with 'cluster', which is not an index in this method)
    break if no_clusters == 0 && clusters[cluster1].density > threshold_density
  end

  @clusters = clusters
  clusters
end
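Unlike the one-dimensional method, the 2D variant compares every pair of clusters rather than only adjacent ones. The sketch below isolates that all-pairs search. `Point`, `euclidean` and `closest_pair` are hypothetical names, and Euclidean distance is an assumption, since the `PairCluster#distance` implementation is not shown on this page. The density tie-break is omitted.

```ruby
# Hypothetical stand-in for the pair objects; the page only shows
# that they respond to x and y.
Point = Struct.new(:x, :y)

# Assumed distance for this sketch: plain Euclidean distance.
def euclidean(a, b)
  Math.sqrt((a.x - b.x)**2 + (a.y - b.y)**2)
end

# Returns [i, j, distance] for the closest pair of points, with i < j.
# Returns nil when there are fewer than two points.
def closest_pair(points)
  best = nil
  (0...points.length).to_a.combination(2) do |i, j|
    dist = euclidean(points[i], points[j])
    best = [i, j, dist] if best.nil? || dist < best[2]
  end
  best
end

points = [Point.new(0, 0), Point.new(5, 5), Point.new(3, 4), Point.new(6, 5)]
i, j, dist = closest_pair(points)
puts "closest pair: #{i} and #{j}, distance #{dist}"
```

Because `i < j` always holds, deleting element `j` after a merge leaves index `i` valid, which is why the merged cluster can still be addressed by its first index afterwards.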
- (Object) most_dense_cluster(clusters = @clusters)
Returns the cluster with the maximum density. Params: clusters: list of Cluster objects
# File 'lib/genevalidator/clusterization.rb', line 616
def most_dense_cluster(clusters = @clusters)
  max_density = 0
  max_density_cluster = 0

  # nothing to search when no cluster list is available
  return nil if clusters.nil?

  # remember the index of the first cluster with the highest density
  clusters.each_with_index do |item, i|
    if item.density > max_density
      max_density = item.density
      max_density_cluster = i
    end
  end

  clusters[max_density_cluster]
end
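The same selection can be written more compactly with `Enumerable#max_by`. This is only a sketch of an alternative, not the library's code: `DenseStub` and `densest` are hypothetical names, and the stub models nothing but the `density` reader that the method relies on.

```ruby
# Hypothetical stand-in: only the density reader matters here.
DenseStub = Struct.new(:name, :density)

# Returns nil for a nil or empty list, otherwise a cluster
# with the highest density.
def densest(clusters)
  return nil if clusters.nil? || clusters.empty?

  clusters.max_by(&:density)
end

stubs = [DenseStub.new("a", 2), DenseStub.new("b", 7), DenseStub.new("c", 4)]
puts densest(stubs).name
```

Like the index-based loop, this returns nil for an empty list, because indexing an empty array at 0 also yields nil in Ruby.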