README.md in d_heap-0.5.0 vs README.md in d_heap-0.6.0
- old
+ new
@@ -1,28 +1,43 @@
-# DHeap
+# DHeap - Fast d-ary heap for ruby
+[![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
+[![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
+[![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)
+
A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
implemented as a C extension.
+From [Wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
+> A heap is a specialized tree-based data structure which is essentially an
+> almost complete tree that satisfies the heap property: in a min heap, for any
+> given node C, if P is a parent node of C, then the key (the value) of P is
+> less than or equal to the key of C. The node at the "top" of the heap (with no
+> parents) is called the root node.
+
+![tree representation of a min heap](images/wikipedia-min-heap.png)
+
With a regular queue, you expect "FIFO" behavior: first in, first out. With a
stack you expect "LIFO": last in first out. A priority queue has a score for
each element and elements are popped in order by score. Priority queues are
often used in algorithms for e.g. [scheduling] of timers or bandwidth
management, for [Huffman coding], and various graph search algorithms such as
[Dijkstra's algorithm], [A* search], or [Prim's algorithm].
The _d_-ary heap data structure is a generalization of the [binary heap], in
which the nodes have _d_ children instead of 2. This allows for "insert" and
"decrease priority" operations to be performed more quickly with the tradeoff of
-slower delete minimum. Additionally, _d_-ary heaps can have better memory cache
-behavior than binary heaps, allowing them to run more quickly in practice
-despite slower worst-case time complexity. In the worst case, a _d_-ary heap
-requires only `O(log n / log d)` operations to push, with the tradeoff that pop
-requires `O(d log n / log d)`.
+slower delete minimum or "increase priority". Additionally, _d_-ary heaps can
+have better memory cache behavior than binary heaps, allowing them to run more
+quickly in practice despite slower worst-case time complexity. In the worst
+case, a _d_-ary heap requires only `O(log n / log d)` operations to push, with
+the tradeoff that pop requires `O(d log n / log d)`.
Although you should probably just use the default _d_ value of `4` (see the
-analysis below), it's always advisable to benchmark your specific use-case.
+analysis below), it's always advisable to benchmark your specific use-case. In
+particular, if you push items more often than you pop, higher values for _d_
+can give a faster total runtime.
[d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
[priority queue]: https://en.wikipedia.org/wiki/Priority_queue
[binary heap]: https://en.wikipedia.org/wiki/Binary_heap
[scheduling]: https://en.wikipedia.org/wiki/Scheduling_(computing)
@@ -31,105 +46,92 @@
[A* search]: https://en.wikipedia.org/wiki/A*_search_algorithm#Description
[Prim's algorithm]: https://en.wikipedia.org/wiki/Prim%27s_algorithm
## Usage
-Quick reference:
+The basic API is `#push(object, score)` and `#pop`. Please read the
+[gem documentation] for more details and other methods.
+Quick reference for some common methods:
+
* `heap << object` adds a value, with `Float(object)` as its score.
* `heap.push(object, score)` adds a value with an extrinsic score.
* `heap.pop` removes and returns the value with the minimum score.
-* `heap.pop_lte(score)` pops if the minimum score is `<=` the provided score.
+* `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
* `heap.peek` to view the minimum value without popping it.
* `heap.clear` to remove all items from the heap.
* `heap.empty?` returns true if the heap is empty.
* `heap.size` returns the number of items in the heap.
-The basic API is `#push(object, score)` and `pop`. If your values behave as
-their own score, then you can push with `#<<`. If the score changes while the
-object is still in the heap, it will not be re-evaluated again. The score must
-either be `Integer` or `Float` or convertable to a `Float` via `Float(score)`
-(i.e. it should implement `#to_f`).
+If the score changes while the object is still in the heap, it will not be
+re-evaluated.
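+
+For example, here is a minimal sketch showing that the score is captured at
+push time (the `Timer` struct is hypothetical, not part of the gem):
+
+```ruby
+require "d_heap"
+
+Timer = Struct.new(:at) do
+  def to_f; at.to_f end
+end
+
+early = Timer.new(Time.now + 10)
+late  = Timer.new(Time.now + 20)
+heap  = DHeap.new
+heap << early << late    # scores are captured from #to_f at push time
+
+late.at = Time.now - 60  # mutating an object later does NOT re-sort the heap
+heap.peek                # => early, still ordered by the original scores
+```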
-```ruby
-require "d_heap"
+The score must either be `Integer` or `Float` or convertible to a `Float` via
+`Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+numeric values gives more than 50% speedup under some benchmarks! _n.b._
+`Integer` _scores must have an absolute value that fits into_ `unsigned long
+long`. This is compiler and architecture dependent, but with gcc on an IA-64
+system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
++18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
+in nanoseconds.
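+
+For instance (an illustrative sketch, not from the gem's docs), nanosecond
+POSIX time works as an `Integer` score:
+
+```ruby
+require "d_heap"
+
+heap = DHeap.new
+# POSIX time in nanoseconds fits comfortably in an unsigned long long
+nanos = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
+heap.push "some event", nanos
+```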
+_Comparing arbitrary objects via_ `a <=> b` _was the original design and may be
+added back in a future version,_ if (and only if) _it can be done without
+impacting the speed of numeric comparisons. The speedup from this constraint is
+huge!_
+
+[gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
+
+### Examples
+
+```ruby
+# create some example objects to place in our heap
Task = Struct.new(:id, :time) do
def to_f; time.to_f end
end
t1 = Task.new(1, Time.now + 5*60)
t2 = Task.new(2, Time.now + 50)
t3 = Task.new(3, Time.now + 60)
t4 = Task.new(4, Time.now + 5)
-# if the object returns its own score via #to_f, "<<" is the simplest API
-heap << t1 << t2
+# create the heap
+require "d_heap"
+heap = DHeap.new
-# or push with an explicit score
-heap.push t3, t4.to_f
-heap.push t4, t4 # score can be implicitly cast with Float
+# push with an explicit score (which might be extrinsic to the value)
+heap.push t1, t1.to_f
-# peek and pop
+# the score will be implicitly cast with Float, so any object with #to_f works
+heap.push t2, t2
+
+# if the object has an intrinsic score via #to_f, "<<" is the simplest API
+heap << t3 << t4
+
+# pop returns the lowest scored item, and removes it from the heap
heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
+
+# peek returns the lowest scored item, without removing it from the heap
heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
heap.pop # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
-heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
-heap.empty? # => true
-heap.pop # => nil
-```
-Constraining scores to numeric values gives more than 50% speedup under some
-benchmarks! _n.b._ `Integer` _scores must have an absolute value that fits
-into_ `unsigned long long`. _This is architecture dependant but on an IA-64
-system this is 64 bits, which gives a range of -18,446,744,073,709,551,615 to
-+18446744073709551615. Comparing arbitary objects via_ `a <=> b` _was the
-original design and may be added back in a future version,_ if (and only if) _it
-can be done without impacting the speed of numeric comparisons._
+# pop_lte handles the common "h.pop if h.peek_score < max" pattern
+heap.pop_lte(Time.now + 65) # => nil
-```ruby
-heap.clear
-
-# The score can be derived from the value by using to_f.
-# "a <=> b" is *much* slower than comparing numbers, so it isn't used.
-class Event
- include Comparable
- attr_reader :time, :payload
- alias_method :to_time, :time
-
- def initialize(time, payload)
- @time = time.to_time
- @payload = payload
- freeze
- end
-
- def to_f
- time.to_f
- end
-
- def <=>(other)
- to_f <=> other.to_f
- end
-end
-
-heap << comparable_max # sorts last, using <=>
-heap << comparable_min # sorts first, using <=>
-heap << comparable_mid # sorts in the middle, using <=>
-heap.pop # => comparable_min
-heap.pop # => comparable_mid
-heap.pop # => comparable_max
+# the heap size can be inspected with size and empty?
+heap.empty? # => false
+heap.size # => 1
+heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
heap.empty? # => true
+heap.size # => 0
+
+# popping from an empty heap returns nil
heap.pop # => nil
```
-You can also pass a value into `#pop(max)` which will only pop if the minimum
-score is less than or equal to `max`.
+Please see the [gem documentation] for more methods and more examples.
-Read the [API documentation] for more detailed documentation and examples.
-
-[API documentation]: https://rubydoc.info/gems/d_heap/DHeap
-
## Installation
Add this line to your application's Gemfile:
```ruby
@@ -151,109 +153,79 @@
`Array#insert`. This can be very fast—`Array#pop` is `O(1)`—but the worst-case
for insert is `O(n)` because it may need to `memcpy` a significant portion of
the array.
The standard way to implement a priority queue is with a binary heap. Although
-this increases the time for `pop`, it converts the amortized time per push + pop
-from `O(n)` to `O(d log n / log d)`.
+this increases the time complexity for `pop` alone, it reduces the combined
+time complexity for `push` + `pop`. Using a d-ary heap with d > 2 makes the
+tree shorter but broader, which reduces the cost of push to `O(log n / log d)`
+while increasing the comparisons needed by sift-down to `O(d log n / log d)`.
-However, I was surprised to find that—at least for some benchmarks—my pure ruby
-heap implementation was much slower than inserting into and popping from a fully
-sorted array. The reasons for this surprising result: Although it is `O(n)`,
-`memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
-has relatively _much_ larger constant factors. If your queue contains only a
-few thousand items, the overhead of those extra calls to `<=>` is _far_ more
-than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
-`d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
-sorted array.
+However, I was disappointed when my best ruby heap implementation ran much more
+slowly than the naive approach—even for heaps containing ten thousand items.
+Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
+has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
+for each push + pop than `bsearch` + `insert`.
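+
+As an illustration, the sorted-array approach takes only a few lines (this
+`SortedArrayPQ` class is a hypothetical sketch, not the gem's benchmark code):
+
+```ruby
+# Keeps the array sorted by descending score, so the minimum is at the end:
+# bsearch_index is O(log n), insert is an O(n) memcpy, pop is O(1).
+class SortedArrayPQ
+  def initialize; @array = [] end
+
+  def push(value, score)
+    index = @array.bsearch_index { |(s, _)| s <= score } || @array.length
+    @array.insert(index, [score, value])
+    self
+  end
+
+  def pop
+    _score, value = @array.pop
+    value
+  end
+end
+```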
-Moving the sift-up and sift-down code into C helps some. But much more helpful
-is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
-called. I'm hopeful that MJIT will eventually obsolete this C-extension. This
-can be hotspot code, and a basic ruby implementation could perform well if `<=>`
-had much lower overhead.
+Additionally, when researching how other systems handle their scheduling, I was
+inspired by reading go's "timer.go" implementation to experiment with a 4-ary
+heap instead of the traditional binary heap.
-## Analysis
+## Benchmarks
-### Time complexity
+_See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
+`docs/profile.txt` for much more detail or updated results. These benchmarks
+were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
-There are two fundamental heap operations: sift-up (used by push) and sift-down
-(used by pop).
+These benchmarks use very simple implementations for a pure-ruby heap and an
+array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
+comparison, I also include the [priority_queue_cxx gem], which uses the [C++
+STL priority_queue], and another naive implementation that uses `Array#min` and
+`Array#delete_at` with an unsorted array.
-* Both sift operations can perform as many as `log n / log d` swaps, as the
- element may sift from the bottom of the tree to the top, or vice versa.
-* Sift-up performs a single comparison per swap: `O(1)`.
- So pushing a new element is `O(log n / log d)`.
-* Swap down performs as many as d comparions per swap: `O(d)`.
- So popping the min element is `O(d log n / log d)`.
+In these benchmarks, `DHeap` runs faster than all other implementations for
+every scenario and every value of N, although the difference is usually more
+noticeable at higher values of N. The pure ruby heap implementation is
+competitive for `push` alone at every value of N, but is significantly slower
+than bsearch + insert for push + pop, until N is _very_ large (somewhere between
+10k and 100k)!
-Assuming every inserted element is eventually deleted from the root, d=4
-requires the fewest comparisons for combined insert and delete:
+[priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
+[C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/
-* (1 + 2) lg 2 = 4.328085
-* (1 + 3) lg 3 = 3.640957
-* (1 + 4) lg 4 = 3.606738
-* (1 + 5) lg 5 = 3.728010
-* (1 + 6) lg 6 = 3.906774
-* etc...
+Three different scenarios are measured:
-Leaf nodes require no comparisons to shift down, and higher values for d have
-higher percentage of leaf nodes:
+### push N items onto an empty heap
-* d=2 has ~50% leaves,
-* d=3 has ~67% leaves,
-* d=4 has ~75% leaves,
-* and so on...
+...but never pop (clearing between each set of pushes).
-See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
+![bar graph for push_n benchmarks](./images/push_n.png)
-### Space complexity
+### push N items onto an empty heap then pop all N
-Space usage is linear, regardless of d. However higher d values may
-provide better cache locality. Because the heap is a complete binary tree, the
-elements can be stored in an array, without the need for tree or list pointers.
+Although this could be used for heap sort, we're unlikely to choose heap sort
+over Ruby's quick sort implementation. I'm using this scenario to represent
+the amortized cost of creating a heap and (eventually) draining it.
-Ruby can compare Numeric values _much_ faster than other ruby objects, even if
-those objects simply delegate comparison to internal Numeric values. And it is
-often useful to use external scores for otherwise uncomparable values. So
-`DHeap` uses twice as many entries (one for score and one for value)
-as an array which only stores values.
+![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)
-## Benchmarks
+### push and pop on a heap with N values
-_See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
-`docs/profile.txt` for more details or updated results. These benchmarks were
-measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
+Repeatedly push and pop while keeping a stable heap size. This is a _very
+simplistic_ approximation for how most scheduler/timer heaps might be used.
+Usually when a timer fires it will be quickly replaced by a new timer, and the
+overall count of timers will remain roughly stable.
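+
+As a rough sketch (not the gem's actual benchmark code), the scenario looks
+like this:
+
+```ruby
+require "d_heap"
+
+heap = DHeap.new
+1_000.times { heap.push rand, rand }  # pre-fill to a stable size N
+
+100_000.times do
+  heap.pop                            # a "timer" fires...
+  heap.push rand, rand                # ...and is promptly replaced
+end
+```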
-These benchmarks use very simple implementations for a pure-ruby heap and an
-array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
-comparison, an alternate implementation `Array#min` and `Array#delete_at` is
-also shown.
+![bar graph for push_pop benchmarks](./images/push_pop.png)
-Three different scenarios are measured:
- * push N values but never pop (clearing between each set of pushes).
- * push N values and then pop N values.
- Although this could be used for heap sort, we're unlikely to choose heap sort
- over Ruby's quick sort implementation. I'm using this scenario to represent
- the amortized cost of creating a heap and (eventually) draining it.
- * For a heap of size N, repeatedly push and pop while keeping a stable size.
- This is a _very simple_ approximation for how most scheduler/timer heaps
- would be used. Usually when a timer fires it will be quickly replaced by a
- new timer, and the overall count of timers will remain roughly stable.
+### numbers
-In these benchmarks, `DHeap` runs faster than all other implementations for
-every scenario and every value of N, although the difference is much more
-noticable at higher values of N. The pure ruby heap implementation is
-competitive for `push` alone at every value of N, but is significantly slower
-than bsearch + insert for push + pop until N is _very_ large (somewhere between
-10k and 100k)!
+Even for very small values of N, `DHeap` runs faster than the other
+implementations for each scenario, although the difference is still relatively
+small. The pure ruby binary heap is 2x or more slower than bsearch + insert
+for the common push/pop scenario.
-For very small N values the benchmark implementations, `DHeap` runs faster than
-the other implementations for each scenario, although the difference is still
-relatively small. The pure ruby binary heap is 2x or more slower than bsearch +
-insert for common common push/pop scenario.
-
== push N (N=5) ==========================================================
push N (c_dheap): 1969700.7 i/s
push N (c++ stl): 1049738.1 i/s - 1.88x slower
push N (rb_heap): 928435.2 i/s - 2.12x slower
push N (bsearch): 921060.0 i/s - 2.14x slower
@@ -339,82 +311,72 @@
queue size = 1000000: 4403494.1 i/s - 1.66x slower
queue size = 2500000: 3304088.2 i/s - 2.21x slower
queue size = 5000000: 2664897.7 i/s - 2.74x slower
queue size = 10000000: 2137927.6 i/s - 3.42x slower
-## Profiling
+## Analysis
-_n.b. `Array#fetch` is reading the input data, external to heap operations.
-These benchmarks use integers for all scores, which enables significantly faster
-comparisons. If `a <=> b` were used instead, then the difference between push
-and pop would be much larger. And ruby's `Tracepoint` impacts these different
-implementations differently. So we can't use these profiler results for
-comparisons between implementations. A sampling profiler would be needed for
-more accurate relative measurements._
+### Time complexity
-It's informative to look at the `ruby-prof` results for a simple binary search +
-insert implementation, repeatedly pushing and popping to a large heap. In
-particular, even with 1000 members, the linear `Array#insert` is _still_ faster
-than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are
-still (relatively) slow and `memcpy` is (relatively) quite fast!
+There are two fundamental heap operations: sift-up (used by push) and sift-down
+(used by pop).
- %self total self wait child calls name location
- 34.79 2.222 2.222 0.000 0.000 1000000 Array#insert
- 32.59 2.081 2.081 0.000 0.000 1000000 Array#bsearch_index
- 12.84 6.386 0.820 0.000 5.566 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
- 10.38 4.966 0.663 0.000 4.303 1000000 DHeap::Benchmarks::BinarySearchAndInsert#<< d_heap/benchmarks/implementations.rb:61
- 5.38 0.468 0.343 0.000 0.125 1000000 DHeap::Benchmarks::BinarySearchAndInsert#pop d_heap/benchmarks/implementations.rb:70
- 2.06 0.132 0.132 0.000 0.000 1000000 Array#fetch
- 1.95 0.125 0.125 0.000 0.000 1000000 Array#pop
+* A _d_-ary heap will have `log n / log d` layers, so both sift operations can
+  perform as many as `log n / log d` writes, when a member sifts the entire
+  height of the tree.
+* Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
+* Sift-down makes d comparisons per layer, so pop runs in `O(d log n / log d)`.
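+
+For illustration, both sifts can be written in a few lines of pure ruby (a
+simplified approximation, not the gem's C implementation), on a flat array of
+`[score, value]` pairs:
+
+```ruby
+D = 4  # number of children per node
+
+def sift_up(heap, index)
+  while index > 0
+    parent = (index - 1) / D
+    break if heap[parent][0] <= heap[index][0]  # one comparison per layer
+    heap[parent], heap[index] = heap[index], heap[parent]
+    index = parent
+  end
+end
+
+def sift_down(heap, index)
+  loop do
+    first = D * index + 1
+    break if first >= heap.length
+    # up to D comparisons per layer to find the smallest child
+    child = (first...[first + D, heap.length].min).min_by { |c| heap[c][0] }
+    break if heap[index][0] <= heap[child][0]
+    heap[index], heap[child] = heap[child], heap[index]
+    index = child
+  end
+end
+```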
-Contrast this with a simplistic pure-ruby implementation of a binary heap:
+So, in the simplest case of running balanced push/pop while maintaining the
+same heap size, `(1 + d) log n / log d` comparisons are made in the worst
+case, when every sift traverses every layer of the tree. Under that
+assumption, `d=4` requires the fewest comparisons for combined insert and
+delete:
- %self total self wait child calls name location
- 48.52 8.487 8.118 0.000 0.369 1000000 DHeap::Benchmarks::NaiveBinaryHeap#pop d_heap/benchmarks/implementations.rb:96
- 42.94 7.310 7.184 0.000 0.126 1000000 DHeap::Benchmarks::NaiveBinaryHeap#<< d_heap/benchmarks/implementations.rb:80
- 4.80 16.732 0.803 0.000 15.929 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
+* (1 + 2) ln n / ln 2 ≈ 4.328085 ln n
+* (1 + 3) ln n / ln 3 ≈ 3.640957 ln n
+* (1 + 4) ln n / ln 4 ≈ 3.606738 ln n
+* (1 + 5) ln n / ln 5 ≈ 3.728010 ln n
+* (1 + 6) ln n / ln 6 ≈ 3.906774 ln n
+* (1 + 7) ln n / ln 7 ≈ 4.111187 ln n
+* (1 + 8) ln n / ln 8 ≈ 4.328085 ln n
+* (1 + 9) ln n / ln 9 ≈ 4.551196 ln n
+* (1 + 10) ln n / ln 10 ≈ 4.777239 ln n
+* etc...
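+
+These factors are easy to recompute (a quick sketch using ruby's `Math.log`,
+i.e. natural log, which matches the constants above):
+
+```ruby
+(2..10).each do |d|
+  puts format("(1 + %d) / ln(%d) ≈ %.6f", d, d, (1 + d) / Math.log(d))
+end
+```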
-You can see that it spends almost more time in pop than it does in push. That
-is expected behavior for a heap: although both are O(log n), pop is
-significantly more complex, and has _d_ comparisons per layer.
+See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
-And `DHeap` shows a similar comparison between push and pop, although it spends
-half of its time in the benchmark code (which is written in ruby):
+### Space complexity
- %self total self wait child calls name location
- 43.09 1.685 0.726 0.000 0.959 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
- 26.05 0.439 0.439 0.000 0.000 1000000 DHeap#<<
- 23.57 0.397 0.397 0.000 0.000 1000000 DHeap#pop
- 7.29 0.123 0.123 0.000 0.000 1000000 Array#fetch
+Space usage is linear, regardless of d. However, higher d values may
+provide better cache locality. Because the heap is a complete _d_-ary tree,
+the elements can be stored in an array, without the need for tree or list
+pointers.
-### Timers
+Ruby can compare Numeric values _much_ faster than other ruby objects, even if
+those objects simply delegate comparison to internal Numeric values. And it is
+often useful to use external scores for otherwise uncomparable values. So
+`DHeap` uses twice as many entries (one for score and one for value)
+as an array which only stores values.
-Additionally, when used to sort timers, we can reasonably assume that:
- * New timers usually sort after most existing timers.
- * Most timers will be canceled before executing.
- * Canceled timers usually sort after most existing timers.
+## Thread safety
-So, if we are able to delete an item without searching for it, by keeping a map
-of positions within the heap, most timers can be inserted and deleted in O(1)
-time. Canceling a non-leaf timer can be further optimized by marking it as
-canceled without immediately removing it from the heap. If the timer is
-rescheduled before we garbage collect, adjusting its position will usually be
-faster than a delete and re-insert.
+`DHeap` is _not_ thread-safe, so concurrent access from multiple threads needs
+to take precautions, such as locking access behind a mutex.
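+
+One straightforward approach (an illustrative sketch, not a feature of the
+gem) is to wrap every heap operation in a `Mutex`:
+
+```ruby
+require "d_heap"
+
+heap  = DHeap.new
+mutex = Mutex.new
+
+workers = 4.times.map do |n|
+  Thread.new do
+    10.times { |i| mutex.synchronize { heap.push "task-#{n}-#{i}", i } }
+  end
+end
+workers.each(&:join)
+
+next_task = mutex.synchronize { heap.pop }
+```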
## Alternative data structures
As always, you should run benchmarks with your expected scenarios to determine
-which is right.
+which is best for your application.
-Depending on what you're doing, maintaining a sorted `Array` using
-`#bsearch_index` and `#insert` might be just fine! As discussed above, although
-it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this
-may not matter. Also, if you can arrange for insertions to occur near the end
-of the array, that could significantly reduce the `memcpy` overhead even more.
+Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
+and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
+array can be very fast on small queues. Although insertions run in `O(n)`,
+`memcpy` is so fast on modern hardware that your dataset might not be large
+enough for it to matter.
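+
+For example, the unsorted-array approach takes only a few lines (this
+`UnsortedArrayPQ` class is a hypothetical sketch, for illustration):
+
+```ruby
+# push is an O(1) append; pop is an O(n) scan plus an O(n) delete.
+class UnsortedArrayPQ
+  def initialize; @array = [] end
+
+  def push(value, score)
+    @array << [score, value]
+    self
+  end
+
+  def pop
+    min = @array.min_by { |(score, _)| score } or return nil
+    @array.delete(min)
+    min[1]
+  end
+end
+```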
-More complex heap varients, e.g. [Fibonacci heap], can allow heaps to be merged
-as well as lower amortized time.
+More complex heap variants, e.g. [Fibonacci heap], allow heaps to be split and
+merged, which gives some graph algorithms a lower amortized time complexity.
+But in practice, _d_-ary heaps have much lower overhead and often run faster.
[Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
If it is important to be able to quickly enumerate the set or find the ranking
of values in it, then you may want to use a self-balancing binary search tree
@@ -430,27 +392,19 @@
[timing wheels]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
## TODOs...
-_TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
-basic heap with an internal `Hash`, which maps a set of values to scores.
-loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
-and adjustments, to achieve faster average runtime for *add* and *cancel*
-operations.
+_TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
+basic heap with an internal `Hash`, which maps objects to their position in the
+heap. This enforces a uniqueness constraint on items in the heap, and also
+allows items to be more efficiently deleted or adjusted. However, maintaining
+the hash does lead to a small drop in normal `#push` and `#pop` performance.
_TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
features that are loosely inspired by go's timers. e.g: It lazily sifts its
heap after deletion and adjustments, to achieve faster average runtime for *add*
and *cancel* operations.
-
-Additionally, I was inspired by reading go's "timer.go" implementation to
-experiment with a 4-ary heap instead of the traditional binary heap. In the
-case of timers, new timers are usually scheduled to run after most of the
-existing timers. And timers are usually canceled before they have a chance to
-run. While a binary heap holds 50% of its elements in its last layer, 75% of a
-4-ary heap will have no children. That diminishes the extra comparison overhead
-during sift-down.
## Development
After checking out the repo, run `bin/setup` to install dependencies. Then, run
`rake spec` to run the tests. You can also run `bin/console` for an interactive