README.md in interval_notation-0.1.3 vs README.md in interval_notation-0.2.0

- old
+ new

@@ -44,25 +44,30 @@ interval_5_2 = oo(-Float::INFINITY, 7) # => (-∞,7) interval_6_2 = oc(-Float::INFINITY, -3) # => (-∞,-3] interval_7_2 = oo(-3, Float::INFINITY) # => (-3,+∞) interval_8_2 = co(5.5, Float::INFINITY) # => [5.5,+∞) # Create interval set from string (see IntervalSet.from_string for details) -interval_9 = int('{0}U[1,5)U(5,infty)') # => {0}U[1,5)U(5,+∞) +interval_9 = interval('{0}U[1,5)U(5,infty)') # => {0}U[1,5)U(5,+∞) ``` If you prefer more descriptive method names, use `IntervalNotation::Syntax::Long`. In such case you'll have `open_open`, `open_closed`, `closed_open`, `closed_closed`, `less_than`, `less_than_or_equal_to`, `greater_than`, `greater_than_or_equal_to` and `point` methods. `interval` is a long-form analog for `int` - to create interval set from string -Consider that no one class is supposed to be used directly! For further details see section **Internal structure**. +Sometimes you want to create one of `BasicInterval`s which are underlying structures of `IntervalSet`s. In that case you can use similar constructors with `_basic` suffix like `oo_basic(from, to)` and so on. +Some constructors create not necessary contiguous intervals like `interval(String)` do. Such constructors do not have `*_basic` counterpart. +Note that `BasicInterval` classes are not supposed to be used for interval operations directly! For further details see section **Internal structure**. + ### Interval operations Intervals can be combined in many different ways: ```ruby include IntervalNotation::Syntax::Long a = open_closed(0,15) # => (0,15] b = closed_open(10,25) # => [10,25) c = point(-5) # => {-5} +d = closed_closed(-200, -100) # => [-200,-100] bc = b | c # => {-5}∪[10,25) +bcd = b | c | d # => [-200,-100]∪{-5}∪[10,25) # Union of a pair of intervals: a | b # => (0,25) a.union(b) # ditto @@ -85,21 +90,32 @@ # Interval closure bc.closure # => {-5}∪[10,25] # Covering interval bc.covering_interval # => [-5,25) + +# Return connected components (contiguous non-adjacent intervals) +# comprising `IntervalSet` (and wrapped in `IntervalSet`s) +bc.connected_components # => [c, b] + +# Return connected components (contiguous non-adjacent intervals) +# comprising `IntervalSet` (in `BasicInterval` representation; not wrapped) +bc.intervals # => [point_basic(-5), closed_open_basic(10,25)] + +# Find connected component (BasicInterval representation) which covers a point +bc.interval_covering_point(12) # => [10,25) ``` If you want to combine more than two intervals, you can perform several consequent operations: ```ruby a | b | c # => {-5}∪(0,25) a & b & c # => ∅ # or may be [a,b,c].inject(&:|) # => {-5}∪(0,25) [a,b,c].inject(&:&) # => ∅ ``` -But there is a much better and faster way to unite or intersect multiple intervals: +But there is a much faster way to unite or intersect multiple intervals: ```ruby IntervalNotation::Operations.union([a,b,c]) # => {-5}∪(0,25] IntervalNotation::Operations.intersection([a,b,c]) # => ∅ ``` If you unite thousands or millions of intervals, you definitely should choose the last method! Do not try to inject intervals one by one for the sake of perfomance. Running time can differ dramatically (seconds vs hours for union of hundreds of thousands intervals). @@ -139,16 +155,88 @@ interval_set.intersect?(interval_set) interval_set.contain?(interval_set) interval_set.contained_by?(interval_set) ``` +### Segmentation +Another essential structure in a library is a `Segmentation`. Segmentation is a partitioning of number line into adjacent non-ovelapping intervals (covering entire R) such that each segment has its own state. Adjacent segments with the same state are glued. +Segmentation can be used to trace lots of interval sets simultaneously. It's made with help of `Tagging` trace-states. `SingleTagging` and `MultiTagging` allow one to mark each interval set with its own tag and to partition number line into regions with certain tag sets. + +```ruby +intervals_tagged = {oo(0,10) => :A, cc(0,8) => :B, oo(5,15) => :C} +intervals_tagged_dup_tags = {oo(0,10) => :A, cc(0,8) => :B, oo(5,15) => :A} + +### Usual tagging returns segments with states which are `Set`s of tags: +IntervalNotation::SweepLine.make_tagging(intervals_tagged) +# => Segmentation: [<(-∞;0): {}>, <{0}: {B}>, <(0;5]: {A, B}>, <(5;8]: {A, B, C}>, <(8;10): {A, C}>, <[10;15): {C}>, <[15;+∞): {}>] + +IntervalNotation::SweepLine.make_tagging(intervals_tagged_dup_tags) +# => Segmentation: [<(-∞;0): {}>, <{0}: {B}>, <(0;8]: {A, B}>, <(8;15): {A}>, <[15;+∞): {}>] + +### Multitagging returns segments with states which are `Hash`es of tag counts: +IntervalNotation::SweepLine.make_multitagging(intervals_tagged) +# => Segmentation: [<(-∞;0): {}>, <{0}: {:B=>1}>, <(0;5]: {:A=>1, :B=>1}>, <(5;8]: {:A=>1, :B=>1, :C=>1}>, <(8;10): {:A=>1, :C=>1}>, <[10;15): {:C=>1}>, <[15;+∞): {}>] + +IntervalNotation::SweepLine.make_multitagging(intervals_tagged_dup_tags) +# => Segmentation: [<(-∞;0): {}>, <{0}: {:B=>1}>, <(0;5]: {:A=>1, :B=>1}>, <(5;8]: {:A=>2, :B=>1}>, <(8;10): {:A=>2}>, <[10;15): {:A=>1}>, <[15;+∞): {}>] +``` + +One can create different segmentation states using `SweepLine.make_segmentation` with custom trace-state objects. Operations like union and intersection are made this way, using special trace-states which return true when sweep line intersect any/all of intervals. Trace-state is a special object which can recalculate state when interval boundaries were hit. See `SweepLine::TraceState` module for details. + +`Segmentation` have some methods to transform states, query segments and so on: + +```ruby +include IntervalNotation +include IntervalNotation::Syntax::Short +segmentation = Segmentation.new([ + Segmentation::Segment.new( lt_basic(0), Set.new ), + Segmentation::Segment.new( pt_basic(0), Set.new([:B]) ), + Segmentation::Segment.new( oc_basic(0,8), Set.new([:A,:B]) ), + Segmentation::Segment.new( oo_basic(8,15), Set.new([:A]) ), + Segmentation::Segment.new( ge_basic(15), Set.new ), +]) +# Segmentation: [<(-∞;0): {}>, <{0}: {B}>, <(0;8]: {A, B}>, <(8;15): {A}>, <[15;+∞): {}>] + +### `Segmentation#map_state` transforms state of each segment. +### If necessary new segments will be glued +segmentation.map_state{|segment| segment.state.size } +# => Segmentation: [<(-∞;0): 0>, <{0}: 1>, <(0;8]: 2>, <(8;15): 1>, <[15;+∞): 0>] + +segmentation.map_state{|segment| segment.state.size > 1 } +# => Segmentation: [<(-∞;0]: false>, <(0;8]: true>, <(8;+∞): false>] + +### `Segmentation#boolean_segmentation` transforms state of each segment into +### boolean value (useful to glue segments which are truthy/falsy but +### have not exactly equal state). Same result can be obtained by `#map_state`. +segmentation.boolean_segmentation{|segment| segment.state.size > 1 } +# => Segmentation: [<(-∞;0]: false>, <(0;8]: true>, <(8;+∞): false>] + + +### `IntervalSet` and true/false `Segmentation` can be converted +### to each other. +### Use `IntervalSet#make_segmentation` and `Segmentation#make_interval_set`. +bool_segmentation = segmentation.boolean_segmentation{|segment| + segment.state.size > 1 +} +bool_segmentation.make_interval_set +# => (0,8] + +(oo(1,3) | pt(5)).make_segmentation +# => Segmentation: [<(-∞;1]: false>, <(1;3): true>, <[3;5): false>, <{5}: true>, <(5;+∞): false>] + +### `Segmentation#segment_covering_point` returns a `Segment` +### which lies against specified point +segmentation.segment_covering_point(10) +# => <(8;15): {A}> +``` + ## Internal structure `IntervalNotation::IntervalSet` is designed in order to keep ordered list of non-overlapping intervals and represent 1-D point set. Each interval in the `IntervalSet` is an instance of one of following classes: `Point`, `OpenOpenInterval`, `OpenClosedInterval`, `ClosedOpenInterval` or `ClosedOpenInterval` representing contiguous 1-D subsets. One can find them in `IntervalNotation::BasicIntervals` module. None of these classes is intended to be directly instantiated, usually intervals are constructed using factory methods and combining operations. All factory methods listed above create `IntervalSet`s, wrapping an instance of corresponding interval or point class. All interval set operations create new `IntervalSet`s, even if they contain the only basic interval. -`IntervalSet`s are value objects. Once instantiated they cannot be changed, all operations just create new objects. It also means, you can fearlessly use them as key values in hashes. +`IntervalSet`s and `Segmentation`s (as well as `BasicInterval`s) are value objects. Once instantiated they cannot be changed, all operations just create new objects. It also means, you can fearlessly use them as key values in hashes. Combining of intervals is made by sweep line method, so is linear by number of intervals. Many querying operations (such as `#intersect`) rely on combining intervals thus also have linear complexity. Some of these perfomance drawbacks will be fixed in future. Query `#include_position?` is made by binary search (so has logarithmic complexity). `IntervalSet` has two constructors: `.new` and `.new_unsafe`. Use them with caution. Default constructor accepts data in a specially prepared format, while unsafe constructor makes no validation at all (and can create inconsistent obect). But `.new_unsafe` still can be useful when you are absolutely sure, provided data is ok, and a few milliseconds for unnecessary validation make sense.