# Setfu ##Purpose: Creates a BitSet class with methods that allow you to construct and opperate on set instances. A 'Bignum' instance is used internally to store each element of the set. This allows very fast operations when comparing two set instances. Member elements can be represented as positive small integers, or characters. Sets can be constructed with integers, strings, characters, ranges, and arrays. When characters are used, the ordinate value represents the element bit of the internal Bignum. Additional extensions are made to: String, Array, and Range; these extensions implement most of the basic set operators such as intersection and union. The bits of an internal Bignum instance is used to store the members of the set of non-negative integers. Rather than using a long-winded series of comparison operators, a set can operate very efficiently in a single step. Note that only non-negative integers are possible members of a BitSet instance. This differs from the Set object defined by rails which is more or less an unordered Array of any object. The BitSet object implements a purer mathematical classical set, and permits faster operations on its members. With members constrained to non-negative integers, many more constructs and operations are possible. ##Documentation ###Construction There are many ways to create a BitSet instance. Anything that can be converted to a list of positive Integers can be used in the `#new` method. Also, many types can call `#to_bset` as an alternate means to creating a BitSet instance. Additionally, there are several binary operators which were previously undefined for certain types that will now return a BitSet instance. BitSet class methods have additional generators as well. This is best seen by example as follows: ```ruby require "setfu" # create empty set bs = BitSet.new bs.empty? # true bs.to_a # [] # initialize using a list of positive integers aaa = BitSet.new 15,27,92,103 aaa.to_a # [15, 27, 92, 103] # initialize using strings aaa = BitSet.new "Each ord of this string is a set member!" aaa.to_ra(false) # [" ", "!", "E", "a".."i", "m".."o", "r".."t"] # initialize using ranges aaa = BitSet.new 0..15, 'a'..'z' aaa.to_ra(true) # [0..15, 97..122] # initialize using arrays aaa = BitSet.new [1,2,3],4,5,6,[[[7,8,9]]], "#" aaa.to_ra # [1..9, 35] # converting strings to BitSet instances aaa = "This string will become a new set with the #to_bset method".to_bset aaa.to_ra(false) # [" ", "#", "T", "_", "a".."e", "g".."i", "l".."o", "r".."t", "w"] # converting ranges to BitSet instances aaa = (400...425).to_bset aaa.to_ra # [400..424] # converting arrays to BitSet instances aaa = [5,6,7,8,55..60, "~"].to_bset aaa.to_ra # [5..8, 55..60, 126] ``` ###Operators BitSet instances support the classical set operators of union, exclusive union, and intersection There are also membership tests for subset and proper set as well as set intersection. Several standard data types do not define some of these operators and are instead defined by BitSet. Array types do define union and intersection but not exclusive union; as there is utility in having an Array exclusive union; `setfu` defines this as an Array method. When an Array performs an exclusive union with some other type, a BitSet type is created. This is best seen by example as follows: ```ruby require "setfu" # exclusive union ^ aaa = ('a'..'z') ^ "Cat in the hat" aaa.to_ra(false) # [" ", "C", "b".."d", "f", "g", "j".."m", "o".."s", "u".."z"] bbb = [1,3,5] ^ [3,6,9] # returns Array type [1, 5, 6, 9] bbb = [1,3,5] ^ [3,6,9].to_bset # BitSet type ... bbb.to_a == [1, 5, 6, 9] # union | aaa = [1,3,5] | [3,6,9] # returns Array type == [1, 3, 5, 6, 9] aaa = [1,3,5] | [3,6,9].to_bset # returns BitSet type == [1, 3, 5, 6, 9] aaa = "abcd" | "defg" # aaa.to_s == "abcdefg" # intersection & aaa = [1,2,3,5] & [2,3,4,5,6] # returns Array type == [2,3,5] aaa = [1,2,3,5].to_bset & [2,3,4,5,6] # returns BitSet type == [2,3,5] aaa = "abcdefgh" & "defghijkl" # aaa.to_s == "defgh" # set difference - aaa = [1,2,3,4,5,6,7,8,9] - [2,4,6,8] # Array type == [1,3,5,7,9] aaa = (1..9) - [2,4,6,8] # BitSet type == [1,3,5,7,9] # set inversion ~ # note: we must define the bounds of the set before we invert it. aaa = BitSet.new "a".."z", "A".."Z", "0".."9" aaa.entropy = 256 # ASCII-8BIT range of 256 possible elements bbb = ~aaa bbb.to_ra # [0..47, 58..64, 91..96, 123..255] # subset <= ... a <= b ... true if `a` is a subset of `b` aaa = [3,4,5,6,7] [3] <= aaa # true [3,6,7] <= aaa # true [1,5,6] <= aaa # false # proper subset < ... a < b ... true if `a` is a subset of `b` and `a` != `b` aaa = [3,4,5,6,7] [3] < aaa # true [3,6,7] < aaa # true aaa < aaa # false # intersection test ** ... a ** b ... true if intersection is not empty [1,2,3,4,5] ** [4,8] # true "team" ** "i" # false ... there is no "i" in team! # equality and inequality == != (4..7).to_bset == [4,5,6,7] # true ... Note: left-most must be BitSet type (4..7).to_bset != [4,5,6,7] # false ``` ###Member Access BitSet defines array access operators `:[]` and `:[]=` as it it were an array of bits. If you pass an Integer type, you will get `true` (if member of set) or `false`. Anything else will return a BitSet instance. You can also set or clear members of the set as well. See the example below: ```ruby require "setfu" aaa = BitSet.new 0..9 aaa[4] # true aaa[4] = false aaa.to_ra # [0..3, 5..9] aaa[2..8] # BitSet ... [2, 3, 5..8] aaa[15..20] = true # BitSet ... [0..3, 5..9, 15..20] aaa[1,2,3,4,5,6,7] # BitSet ... [1..3, 5..7] aaa[3..17] = false # BitSet ... [0..2, 18..20] ``` ###Instance Methods #### bitset.count This returns number of elements in the BitSet instance. This method is computationally slow as each and every bit of the internal `Bignum` must be checked. See below: ```ruby primes = [2,3,5,7,11,13,17,19,23,29,31,37,41].to_bset primes.count # 13 ``` #### bitset.each, bitset.each_member, bitset.reverse_each These methods when called with a block will yield each member of the set as index number values. These can also be called without a block returning an Enumerator object. The `#each_member` method is identical to `#each` which yields lower index numbers first. #### bitset.add(item), bitset.add!(item) The `#add(item)` method creates a new instance whose members comprise self and the union of the item in the parameter list. This is equivalent to `bitset_instance | [item]`. The bang version `#add!(item)` self-modifies. The parameter `item` can be any of the following: Integer, Range, BitSet, or Array. Arrays can include all of the former items any level deep. Note that this method is called by the main constructor which passes an array to this method. Recursion is used to unravel the Array. #### bitset.remove(item), bitset.remove!(item) These methods remove elements from the BitSet instance if they exist. #### bitset.include?(item) Returns true if p is a subset. Parameter `p` can be: BitSet, Range, String, Array, or Integer. #### bitset.empty? Returns true if there are no elements in the BitSet instance. #### bitset.zap! This removes all elements from the BitSet instance making it empty. Self-modifies. #### bitset.dup This create and returns a duplicate copy of the BitSet instance. #### bitset.replace(item) This replaces the inner content of the current BitSet instance from the item passed to this method. The item will first be converted to a BitSet instance, then its internal contents will transfer. #### bitset.to_i This returns the internal Integer representing the set. #### bitset.to_s This returns a string representing the elements of the set. This routine is computationally slow as each bit must be examined. #### bitset.to_a(int=true) This returns an array of every element in the set in the form of integers. If the parameter int is set to false, the array is filled with character representations of the number whose ordinal value is that of the integer. This routine is computationally slow as each bit must be examined. #### bitset.to_ra(int=true, th=3) This returns as array filled with either integers or ranges of integers. The threshold parameter decides the minimum range size that will be used. The minimum value is 2. The `int` parameter when set to false will return string characters and string ranges. This routine is computationally slow as each bit must be examined. #### bitset.set_bits!(n) This is the inverse of `#to_i` where the internal Integer gets replaced. Note that you should recalculate the entropy as this defines the bounds of the set. This is only necessary if you need to invert the set elements later on. #### bitset.recalculate_entropy! This routine should be called after a `#set_bits!` where you do not know what the entropy value should be. #### bitset.entropy, bitset.entropy=(n) This returns the current entropy number of the set. You can also set this property. Note that this number can only get bigger; like the universe entropy can only increase. #### bitset.entropy_2n! This sets the entropy number to the nearest 2**n that will contain the highest set element. #### bitset.first(int=true), bitset.first!(int=true) This returns the first element of the set, or nil if the set is empty. The first element can be converted to a character if the parameter `int` is set to false. The bang version removes and returns the first element of the set. The value `nil` is always returned if the set is empty. #### bitset.last(int=true), bitset.last!(int=true) This returns the last element of the set, or nil if the set is empty. The last element can be converted to a character if the parameter `int` is set to false. The bang version removes and returns the last element of the set. The value `nil` is always returned if the set is empty. #### bitset.odd, bitset.even This returns the odd or even elements of the set. #### #### bitset.odd!, bitset.even! The method `#odd!` retains odd elements and returns the even elements. The method `#even!` retains even elements and returns the odd elements. #### #### bitset.odd?, bitset.even? This returns true if all elements in the set are odd or even. Note that empty sets are neither even or odd. #### bitset.min, bitset.max This returns the smallest or largest element of the set, or nil if the set is empty. This is the same as `#first` and `#last` respectfully. #### bitset.max_run This returns the largest count of consecutive members. #### bitset.runs This returns a new set devoid of isolated elements. #### bitset.runs?(n=2) This returns true if the set includes a sequence of runs with a count greater than or equal to n. #### bitset.singles Returns a new set of only isolated elements. #### object.to_bset Method exported to Array, String, and Range. Returns self if BitSet type. This creates a new BitSet instance. #### bitset.inc(n=1), bitset.dec(n=1), bitset.inc!(n=1), bitset.dec!(n=1) These methods increment or decrement each element of the set. Internally, shifting occurs. The parameter `n` is the increment (shift) amount. The bang versions self-modify, while the non-bang versions return a new set. Decrementing has the possibility of losing bits, thereby making it non-reversible. #### bitset.add_primes(n=100), bitset.add_primes!(n=100) This is a generator method that adds prime numbers to the current set or creates a new set combining self with generated prime numbers. #### bitset.add_parse_chars, bitset.add_parse_chars! Parse chars comprise the ASCII-7bit characters that are either white space, or non-alpha-numeric characters. These combine with self to form a new set. The bang version self-modifies. #### bitset.add_lowercase_chars, bitset.add_lowercase_chars! This returns the set of ASCII lowercase characters `'a'..'z'` combined (union) with the current BitSet instance. The bang version self-modifies. #### bitset.add_uppercase_chars, bitset.add_uppercase_chars! This returns the set of ASCII uppercase characters `'A'..'Z'` combined (union) with the current BitSet instance. The bang version self-modifies. #### bitset.add_letter_chars, bitset.add_letter_chars! This returns the set of ASCII letter characters `'a'..'z', 'A'..'Z'` combined (union) with the current BitSet instance. The bang version self-modifies. #### bitset.add_digit_chars, bitset.add_digit_chars! This returns the set of ASCII number characters `'0'..'9'` combined (union) with the current BitSet instance. The bang version self-modifies. #### bitset.add_lowercase_utf, bitset.add_lowercase_utf! This returns the set of UTF lowercase Latin and Greek non-ASCII lowercase characters combined (union) with the current BitSet instance. The bang version self-modifies. You can chain this with `#add_lowercase_chars` to get all lowercase characters. Other UTF characters of other languages are not included. #### bitset.add_uppercase_utf, bitset.add_uppercase_utf! This returns the set of UTF uppercase Latin and Greek non-ASCII lowercase characters combined (union) with the current BitSet instance. The bang version self-modifies. You can chain this with `#add_uppercase_chars` to get all uppercase characters. Other UTF characters of other languages are not included. #### bitset.add_mixcase_utf, bitset.add_mixcase_utf! There are a few UTF characters that combine two letters, starting with a uppercase and ending with a lowercase. This set of characters are combined (union) with the current BitSet instance. The bang version self-modifies. Other UTF characters of other languages are not included. #### bitset.add_dual_lower_utf, bitset.add_dual_lower_utf! This returns the set of UTF lowercase Latin and Green non-ASCII characters that combine two consecutive characters into one. This set of characters are combined (union) with the current BitSet instance. The bang version self-modifies. Other UTF characters of other languages are not included. #### bitset.add_dual_upper_utf, bitset.add_dual_upper_utf! This returns the set of UTF uppercase Latin and Green non-ASCII characters that combine two consecutive characters into one. This set of characters are combined (union) with the current BitSet instance. The bang version self-modifies. Other UTF characters of other languages are not included. #### bitset.add_opposing_case, bitset.add_opposing_case! This routine looks through the set and wherever there appears a letter character, both upper and lower case versions of that character will be combined (union) with the set. The bang version updates the current set, while the non-bang version returns a new set. This works within the ASCII range only. #### bitset.add_opposing_utf_case, bitset.add_opposing_utf_case! This routine (a tad slower) looks at all letter characters, both ASCII and UTF, then adds the case changed letter to the set. The bang version updates the current set, while the non-bang version returns a new set. #### bitset.rand(n, return_type= :bset), bitset.rand!(n, return_type= :set) This routine selects `n` random elements of the set in order to form a subset. The bang version (`rand!`) retains the `n` random elements, and returns the remainder thereby splitting the set in two. Passing a value of zero returns all elements of the set which is meaningless if the return type is a BitSet type; this is because there is no implied order of a set, so this simply returns the set. There is however meaning if we set the return type to something else. The return type defaults to a BitSet type; or we can specify a different return type by setting the `return_type` parameter to one of the following symbols as follows:
```ruby :array # returns an array of Integers :array_chars # returns an array of String characters :string # returns a String :bset # returns a new BitSet instance ``` #### bitset.split, bitset.split! This bisects the set into two roughly equally sized sets and returns an array of two sets. The first set contains the lower elements, while the second set contains the higher elements. The bang version retaines the lower elements are returns the higher elements. ###Class Methods ### BitSet.fill(num_elements, start=0) This constructs a BitSet instance with contiguous values set to the parameter `num_elements`. The parameter `start` specifies the starting position of lowest element. As an example, we can create a set of lowercase ASCII letters as follows: ```ruby aa = BitSet.fill(26,'a') # same as ['a'..'z'].to_bset ``` #### BitSet.uppercase_chars, BitSet.lowercase_chars, BitSet.letter_chars These methods create sets from the standard ASCII character set. These are as follows: 'A'..'Z', 'a'..'z', and the combination of the former respectfully. #### BitSet.digit_chars These method creates a set from the standard ASCII character set representing '0'..'9' inclusive. #### BitSet.parse_chars This is the set of ASCII characters that are typically used as language tokens. Pretty much everything that is not a number character or a letter character; this set also includes the space character and all control characters. #### BitSet.uppercase_utf_chars(return_set=true) This creates a set comprising of UTF Latin and Greek characters that are considered uppercase. This set does not include ASCII characters, nor does it include dual (two letters in one glyph) characters. If the parameter is set to false, a string is instead returned. #### BitSet.lowercase_utf_chars(return_set=true) This creates a set comprising of UTF Latin and Greek characters that are considered lowercase. This set does not include ASCII characters, nor does it include dual (two letters in one glyph) characters. If the parameter is set to false, a string is instead returned. #### BitSet.dual_uppercase_utf_chars(return_set=true) This creates a set comprising of UTF Latin and Greek characters that are dual (two letters in one glyph) uppercase characters. This set does not include ASCII characters. If the parameter is set to false, a string is instead returned. #### BitSet.dual_lowercase_utf_chars(return_set=true) This creates a set comprising of UTF Latin and Greek characters that are dual (two letters in one glyph) lowercase characters. This set does not include ASCII characters. If the parameter is set to false, a string is instead returned. #### BitSet.mixcase_utf_chars(return_set=true) This creates a set comprising of UTF Latin and Greek characters that are dual (two letters in one glyph) where the two combined characters have both an uppercase character and a lowercase character. This set does not include ASCII characters. If the parameter is set to false, a string is instead returned. #### BitSet.get_utf_case_pairs(char=true) This returns the hash that pairs uppercase utf characters with their lowercase counterparts. The internal hash is stored using integer keys and values. When this method is called with the parameter set to true, a new hash is created with string keys and values; otherwise the internal hash is returned. This internal hash is used on the instance method `#add_opposing_utf_case`. #### BitSet.zap_utf_case_pairs This method creates a blank slate on the internal hash that contains the utf case pairs. #### BitSet.default_utf_case_pairs This method restores the internal hash that contains the utf case pairs to its default state. #### BitSet.rm_utf_case_pairs(obj) This method removes specific keys from the internal hash that contains the utf case pairs to its default state. You can pass a string or an array that represents the keys that you wish to remove. You should remove both the key and its value as the value is also a key in the hash. #### BitSet.add_utf_case_pairs(str) This method adds additional key-value pairs to the utf case pairs hash. The string should be even as both key and value are reversible. #### BitSet.default_utf_sets This class method restores the following five utf sets: `::uppercase_utf_chars`, `::lowercase_utf_chars`, `::mixcase_utf_chars`, `::dual_uppercase_utf_chars`, `::dual_lowercase_utf_chars`. #### BitSet.zap_utf_sets This class method empties the following five utf sets: `::uppercase_utf_chars`, `::lowercase_utf_chars`, `::mixcase_utf_chars`, `::dual_uppercase_utf_chars`, `::dual_lowercase_utf_chars`. #### BitSet.modify_utf_sets(*prms) This class method modifies the following five utf sets: `::uppercase_utf_chars`, `::lowercase_utf_chars`, `::mixcase_utf_chars`, `::dual_uppercase_utf_chars`, `::dual_lowercase_utf_chars`. A list of parameters requires a command, a target, and a list of characters. This is best explained by example as follows: ```ruby # parameters: # :rm ... remove command # :add ... add command # :upper ... targets ::uppercase_utf_chars # :lower ... targets ::lowercase_utf_chars # :mix ... targets ::mixcase_utf_chars # :dual_upper ... targets ::dual_uppercase_utf_chars # :dual_lower ... targets ::dual_lowercase_utf_chars # example: BitSet.modify_utf_sets(:add, :dual_upper, 8482) # add trade mark as valid BitSet.modify_utf_sets(:rm, :dual_upper, 8482) # undo the above ``` ### Tuple Processing Tuples require a bit of explanation and some definitions. A `group` is a collection of sets, in this case, an Array of BitSet instances. A `tuple` is a set whereby it represents a `group` which holds the same number of sets as the order of the `tuple`; this group's union must be equal to the `tuple` set. Also, a `tuple` cannot be an empty set. The `order` of a `tuple` is the number of elements that it holds. One more rule, is that a `tuple group` cannot include lower ordered `tuple groups`. As most of the processing is performed upon a group of sets, these methods are implemented on the Array class. Now that you are totally confused, a few examples should help clear things up as follows: ```ruby bs = [5].to_bset BitSet.tuple? [bs] # true, group of 1 same as union of all s1 = [4,9].to_bset s2 = [4,9].to_bset s3 = [4,5].to_bset [s1,s2].tuple? # true, set of two same as: s1 | s2 [s1,s2,s3].tuple? # false, order 2 found in group [s2,s3].tuple? # false, order 3 tuple does not match 2 set items [s1,s2].tuple # returns BitSet[s1,s2] [s2,s3].tuple # returns nil, or invalid tuple [s1,s2,s3].tuple # returns nil, or invalid tuple s1 = [4,9].to_bset s2 = [5,9].to_bset s3 = [4,5].to_bset [s1,s2,s3]tuple? # true, order 3, and s1 | s2 | s3 == [4,5,9] [s1,s2,s3].tuple # returns BitSet [4,5,9] s1 = [7,8].to_bset s2 = [3,9].to_bset s3 = [3,7].to_bset s4 = [3,7,8,9].to_bset [s1,s2,s3,s4].tuple? # true, order 4, and s1 | s2 | s3 | s4 == [3,7,8,9] [s1,s2,s3,s4]tuple # returns BitSet [3,7,8,9] ``` As you can see from the example, higher ordered `tuples` exponentially grow in complexity. #### ary.tuples(group) This routine examines the group (array of sets) and collects tuples starting from lower order tuples first. These lower-ordered tuples are removed from consideration when locating higher-ordered tuples. Violations are tuples that have too many matches in a sub-group for that particular order. Incomplete tuples which would have been considered an error are ignored. This routine returns an array of two groups of sets. The first group is the collection of sub-tuples found in the group. The second array returns the tuple violations. See the example below: ```ruby group = [] group.push [1,2].to_bset group.push [7,8].to_bset group.push [4,5].to_bset group.push [3,4,5].to_bset group.push [3,5].to_bset group.push [3,4,5,7,8,9].to_bset group.push [7,8].to_bset group.push [1,2,5,6,9].to_bset group.push [1,2].to_bset rslt = group.tuples valid_tuples = rslt.first # finds 3 error_tuples = rslt.last # empty ... no violations valid_tuples[0].to_a # [1,2] valid_tuples[1].to_a # [7,8] valid_tuples[2].to_a # [3,4,5] ``` #### ary.reduce_tuples(pass_count=1) This routine calls BitSet.tuples class method and applies that knowledge to reducing each member of that group. If a group member contains a proper set of one of the found tuples, that member subtracts that particular tuple from its elements. This knowledge can be used to solve Sudoku puzzles. In fact, the rspec spec file includes a Sudoku puzzle that gets solved from this singular rule. The routine returns the number of reductions performed. The group is self-modified with the reduced sets. Using the same group from the previous example, we get the following result: ```ruby group.reduce_tuples(1) group[0].to_a # [1,2] group[1].to_a # [7,8] group[2].to_a # [4,5] group[3].to_a # [3,4,5] group[4].to_a # [3,5] group[5].to_a # [9] group[6].to_a # [7,8] group[7].to_a # [6,9] ... second pass [6] group[8].to_a # [1,2] ``` We notice from the first reduction another tuple [9] was created. This means another pass will further reduce the group. The first pass only uses the tuples it found in the first pass which were 3. The second pass will find 4 tuples which will further reduce the group. If you pass a large number on the second parameter, the routine will continually evaluate until no reduction occurs, or the `pass_count` counts down to zero. #### ary.members_to_bset This utility method converts elements of the array to BitSets instances. #### ary.untag_bset_elements During the reduction process some items in the array get tagged. This routine clears such tagging. This is a utility method called by the `#reduce_tuples` method. ###case subsumption operator === The `===` operator has special meaning in ruby. It is known as the case subsumption operator. It is used during the `case when` statements as its comparison means. The left side of the operator's type is the owner and defines what it means to have the right side passed to it. It is like a method call: `my_left_instance.tripple_equals(right_instance)`. You could call like this: `(1..3).===(2)` returning true. This is the same as: `(1..3) === 2`. Note that `Array` does not define this operator so the following expression returns false: `[1,2,3] === 2` It turns out if you don't define it, `Kernel` will define it for you which will look more or less like `==`. The default behavior for a BitSet instance is a test for equality; but this can be changed on an instance by instance basis. #### bitset.set_case(sym=:mode_equal) This instance method defines the meaning of the `===` operator for the current instance variable. There are six different modes of operation as follows: ```ruby # where `x` is the current BitSet instance variable, and `y` is a BitSet friendly type. # :mode_intersection ... (x===y) true if x ** y # :mode_equal ... (x===y) true if x == y # :mode_sub ... (x===y) true if y <= x # :mode_proper ... (x===y) true if y < x # :mode_super ... (x===y) true if x <= y # :mode_superproper ... (x===y) true if x < y x = ('a'..'z').to_bset x.set_case :mode_intersection x === "Fred" # true 'r', 'e', 'd' all intersect x === "RED" # false x.set_case :mode_proper x === "Fred" # false x === "fred" # true x === x # false, not a proper set x.set_case :mode_superproper x === x # false x === "Fred" # false , "Fred" is not a superset of x x === "Fred" | x # true ``` ## Revision History ### Version 3.0.0 * added tuple processing * added BitSet.fill construction method * updated utf methods * fixed range bugs * includes gems: primitive_wrapper, pred, betterobject, yieldhelper ### Older Versions See `bdlsys.com/gems/setfu` to get older documents. ## Installation Add this line to your application's Gemfile: gem 'setfu' And then execute: $ bundle Or install it yourself as: $ gem install setfu ## Contributing I need to control this for the time being