=begin rdoc == HashLab Methods used in experiments on hash tables. NOTE: the +insert+ and +lookup+ methods defined here are "stubs" used to test the hash functions. Load the file hashmethods.rb to overwrite the stubs with the actual methods that create and search buckets. =end module RubyLabs module HashLab TableView = Struct.new(:table, :nrows) =begin rdoc Compute the sum of the letters in a string, where each letter is weighted according to its position. Requires the +ord+ method added to Fixnum in the RubyLabs module. =end # :begin :radix26 def radix26(s) x = 0 s.each_byte do |b| x = x * 26 + b.ord end return x end # :end :radix26 =begin rdoc Trival hash function that uses only the first letter in the input string. +s+ is the string to hash, +n+ is the size of the hash table. =end # :begin :h0 def h0(s, n) return s[0].ord % n end # :end :h0 =begin rdoc Slightly better hash function that uses the first two letters in the input string. +s+ is the string to hash, +n+ is the size of the hash table. =end # :begin :h1 def h1(s, n) return ((s[0].ord * 26) + s[1].ord) % n end # :end :h1 =begin rdoc Hash function based on the radix-26 representation of the input string. +s+ is the string to hash, +n+ is the size of the hash table. =end # :begin :hr def hr(s, n) return radix26(s) % n end # :end :hr =begin rdoc Simple hash function used in the first project, to introduce the idea of hashing. Uses a default table size of 10 and the h0 method (which just looks at the first letter in a string). =end # :begin :h def h(s, n = 10) return s[0].ord % n end # :end :h =begin rdoc Method used in the first project: if +s+ can be added to +t+ (i.e. if no collision) return the location, otherwise return +nil+ =end # :begin :insert def insert(s, t) i = h(s, t.length) if t[i].nil? t[i] = s return i else return nil end end # :end :insert =begin rdoc Method used in the first project: if +s+ is in array +t+ return the location, otherwise return +nil+ =end # :begin :lookup def lookup(s, t) i = h(s, t.length) if t[i] == s return i else return nil end end # :end :lookup =begin rdoc Make a hash table with +n+ buckets. The optional parameter is a symbol specifying an alternative hash function (:h0 or :h1); the default is the method +hr+, which uses the radix26 function. The object returned by this method is an array of the designated size. A bit of Ruby magic defines three new methods just for the new object: [h=(f)] tells the table to use method f for the hash function (:h0, :h1, or nil) [h] returns the id of the hash function currently being used [hash(s)] call the current hash function on string s =end class HashTable @@hash_functions = [:h0, :h1, :hr] attr_reader :table def initialize(n, f = :hr) raise "HashTable: hash function must be one of #{@@hash_functions.inspect}" unless @@hash_functions.include?(f) @table = Array.new(n) @hash = f end # def hash(s) # return case @h # when :h0 : h0(s, @table.length) # when :h1 : h1(s, @table.length) # else hr(s, @table.length) # end # end def lookup(s) i = send(@hash, s, @table.length) return ( @table[i] && @table[i].include?(s) ) ? i : nil end def insert(s) i = send(@hash, s, @table.length) @table[i] = Array.new if @table[i].nil? @table[i] << s return i end def to_s print_table(self) end def inspect sprintf '#', @table.size, @hash.to_s end =begin rdoc Print usage statistics for the table. Prints lengths of longest and shortest buckets, number of empty buckets, and mean bucket length. =end def print_stats max = 0 min = Float::MAX nzero = 0 sum = 0 @table.each do |bucket| n = bucket.nil? ? 0 : bucket.length min = n if n < min max = n if n > max sum += n nzero += 1 if n == 0 end printf "shortest bucket: %d\n", min printf "longest bucket: %d\n", max printf "empty buckets: %d\n", nzero if max > 0 printf "mean bucket length: %.2f\n", sum.to_f / (@table.length - nzero) end return nil end =begin rdoc Return a list of indices for buckets that have more than +cutoff+ entries. =end def long_rows(cutoff = 5) rows = Array.new @table.each_with_index do |row, i| rows << i if !row.nil? && row.length > cutoff end return rows end end # HashTable =begin rdoc Print a nicely formatted representation of hash table. The parameter +t+ can be an array or a HashTable object. The optional parameter is the number of rows to print, e.g. to print just the first 10 rows of a large table +t+ call print_table(t,10). Skips rows that have empty buckets. =end def print_table(t, max = nil) t = t.table if t.class == HashTable max = t.length unless max max.times { |i| print_row(i, t[i] ) unless t[i].nil? } return nil end def print_row(n, row) # :nodoc: printf "%4d: %s\n", n, row.inspect end =begin rdoc Verification of numbers shown in the table for the birthday paradox. Call birthday(n,m) to make a table with n rows and fill it with m random words. Return true if any row has more than one item. =end def birthday(n, m) t = HashTable.new(n) TestArray.new(m, :words).each { |s| t.insert(s) } # puts t t.table.each { |row| return true if row && row.length > 1 } return false end =begin rdoc Initialize the canvas with a drawing of an hash table +t+ (a table is just an array augmented with a hash method) =end def view_table(t, userOptions = {} ) options = @@tableOptions.merge(userOptions) Canvas.init(300, 500, "HashLab") tbl = t.table x0 = options[:tableX] x1 = x0 + options[:cellWidth] # if nil, color = gray, else color = white and draw text # todo -- arrow # todo -- save text, update with call to insert nrows = min(tbl.size, options[:maxRows]) nrows.times do |i| y0 = options[:tableY] + i * (options[:cellHeight] + options[:cellYSpace]) y1 = y0 + options[:cellHeight] Canvas.rectangle(x0, y0, x1, y1, :fill => options[:cellColor]) # Canvas.text(tbl[i].inspect, textx, y0) end @@drawing = TableView.new(t, nrows) Canvas.sync return true end @@tableOptions = { :tableX => 20, :tableY => 20, :cellHeight => 15, :cellYSpace => 2, :cellWidth => 30, :cellColor => :lightgray, :textX => 70, :maxRows => 25, } end # HashLab end # RubyLabs