File Binary Search

Get Version

0.0.1

→ ‘binarysearch’

What

With binarysearch you can do binary searches on a file. I created this library as a few weeks after participating in the Ruby Quiz #139 I found myself needing something similar in a project I was working on.

Installing

sudo gem install binarysearch

Or download the gem and install manually.

The basics

The approach binarysearch uses is to pre process the file to generate first an index file before being able to begin searching. It’s assumed the records are of variable size, it should not make much sense to use an index if you have records of fixed size. After the index is created you can begin to do searches.

Bear in mind I created this library to suit my needs, so, for example, when the index is being created, the contents of the original file is used to create another as I needed to transform the original file.

Demonstration of usage

Suppose we have a ~40MB file with a list of inflected words and their corresponding lemma in big_file.txt. Each line of the file has the format “inflected_word lemma”.

First we require the needed files:

require 'rubygems'
require 'binarysearch'

Creating the index

We have to create the index first, to do this, we need a subclass of BinarySearch::DataSource. In this example we are using a BinarySearch::FileDataSource which only reads each line from the file.

data_source = BinarySearch::FileDataSource.new('big_file.txt')
BinarySearch::index(data_source, 'searchable_file')

After this, searchable_file and searchable_file.idx have been created. This is one of the quirks the library has, if you don’t need to change the contents of big_file.txt you end up with searchable_file which is an exact duplicate of the original.

Searching

To search we need to subclass BinarySearch::Search and implement the methods parse, gt and eq. In this example parse reads a line, and split it in an array with it’s first element as the inflected word and the second element the lemma.

class MySearch < BinarySearch::Search
  def initialize
    super('searchable_file')
  end

  def parse(what)
    what.chomp.split
  end

  def gt(a, b)
    a[0] > b
  end

  def eq(a, b)
    a[0] == b
  end
end

And we search like this:

my_search = MySearch.new
my_search.search('item')

Forum

There are two forums:

Source

You can get the source using svn using this url svn://rubyforge.org/var/svn/binarysearch/trunk for anonymous access or go the the BinarySearch RubyForge page.

License

This code is free to use under the terms of the GPL license.

Contact

Comments are welcome. Send an email to Luis Parravicini or post a message to one of the available forums.

Luis Parravicini, 9th October 2007
Theme extended from Paul Battley