File Binary Search
Get Version
0.0.1→ ‘binarysearch’
What
With binarysearch
you can do binary searches on a file. I created this library as a few weeks after participating in the Ruby Quiz #139 I found myself needing something similar in a project I was working on.
Installing
sudo gem install binarysearch
Or download the gem and install manually.
The basics
The approach binarysearch
uses is to pre process the file to generate first an index file before being able to begin searching. It’s assumed the records are of variable size, it should not make much sense to use an index if you have records of fixed size. After the index is created you can begin to do searches.
Bear in mind I created this library to suit my needs, so, for example, when the index is being created, the contents of the original file is used to create another as I needed to transform the original file.
Demonstration of usage
Suppose we have a ~40MB file with a list of inflected words and their corresponding lemma in big_file.txt. Each line of the file has the format “inflected_word lemma”.
First we require the needed files:
require 'rubygems' require 'binarysearch'
Creating the index
We have to create the index first, to do this, we need a subclass of BinarySearch::DataSource
. In this example we are using a BinarySearch::FileDataSource
which only reads each line from the file.
data_source = BinarySearch::FileDataSource.new('big_file.txt') BinarySearch::index(data_source, 'searchable_file')
After this, searchable_file and searchable_file.idx have been created. This is one of the quirks the library has, if you don’t need to change the contents of big_file.txt you end up with searchable_file which is an exact duplicate of the original.
Searching
To search we need to subclass BinarySearch::Search
and implement the methods parse
, gt
and eq
. In this example parse reads a line, and split it in an array with it’s first element as the inflected word and the second element the lemma.
class MySearch < BinarySearch::Search def initialize super('searchable_file') end def parse(what) what.chomp.split end def gt(a, b) a[0] > b end def eq(a, b) a[0] == b end end
And we search like this:
my_search = MySearch.new my_search.search('item')
Forum
There are two forums:Source
You can get the source using svn using this url svn://rubyforge.org/var/svn/binarysearch/trunk
for anonymous access or go the the BinarySearch RubyForge page.
License
This code is free to use under the terms of the GPL license.
Contact
Comments are welcome. Send an email to Luis Parravicini or post a message to one of the available forums.
Luis Parravicini, 9th October 2007
Theme extended from Paul Battley