This is far from finished, but there's enough done to compare the performance
for some basic searches (word-prefix, word and phrasal).

If you want to try it, here's what you have to do:

1) build the extension
   $ cd ext/ftsearch && ruby extconf.rb && make
   (no need to make install for now, ext/ftsearch is added to $: in the
    scripts you'll run)
   
   I've only tested this on i686-linux; some things are known not to work
   on 64-bit platforms (although a few are detected at compile time, and the
   corresponding optimizations are disabled).


2) index the corpora with Ferret and FTSearch.
   a) Unpack the Linux source tree under corpus/linux
   b) Run   
      $ ruby ferret-indexing-benchmark-linux-source.rb
      You will find a line like this in
      ferret-indexing-benchmark-linux-source.rb:
    field_infos.add_field(:body, :store => :yes, :term_vector => :with_positions_offsets)
                                           ====
      This controls whether the body is stored. Set it to :no to index faster
      (on my box, 2:45 instead of 3:30), but keep in mind that FTSearch's
      indexing is equivalent to :store => :yes.
   c) Run  
      $ ruby sample-indexer.rb linux

   Repeat (b) and (c) if you want a fair comparison with corpus/linux/*
   already cached.
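
One way to see the effect of caching is to time the same command more than
once. The sketch below uses Ruby's standard Benchmark module; time_runs is a
hypothetical helper, not part of either script, and the commands in the usage
comment are the ones from step 2.

```ruby
require "benchmark"

# Hypothetical helper: run the block `runs` times and return the wall-clock
# seconds of each run. The first run usually includes reading the corpus from
# disk; later runs are more likely to be served from the OS cache.
def time_runs(runs)
  Array.new(runs) { Benchmark.realtime { yield } }
end

# Usage (commented out so the sketch runs without the corpus):
#   ferret   = time_runs(2) { system("ruby", "ferret-indexing-benchmark-linux-source.rb") }
#   ftsearch = time_runs(2) { system("ruby", "sample-indexer.rb", "linux") }
#   puts "Ferret: #{ferret.inspect}  FTSearch: #{ftsearch.inspect}"
```

Comparing the second entry of each array gives the warm-cache figures.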

3) search with Ferret and FTSearch

   $ ruby ferret-lookup.rb

   It will ask you for a query term and show the times/top results.
   Enter  !queryterm  to see how long it takes to get the first match.
   Enter an empty term (just press enter) when done.

   $ ruby sample-lookup.rb
   
   Same interface as ferret-lookup.rb.
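
The prompt conventions described in step 3 amount to a small amount of input
handling. The sketch below is not taken from the scripts; parse_query and the
symbols it returns are hypothetical, and treating a lone "!" like an empty
term is an assumption.

```ruby
# Hypothetical sketch of the prompt's input handling.
# Returns [:quit], [:first_match, term] or [:search, term].
def parse_query(line)
  term = line.to_s.strip
  first_only = term.start_with?("!")        # "!term": time only the first match
  term = term[1..-1].strip if first_only
  return [:quit] if term.empty?             # empty input ends the session
  [first_only ? :first_match : :search, term]
end
```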

Note: FTSearch uses a suffix array, so a search for e.g. "fa" will match
faq, fat, fat_entry, ..., which makes it equivalent to searching for "fa*"
with Ferret.
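
Why a suffix array gives prefix matches for free can be shown with a small,
generic sketch. It is not FTSearch's actual data layout: it assumes suffixes
start at word boundaries only, keeps everything in memory, and the names
build_suffix_array and lookup are made up for the illustration.

```ruby
# Collect the start offset of every word, then sort the offsets by the text
# that follows them. The sorted offsets form the suffix array.
def build_suffix_array(text)
  starts = []
  text.scan(/\w+/) { starts << Regexp.last_match.begin(0) }
  starts.sort_by { |pos| text[pos..-1] }
end

# All suffixes that begin with `query` are adjacent in the sorted array, so a
# binary search finds the first one and a linear scan collects the rest.
def lookup(text, suffix_array, query)
  first = suffix_array.bsearch_index { |pos| text[pos, query.size] >= query }
  return [] if first.nil?
  suffix_array[first..-1].take_while { |pos| text[pos, query.size] == query }.sort
end

text = "fat faq fat_entry far away big array"
sa   = build_suffix_array(text)
hits = lookup(text, sa, "fa").map { |pos| text[pos..-1][/\w+/] }
puts hits.inspect
```

Because the comparison only looks at the first query.size characters of each
suffix, the query does not have to end at a word boundary.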

FTSearch does phrasal search naturally: if you're looking for "big array",
just enter it (without the quotes); with ferret-lookup.rb, you *have* to
surround the phrase with quotes.
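
To run the same query against both tools, the two notes above suggest a simple
translation. ferret_equivalent below is a hypothetical helper, not part of
either script, and the mapping is only approximate: a suffix-array lookup
presumably also matches continuations of the last word of a phrase, which a
quoted phrase would not.

```ruby
# Hypothetical helper: turn a term as typed at sample-lookup.rb into a roughly
# equivalent query string for ferret-lookup.rb.
def ferret_equivalent(term)
  term = term.strip
  if term =~ /\s/
    %("#{term}")   # several words: Ferret needs the phrase in quotes
  else
    "#{term}*"     # single term: FTSearch behaves like a prefix search
  end
end

puts ferret_equivalent("fa")
puts ferret_equivalent("big array")
```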


LICENSE
=======
Distribution and modification subject to the same terms as Ruby.

Source: shoes-3.0.1, req/ftsearch/README