NAME ---- mongoid-haystack.rb DESCRIPTION ----------- mongoid-haystack provides a zero-config, POLS, pure mongo, fulltext search solution for your mongoid models. INSTALL ------- rubygems: gem intstall 'mongoid-haystack' Gemfile: gem 'mongoid-haystack' rake db:mongoid:create_indexes # IMPORTANT ````ruby # you might want this in lib/tasks/db.rake ... # namespace :db do namespace :mongoid do task :create_indexes do Mongoid::Haystack.create_indexes end end end ```` SYNOPSIS -------- ````ruby # simple usage is simple # class Article include Mongoid::Document include Mongoid::Haystack field(:content, :type => String) end Article.create!(:content => 'teh cats') results = Article.search('cat') article = results.first.model # by default 'search' returns a Mongoid::Criteria object. the result set will # be full of objects that refer to a model in your app via a polymorphic # relation out. aka # # Article.search('foobar').first.class #=> Mongoid::Haystack::Index # Article.search('foobar').first.model.class #=> Article # # in an index view you are not going to want to expand the search index # objects into full blown models one at the time (N+1) so you can use the # 'models' method on the collection to effciently expand the collection into # your application models with the fewest possible queries. note that # 'models' is a terminal operator. that is to say it returns an array and, # afterwards, no more fancy query language is gonna work. # @results = Mongoid::Haystack.search('needle').models # pagination is supported *out of the box*. note that you should chain it # *b4* any call to 'models' as 'models' is a terminal operator: it returns # an array and *not* a Mongoid::Criteria object # @models = Mongoid::Haystack.search('needle'). paginate(:page => 3, :size => 42). models # haystack stems the search terms and does score based sorting all using a # fast b-tree # a = Article.create!(:content => 'cats are awesome') b = Article.create!(:content => 'dogs eat cats') c = Article.create!(:content => 'dogs dogs dogs') results = Article.search('dogs cats').models results == [b, a, c] #=> true results = Article.search('awesome').models results == [a] #=> true # cross model searching (site search)is supported out of the box, and models # can customise how they are indexed: # # - a global score lets some models appear hight in the global results # # - keywords count more than fulltext # class Article include Mongoid::Document include Mongoid::Haystack field(:title, :type => String) field(:content, :type => String) def to_haystack { :score => 11, :keywords => title, :fulltext => content } end end class Comment include Mongoid::Document include Mongoid::Haystack field(:content, :type => String) def to_haystack { :score => -11, :fulltext => content } end end a1 = Article.create!(:title => 'hot pants', :content => 'teh b 52s rock') a2 = Article.create!(:title => 'boring title', :content => 'but hot content that rocks') c = Comment.create!(:content => 'those guys rock') results = Mongoid::Haystack.search('rock') results.count #=> 3 models = results.models models == [a1, a2, c] #=> true. articles first beause we generally score them higher results = Mongoid::Haystack.search('hot') models = results.models models == [a1, a2] #=> true. because keywords score highter than general fulltext # you can decorate your search items with arbirtrary meta data and filter # searches by it later. this too uses a speedy b-tree index. # class Article include Mongoid::Document include Mongoid::Haystack belongs_to :author, :class_name => '::User' field(:title, :type => String) field(:content, :type => String) def to_haystack { :score => author.popularity, :keywords => title, :fulltext => content, :facets => {:author_id => author.id} } end end a = author.articles.create!( :title => 'iggy and keith', :content => 'seen the needles and the damage done...' ) articles_for_teh_author = Article.search('needle', :facets => {:author_id => author.id}) ```` DESCRIPTION ----------- there two main pathways to understand in the code. 1) shit going into the into the index. 2) shit coming out of the index. shit going in entails: - stem and stopword the search terms - create or update a new token for each - create an index item referening all the tokens with precomputed scores for example the terms 'dog dogs cat' might result in these tokens ````javascript [ { '_id' : '0x1', 'value' : 'dog', 'count' : 2 }, { '_id' : '0x2', 'value' : 'cat', 'count' : 1 } ] ```` being created|updated and this index item ````javascript { '_id' : '50c11759a04745961e000001' 'model_type' : 'Article', 'model_id' : '50c11775a04745461f000001' 'tokens' : ['0x1', '0x2'], 'score' : 10, 'keyword_scores' : { '0x1' : 2, '0x2' : 1 }, 'fulltext_scores' : { } } ```` being built some other information is tracked, but the two normal mongoid models - Mongoid::Haystack::Token - Mongoid::Haystack::Index are simple to look at and compromise 80% of the library functionality. a few things to notice: - tokens are counted and auto-id'd using hex notation and a sequence generator. the reason for this is so that their ids are legit hash keys in the keyword and fulltext score hashes (they are also smaller than 12 byte object_ids or the words themselves). aka this sort can be contructed: ````ruby order_by('keyword_scores.0x1' => :desc, 'keyword_scores.0x.1' => :desc) ```` - the data structure above allows both filtering for index items that have certain tokens, but also ordering them based on global, keyword, and fulltext score without resorting to map-reduce: a b-tree index can be used. - all tokens have their text/stem stored exactly once. aka: we do not store 'hugewords' all over the place but store it once and count occurances of it to keep the total index much smaller pulling objects back out in a search involved these logical steps: - filter the search terms through the same tokenizer as when indexed - lookup tokens for each of the tokens in the search string - using the count for each token, plus the global token count that has been tracked we can decide to order the results by relatively rare words first and, all else being equal (same rarity bin: 0.10, 0.20, 0.30, etc.), the order in which the user typed the words - this approach is applies and is valid whether we are doing a union (or) or intersection (all) search and regardless of whether facets are included in the search. facets, however, never affect the order unless done so by the user manually. eg ````ruby results = Mongoid::Haystack. search('foo bar', :facets => {:hotness.gte => 11}). order_by('facets.hotness' => :desc) ```` SEE ALSO -------- tests: ./test/mongoid-haystack_test.rb