# Tutorial This page introduce how to use Rroonga via a simple application making. ## Install You can install Rroonga in your compter with RubyGems. % sudo gem install rroonga ## Create Database Let's create database for simple bookmark application. Please execute irb with loading Rroonga with this command: % irb --simple-prompt -r groonga >> Then, try to create database in a file. >> Groonga::Database.create(:path => "/tmp/bookmark.db") => # From now, the created database is used implicitly. You don't have to be aware of it after you created a database first. ## Define table Groonga supports 4 types of tables. Groonga::Hash : Hash table. It manages records via each primary key. It supports very quickly exact match search. Groonga::PatriciaTrie : Patricia Trie. It supports some search such as predictive search and common prefix search, but it provides a little slowly exact match search than Groonga::Hash. It provides cursor to take records in ascending or descending order. Groonga::DoubleArrayTrie : Double Array Trie. It requires large spaces rather than other tables, but it can update key without ID change. It provides exract match search, predictive search and common prefix search and cursor like Groonga::PatriciaTrie. Groonga::Array : Array. It doesn't have primary keys. It manages records by ID. Now, you use Groonga::Hash and create the table named `Items`. The type of its primary key is String. >> Groonga::Schema.create_table("Items", :type => :hash) => [...] You have `Items` table by this code. You can refer the defined table with Groonga.[] like below: >> items = Groonga["Items"] => # You can treat it like Hash. For example, let's type `items.size` to get the number of records in the table. >> items.size => 0 ## Add records Let's add records to `Items` table. >> items.add("http://en.wikipedia.org/wiki/Ruby") => # >> items.add("http://www.ruby-lang.org/") => # Please check the number of records. It increases from 0 to 2. >> items.size => 2 If you can get record by primary key, type like below: >> items["http://en.wikipedia.org/wiki/Ruby"] => # ## Full text search Let's add item's title to full text search. first, you add the `Text` type column "`title`" to `Items` table. >> Groonga::Schema.change_table("Items") do |table| ?> table.text("title") >> end => [...] Defined columns is named as `#{TABLE_NAME}.#{COLUMN_NAME}`. You can refer them with {Groonga.[]} as same as tables. >> title_column = Groonga["Items.title"] => # Secondly, let's add the table containing terms from splited from texts. Then you define the `Terms` for it. >> Groonga::Schema.create_table("Terms", ?> :type => :patricia_trie, ?> :normalizer => :NormalizerAuto, ?> :default_tokenizer => "TokenBigram") You specify `:default_tokenzier => "TokenBigram"` for "Tokenizer" in the above code. "Tokenizer" is the object to split terms from texts. The default value for it is none. Full text search requires a tokenizer, so you specify "Bigram", a type of N-gram. Full text search with N-gram uses splited N characters and their position in texts. "N" in N-gram specifies the number of each terms. Groonga supports Unigram (N=1), Bigram (N=2) and Trigram (N=3). You also specify `:normalizer => :NormalizerAuto` to search texts with ignoring the case. Now, you ready table for terms, so you define the index of `Items.tiltle` column. >> Groonga::Schema.change_table("Terms") do |table| ?> table.index("Items.title") >> end => [...] You may feel a few unreasonable code. The index of `Items` table's column is defined as the column in `Terms`. When a record is added to `Items`, groonga adds records associated each terms in it to `Terms` automatically. `Terms` is a few particular table, but you can add some columns to term table such as `Terms` and manage many attributes of each terms. It is very useful to process particular search. Now, you finished table definition. Let's put some values to `title` of each record you added before. >> items["http://en.wikipedia.org/wiki/Ruby"].title = "Ruby" => "Ruby" >> items["http://www.ruby-lang.org/"].title = "Ruby Programming Language" "Ruby Programming Language" Now, you can do full text search like above: >> ruby_items = items.select {|record| record.title =~ "Ruby"} => # Groonga returns the search result as Groonga::Hash. Keys in this hash table is records of hitted `Items`. >> ruby_items.collect {|record| record.key.key} => ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] In above example, you get records in `Items` with `record.key`, and keys of them with `record.key.key`. You can access a refered key in records briefly with `record["_key"]`. >> ruby_items.collect {|record| record["_key"]} => ["http://en.wikipedia.org/wiki/Ruby", "http://www.ruby-lang.org/"] ## Improve the simple bookmark application Let's try to improve this simple application a little. You can create bookmark application for multi users and they can comment to each bookmarks. First, you add tables for users and for comments like below: ![Sample schema](images/sample-schema.png) Let's add the table for users, `Users`. >> Groonga::Schema.create_table("Users", :type => :hash) do |table| ?> table.text("name") >> end => [...] Next, let's add the table for comments as `Comments`. >> Groonga::Schema.create_table("Comments") do |table| ?> table.reference("item") >> table.reference("author", "Users") >> table.text("content") >> table.time("issued") >> end => [...] Then you define the index of `content` column in `Comments` for full text search. >> Groonga::Schema.change_table("Terms") do |table| ?> table.index("Comments.content") >> end => [...] You finish table definition by above code. Secondly, you add some users to `Users`. >> users = Groonga["Users"] => # >> users.add("alice", :name => "Alice") => # >> users.add("bob", :name => "Bob") => # Now, let's write the process to bookmark by a user. You assume that the user, `moritan`, bookmark a page including infomation related Ruby. First, you check if the page has been added `Items` already. >> items.has_key?("http://www.ruby-doc.org/") => false The page hasn't been added, so you add it to `Items`. >> items.add("http://www.ruby-doc.org/", ?> :title => "Ruby-Doc.org: Documenting the Ruby Language") => # Next, you add the record to `Comments`. This record contains this page as its `item` column. >> require "time" => true >> comments = Groonga["Comments"] => # >> comments.add(:item => "http://www.ruby-doc.org/", ?> :author => "alice", ?> :content => "Ruby documents", ?> :issued => Time.parse("2010-11-20T18:01:22+09:00")) => # ## Define methods for this process For usefull, you define methods for above processes. >> @items = items => # >> @comments = comments => # >> def add_bookmark(url, title, author, content, issued) >> item = @items[url] || @items.add(url, :title => title) >> @comments.add(:item => item, ?> :author => author, ?> :content => content, ?> :issued => issued) >> end => nil You assign `items` and `comments` to each instance variable, so you can use them in `add_bookmark` method. `add_bookmark` executes processes like below: * Check if the record associated the page exists in `Items` table. * If not, add the record to it. * Add the record to `Comments` table. With this method, lets bookmark some pages. >> add_bookmark("https://rubygems.org/", ?> "RubyGems.org | your community gem host", "alice", "Ruby gems", ?> Time.parse("2010-10-07T14:18:28+09:00")) => # >> add_bookmark("http://ranguba.org/", ?> "Fulltext search by Ruby with groonga - Ranguba", "bob", ?> "Ruby groonga fulltextsearch", ?> Time.parse("2010-11-11T12:39:59+09:00")) => # >> add_bookmark("http://www.ruby-doc.org/", ?> "ruby-doc", "bob", "ruby documents", ?> Time.parse("2010-07-28T20:46:23+09:00")) => # ## Full text search part 2 Let's do full text search for added records. >> records = comments.select do |record| ?> record["content"] =~ "Ruby" >> end => # >> records.each do |record| ?> comment = record >> p [comment.id, ?> comment.issued, ?> comment.item.title, ?> comment.author.name, ?> comment.content] >> end [1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] [2, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] [3, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] [4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] You can access the columns with the same name method as each them. These methods suport to access the complex data type. (In usually RDB, you should namage JOIN tables, `Items`, `Comments`, `Users`.) The search is finished when the first sentence in this codes. The results of this search is the object as records set. >> records #> You can arrange this records set before output. For example, sort these records in the descending order by date. >> records.sort([{:key => "issued", :order => "descending"}]).each do |record| ?> comment = record >> p [comment.id, ?> comment.issued, ?> comment.item.title, ?> comment.author.name, ?> comment.content] >> end [1, 2010-11-20 18:01:22 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Alice", "Ruby documents"] [2, 2010-11-11 12:39:59 +0900, "Fulltext search by Ruby with groonga - Ranguba", "Bob", "Ruby groonga fulltextsearch"] [3, 2010-10-07 14:18:28 +0900, "RubyGems.org | your community gem host", "Alice", "Ruby gems"] [4, 2010-07-28 20:46:23 +0900, "Ruby-Doc.org: Documenting the Ruby Language", "Bob", "ruby documents"] => [...] Let's group the result by each item for easy view. >> records.group("item").each do |record| ?> item = record.key >> p [record.n_sub_records, ?> item.key, ?> item.title] >> end [2, "http://www.ruby-doc.org/", "Ruby-Doc.org: Documenting the Ruby Language"] [1, "https://rubygems.org/", "RubyGems.org | your community gem host"] [1, "http://ranguba.org/", "Fulltext search by Ruby with groonga - Ranguba"] => nil `n_sub_records` is the number of records in each group. It is similar value as count() function of a query including "GROUP BY" in SQL. ## more complex search Now, you challenge the more useful search. You should calcurate goodness of fit of search explicitly. You can use `Items.title` and `Comments.content` as search targets now. `Items.title` is the a few reliable information taken from each original pages. On the other hands, `Comments.content` is the less reliable information because this depends on users of bookmark application. Then, you search records with this policy: * Search item matched `Items.title` or `Comments.content`. * Add 10 times heavier weight to socres of each record matched `Items.title` than ones of `Comments.comment`. * If multi `comment` of one item are matched keyword, specify the sum of scores of each `coments` as score of the item. On this policy, you try to type below: >> ruby_comments = @comments.select {|record| record.content =~ "Ruby"} => # >> ruby_items = @items.select do |record| ?> target = record.match_target do |match_record| ?> match_record.title * 10 >> end >> target =~ "Ruby" >> end #> You group the results of *ruby_comments* in each item and union *ruby_items* . >> ruby_items = ruby_comments.group("item").union!(ruby_items) #> >> ruby_items.sort([{:key => "_score", :order => "descending"}]).each do |record| >> p [record.score, record.title] >> end [22, "Ruby-Doc.org: Documenting the Ruby Language"] [11, "Fulltext search by Ruby with groonga - Ranguba"] [10, "Ruby Programming Language"] [10, "Ruby"] [1, "RubyGems.org | your community gem host"] Then, you get the result.