4.7. match_columns parameter¶
4.7.1. Full-text search against multiple columns¶
Groonga supports full-text search against multiple columns. Let's consider blog site. Usually, blog site has a table which contains title column and content column. How do you search the blog entry which contains specified keywords in title or content?
In such a case, there are two ways to create indexes. One way is creating column index against each column. The other way is creating one column index against multiple columns. Either way, Groonga supports similar full-text search syntax.
4.7.1.1. Creating column index against each column¶
Here is the example which create column index against each column.
First, create Blog1
table, add title
column which stores title string, message
column which stores content of blog entry.
Then create IndexBlog1
table for column indexes, add index_title
column for title
column, index_message
column for message
column.
Execution example:
table_create --name Blog1 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog1 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog1 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_title --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source title
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog1 --name index_message --flags COLUMN_INDEX|WITH_POSITION --type Blog1 --source message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog1
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
match_columns
option of select
command accepts multiple columns as search target.
Specify query string to query
option. Then you can do full-text search title and content of blog entries.
Let's try to search blog entries.
Execution example:
select --table Blog1 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ]
# ]
# ]
# ]
select --table Blog1 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
4.7.1.2. Creating one column index against multiple columns¶
Groonga also supports one column index against multiple columns.
The difference for previous example is only one column index exists. Thus, There is one common column index against title and message column.
Even though same column index is used, Groonga supports to search against title column only, message column only and title or message column.
Execution example:
table_create --name Blog2 --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name title --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Blog2 --name message --flags COLUMN_SCALAR --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name IndexBlog2 --flags TABLE_PAT_KEY --key_type ShortText --default_tokenizer TokenBigram --normalizer NormalizerAuto
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table IndexBlog2 --name index_blog --flags COLUMN_INDEX|WITH_POSITION|WITH_SECTION --type Blog2 --source title,message
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Blog2
[
{"_key":"grn1","title":"Groonga test","message":"Groonga message"},
{"_key":"grn2","title":"baseball result","message":"rakutan eggs 4 - 4 Groonga moritars"},
{"_key":"grn3","title":"Groonga message","message":"none"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
Let's search same query in previous section. You can get same search results.
Execution example:
select --table Blog2 --match_columns title||message --query groonga
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 2,
# "grn2",
# "rakutan eggs 4 - 4 Groonga moritars",
# "baseball result"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title||message --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 2
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 1,
# "grn1",
# "Groonga message",
# "Groonga test"
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
select --table Blog2 --match_columns title --query message
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 1
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "message",
# "ShortText"
# ],
# [
# "title",
# "ShortText"
# ]
# ],
# [
# 3,
# "grn3",
# "none",
# "Groonga message"
# ]
# ]
# ]
# ]
Note
There may be a question that "which is the better solution for indexing." It depends on the case.
- Indexes for each column - The update performance tends to be better than multiple colum index because there is enough buffer for updating. On the other hand, the efficiency of disk usage is not so good.
- Indexes for multiple column - It saves disk usage because it shares common buffer. On the other hand, the update performance is not so good.
4.7.2. Full text search with specific index name¶
TODO
4.7.4. Indexes with Weight¶
TODO