groonga - An open-source fulltext search engine and column store.

7.16.4. Suggestion

This section describes the following suggestion features:

  • How it works
  • How to use
  • How it learns
  • How to extract learning data

7.16.4.1. How it works

The suggestion feature uses a search to compute suggested words:

  1. Cooccurrence search against learned data.

7.16.4.2. How to use

Groonga provides the suggest command for suggestion. The --types suggest option requests suggestion.

For example, here is a command to get suggestion results for "search":

Execution example:

suggest --table item_query --column kana --types suggest --frequency_threshold 1 --query search
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "suggest": [
#       [
#         2
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "search engine",
#         1
#       ],
#       [
#         "web search realtime",
#         1
#       ]
#     ]
#   }
# ]
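
The response can be unpacked on the client side. The following is a minimal Python sketch, assuming the response has exactly the shape shown in the execution example above (a header array followed by an object keyed by suggestion type). The function name parse_suggestions is hypothetical and not part of Groonga.

```python
import json


def parse_suggestions(response_text, suggest_type="suggest"):
    """Return (key, score) pairs from a response shaped like the example above."""
    header, body = json.loads(response_text)
    # header is [return code, start time, elapsed time]; the example shows 0.
    if header[0] != 0:
        raise RuntimeError("suggest command failed: %r" % (header,))
    result = body[suggest_type]
    # result[0] is [number of hits], result[1] lists the columns,
    # and the remaining elements are the records themselves.
    columns = [name for name, _type in result[1]]
    key_index = columns.index("_key")
    score_index = columns.index("_score")
    return [(record[key_index], record[score_index]) for record in result[2:]]


# The response from the execution example, with the "# " prefixes removed.
EXAMPLE_RESPONSE = """
[[0, 1337566253.89858, 0.000355720520019531],
 {"suggest": [[2],
              [["_key", "ShortText"], ["_score", "Int32"]],
              ["search engine", 1],
              ["web search realtime", 1]]}]
"""

print(parse_suggestions(EXAMPLE_RESPONSE))
```

Looking up the column positions by name, rather than assuming that _key comes first, keeps the sketch working if the column order differs.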

7.16.4.3. How it learns

Cooccurrence search uses learned data, which is based on query logs, access logs and so on. To create learned data, Groonga needs the sequence of user inputs with time stamps and the inputs that the user submitted with time stamps.

For example, a user wants to search by "engine". The user submits queries in the following sequence:

  1. 2011-08-10T13:33:23+09:00: search engine (submit)
  2. 2011-08-10T13:33:28+09:00: web search realtime (submit)

Groonga can learn from the submissions with the following command:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "search engine", "type": "submit"},
{"sequence": "1", "time": 1312950808.86057, "item": "web search realtime", "type": "submit"}
]
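
The time values in the load data are UNIX time in seconds. The following is a minimal Python sketch of how such records could be built from the ISO 8601 time stamps listed above. The helper name to_event is hypothetical. The fractional part shown in the load data (.86057) cannot be recovered from the second-resolution time stamps, so the sketch yields whole seconds.

```python
import json
from datetime import datetime


def to_event(sequence, iso_time, item, event_type):
    """Build one event record with the keys used in the load example above."""
    # datetime.fromisoformat accepts offsets such as +09:00 (Python 3.7+).
    epoch = datetime.fromisoformat(iso_time).timestamp()
    return {"sequence": sequence, "time": epoch, "item": item, "type": event_type}


events = [
    to_event("1", "2011-08-10T13:33:23+09:00", "search engine", "submit"),
    to_event("1", "2011-08-10T13:33:28+09:00", "web search realtime", "submit"),
]

# JSON array that would follow the load command line.
print(json.dumps(events))
```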

7.16.4.4. How to extract learning data

The learning data is stored in the item_DATASET and pair_DATASET tables. By using the select command on these tables, you can extract all learning data.

Here is the query to extract all learning data:

select item_DATASET --limit -1
select pair_DATASET --filter 'freq0 > 0 || freq1 > 0 || freq2 > 0' --limit -1

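Without '--limit -1', you can't get all data because select returns only the first 10 records by default. In the pair table, a record is valid only when at least one of the freq0, freq1 and freq2 columns is larger than 0, which is what the filter selects.

Don't execute the above queries via HTTP request because an enormous number of records are fetched.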
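
The filter condition on the pair table can be mirrored in client code, for example when checking extracted records. The following is a minimal Python sketch. The function name is_valid_pair is hypothetical, and the sample records are made up for illustration and are not real learning data.

```python
def is_valid_pair(record):
    """Mirror the filter 'freq0 > 0 || freq1 > 0 || freq2 > 0' for one record."""
    return record["freq0"] > 0 or record["freq1"] > 0 or record["freq2"] > 0


# Made-up records for illustration only; real records may have more columns.
sample_pairs = [
    {"freq0": 0, "freq1": 0, "freq2": 1},
    {"freq0": 0, "freq1": 0, "freq2": 0},
    {"freq0": 2, "freq1": 0, "freq2": 0},
]

valid_pairs = [record for record in sample_pairs if is_valid_pair(record)]
print(len(valid_pairs))
```

A record is kept as soon as any one of the three frequency columns is positive. Only a record with all three at 0 is dropped.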