5.4. Correction

This section describes the following aspects of the correction feature:

  • How it works
  • How to use
  • How it learns

5.4.1. How it works

The correction feature uses two searches to compute corrected words:

  1. Cooccurrence search against learned data.
  2. Similar search against registered words. (optional)
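
The two stages above can be sketched in Python. This is only an illustration, not Groonga's implementation: the names correct, learned_pairs and registered_words are hypothetical, plain edit distance stands in for whatever similarity measure Groonga actually uses, and the sketch assumes the optional second stage runs only when the first stage finds nothing.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def correct(query, learned_pairs, registered_words,
            frequency_threshold=1, max_distance=2):
    """Return candidate corrections for query (hypothetical helper).

    learned_pairs maps a mistyped query to {corrected word: count}.
    registered_words is the list of known good words.
    """
    # Stage 1: cooccurrence search against learned data.
    counts = learned_pairs.get(query, {})
    hits = [(word, n) for word, n in counts.items()
            if n >= frequency_threshold]
    if hits:
        hits.sort(key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _ in hits]

    # Stage 2 (optional): similar search against registered words.
    # Assumption: edit distance approximates "similar".
    near = [(word, edit_distance(query, word))
            for word in registered_words if word != query]
    near = [(word, d) for word, d in near if d <= max_distance]
    near.sort(key=lambda pair: (pair[1], pair[0]))
    return [word for word, _ in near]


learned = {"saerch": {"search": 1}}
words = ["search", "seek"]
print(correct("saerch", learned, words))  # found by cooccurrence
print(correct("serch", {}, words))        # found by similar search
```

Trying learned data first means that corrections real users have made take priority over purely textual similarity.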

5.4.2. How to use

Groonga provides the suggest command for correction. The --types correct option requests corrections.

For example, here is a command that gets correction results for "saerch":

Execution example:

suggest --table item_query --column kana --types correction --frequency_threshold 1 --query saerch
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "correct": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "search",
#         1
#       ]
#     ]
#   }
# ]
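
A client usually needs only the corrected words and their scores from this response. The following Python sketch extracts them, assuming the response has exactly the shape shown above (a header array followed by an object with a "correct" entry) and that the leading "# " prefixes of the execution example have been removed. The function name extract_corrections is hypothetical.

```python
import json

# The response from the execution example, without the "# " prefixes.
response_text = """
[
  [0, 1337566253.89858, 0.000355720520019531],
  {
    "correct": [
      [1],
      [["_key", "ShortText"], ["_score", "Int32"]],
      ["search", 1]
    ]
  }
]
"""


def extract_corrections(text):
    """Return the "correct" records as a list of dicts keyed by column name."""
    header, body = json.loads(text)
    return_code = header[0]
    if return_code != 0:
        raise RuntimeError(f"suggest failed with return code {return_code}")
    # Layout: [hit count], [column definitions], then one array per record.
    _hit_count, columns, *records = body["correct"]
    names = [name for name, _type in columns]
    return [dict(zip(names, record)) for record in records]


for row in extract_corrections(response_text):
    print(row["_key"], row["_score"])
```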

5.4.3. How it learns

Cooccurrence search uses learned data, which is built from query logs, access logs and so on. To create learned data, Groonga needs the inputs that users submit, together with their timestamps.

For example, a user wants to search for "search" but first submits the typo "saerch" before entering the correct query. The user's input arrives in the following sequence:

  1. 2011-08-10T13:33:23+09:00: s
  2. 2011-08-10T13:33:23+09:00: sa
  3. 2011-08-10T13:33:24+09:00: sae
  4. 2011-08-10T13:33:24+09:00: saer
  5. 2011-08-10T13:33:24+09:00: saerc
  6. 2011-08-10T13:33:25+09:00: saerch (submit!)
  7. 2011-08-10T13:33:29+09:00: serch (correcting...)
  8. 2011-08-10T13:33:30+09:00: search (submit!)

Groonga can learn from this input sequence with the following command:

load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "s"},
{"sequence": "1", "time": 1312950803.96857, "item": "sa"},
{"sequence": "1", "time": 1312950804.26057, "item": "sae"},
{"sequence": "1", "time": 1312950804.56057, "item": "saer"},
{"sequence": "1", "time": 1312950804.76057, "item": "saerc"},
{"sequence": "1", "time": 1312950805.76057, "item": "saerch", "type": "submit"},
{"sequence": "1", "time": 1312950809.76057, "item": "serch"},
{"sequence": "1", "time": 1312950810.86057, "item": "search", "type": "submit"}
]
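
The "time" values in the load data are UNIX timestamps in seconds. As a minimal sketch, the following Python code converts the timestamped input sequence above into records of the same shape. The names keystrokes and to_events are hypothetical, and because the listed timestamps have only one-second resolution, the fractional parts used in the load example are not reproduced.

```python
import json
from datetime import datetime

# (timestamp, input, submitted?) taken from the input sequence above.
keystrokes = [
    ("2011-08-10T13:33:23+09:00", "s", False),
    ("2011-08-10T13:33:23+09:00", "sa", False),
    ("2011-08-10T13:33:24+09:00", "sae", False),
    ("2011-08-10T13:33:24+09:00", "saer", False),
    ("2011-08-10T13:33:24+09:00", "saerc", False),
    ("2011-08-10T13:33:25+09:00", "saerch", True),
    ("2011-08-10T13:33:29+09:00", "serch", False),
    ("2011-08-10T13:33:30+09:00", "search", True),
]


def to_events(sequence_id, keystrokes):
    """Build event records for one input sequence."""
    events = []
    for stamp, item, submitted in keystrokes:
        event = {
            "sequence": sequence_id,
            # Seconds since the UNIX epoch; the UTC offset is honored.
            "time": datetime.fromisoformat(stamp).timestamp(),
            "item": item,
        }
        if submitted:
            event["type"] = "submit"
        events.append(event)
    return events


events = to_events("1", keystrokes)
print(json.dumps(events, indent=1))
```

All records share the same "sequence" value, which is what ties the submitted typo "saerch" to the later submission of "search".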