5.3. Completion


  • How it works

  • How to use

  • How to learn

5.3.1. How it works


  1. Prefix RK search against registered words.

  2. Cooccurrence search against learned data.

  3. Prefix search against registered words. (This step is not always executed.)
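
The three searches above can be pictured with a small toy model. This is only a sketch and not Groonga's implementation: the function names, the use_prefix_search flag, and the sample data are hypothetical, and readings are stored as romaji here for brevity.

```python
# Toy model of the three searches. NOT Groonga's implementation.
# All names and data below are hypothetical.

def prefix_rk_search(readings, query):
    """Step 1: match the query against the start of each word's reading."""
    return [word for word, reading_list in readings.items()
            if any(reading.startswith(query) for reading in reading_list)]

def cooccurrence_search(learned, query):
    """Step 2: words that were submitted after this input, most frequent first."""
    counts = learned.get(query, {})
    return sorted(counts, key=lambda word: -counts[word])

def prefix_search(words, query):
    """Step 3: match the query against the start of each registered word."""
    return [word for word in words if word.startswith(query)]

def complete(query, readings, learned, words, use_prefix_search=True):
    """Merge the results of the searches, dropping duplicates."""
    stages = [prefix_rk_search(readings, query),
              cooccurrence_search(learned, query)]
    if use_prefix_search:  # models the fact that step 3 may be skipped
        stages.append(prefix_search(words, query))
    results = []
    for candidates in stages:
        for word in candidates:
            if word not in results:
                results.append(word)
    return results

readings = {"日本": ["nihon", "nippon"], "日本語": ["nihongo"]}
learned = {"en": {"engine": 1}}
words = ["engine", "english"]
print(complete("en", readings, learned, words))
print(complete("nihon", readings, learned, words))
```

The sketch merges results in step order; how Groonga actually scores and orders candidates is not modeled here.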

5.3.2. How to use

Groonga provides the suggest command for completion. Specify the --types complete option to use the completion feature.



suggest --table item_query --column kana --types complete --frequency_threshold 1 --query en
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "engine",
#         1
#       ]
#     ]
#   }
# ]
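
A client usually needs the candidates as plain records rather than the nested array shown above. The following is a minimal sketch that parses a response of that shape; the function name parse_complete is hypothetical, and the response string is copied from the example output.

```python
import json

def parse_complete(response_text):
    """Return the completion candidates of a suggest response as dicts."""
    header, body = json.loads(response_text)
    return_code = header[0]  # 0 means success
    if return_code != 0:
        raise RuntimeError(f"suggest failed with return code {return_code}")
    # "complete" holds [n_hits], the column definitions, then one list per record.
    n_hits, columns, *records = body["complete"]
    names = [name for name, _type in columns]
    return [dict(zip(names, record)) for record in records]

response = """[[0, 1337566253.89858, 0.000355720520019531],
 {"complete": [[1], [["_key", "ShortText"], ["_score", "Int32"]], ["engine", 1]]}]"""
print(parse_complete(response))
# prints [{'_key': 'engine', '_score': 1}]
```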

5.3.3. How to learn



  1. 2011-08-10T13:33:23+09:00: e
  2. 2011-08-10T13:33:23+09:00: en
  3. 2011-08-10T13:33:24+09:00: eng
  4. 2011-08-10T13:33:24+09:00: engi
  5. 2011-08-10T13:33:24+09:00: engin
  6. 2011-08-10T13:33:25+09:00: engine (search submitted!)


load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950803.86057, "item": "e"},
{"sequence": "1", "time": 1312950803.96857, "item": "en"},
{"sequence": "1", "time": 1312950804.26057, "item": "eng"},
{"sequence": "1", "time": 1312950804.56057, "item": "engi"},
{"sequence": "1", "time": 1312950804.76057, "item": "engin"},
{"sequence": "1", "time": 1312950805.86057, "item": "engine", "type": "submit"}
]

5.3.4. How to update RK reading data

RK search requires each registered word together with its reading, so load that data in advance.

Here is an example that registers "日本", which means "Japan" in English.


load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86058, "item": "日本", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Here is an example that updates the RK data so that "日本" can be completed.


load --table item_query
[
{"_key":"日本", "kana":["ニホン", "ニッポン"]}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]

Then you can complete the registered word "日本" with the RK input "nihon".


suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ]
#     ]
#   }
# ]

Without loading the RK data above, you cannot complete the registered word "日本" with the query "nihon".

Because the kana column of the item_query table is a vector column, you can register multiple readings for one registered word.

This is why you can also complete the registered word "日本" with the query "nippon".


suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nippon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ]
#     ]
#   }
# ]
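
To see why both "nihon" and "nippon" reach "日本", it helps to picture the romaji query being compared with the katakana readings. The following is a toy sketch of that idea, not Groonga's RK search: the conversion table is hypothetical and covers only the syllables used in this section, and complete romaji syllables are assumed.

```python
# Hypothetical, deliberately tiny romaji-to-katakana table.
KANA = {"ni": "ニ", "ho": "ホ", "po": "ポ", "go": "ゴ", "ji": "ジ", "n": "ン"}

def romaji_to_katakana(romaji):
    """Convert romaji to katakana using the tiny table above."""
    result = []
    i = 0
    while i < len(romaji):
        # A doubled consonant (other than "n") becomes a small tsu.
        if (i + 1 < len(romaji) and romaji[i] == romaji[i + 1]
                and romaji[i] not in "aiueon"):
            result.append("ッ")
            i += 1
            continue
        # Prefer a two-letter syllable, then fall back to a single letter.
        for length in (2, 1):
            chunk = romaji[i:i + length]
            if chunk in KANA:
                result.append(KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"unsupported romaji: {romaji[i:]!r}")
    return "".join(result)

def rk_prefix_match(readings, query):
    """Return words that have a reading starting with the converted query."""
    kana_query = romaji_to_katakana(query)
    return [word for word, kana_list in readings.items()
            if any(kana.startswith(kana_query) for kana in kana_list)]

readings = {"日本": ["ニホン", "ニッポン"]}
print(romaji_to_katakana("nihon"), rk_prefix_match(readings, "nihon"))
print(romaji_to_katakana("nippon"), rk_prefix_match(readings, "nippon"))
```

Because each word maps to a list of readings, a match on any one of them is enough, which mirrors the multiple readings stored in the kana column.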

This feature is convenient because you can search for registered words even when the Japanese input method (IM) is disabled.

If completion returns multiple candidates, you can adjust their priority by setting the value of the "boost" column in the item_query table.

Here is an example that customizes the priority for RK search.


load --table event_query --each 'suggest_preparer(_id, type, item, sequence, time, pair_query)'
[
{"sequence": "1", "time": 1312950805.86059, "item": "日本語", "type": "submit"},
{"sequence": "1", "time": 1312950805.86060, "item": "日本人", "type": "submit"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
load --table item_query
[
{"_key":"日本語", "kana":"ニホンゴ"},
{"_key":"日本人", "kana":"ニホンジン"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本",
#         2
#       ],
#       [
#         "日本人",
#         2
#       ],
#       [
#         "日本語",
#         2
#       ]
#     ]
#   }
# ]
load --table item_query
[
{"_key":"日本人", "boost": 100}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
suggest --table item_query --column kana --types complete --frequency_threshold 1 --query nihon
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   {
#     "complete": [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "_score",
#           "Int32"
#         ]
#       ],
#       [
#         "日本人",
#         102
#       ],
#       [
#         "日本",
#         2
#       ],
#       [
#         "日本語",
#         2
#       ]
#     ]
#   }
# ]
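
In the two outputs above, the score of "日本人" changes from 2 to 102 after its boost is set to 100, which suggests that the boost is added to the score before candidates are ordered. The sketch below reproduces that ordering under this assumption; the additive rule is inferred from the example output only, and the function name rank is hypothetical.

```python
def rank(candidates, boosts):
    """Add each word's boost to its score and sort by score, highest first."""
    boosted = [(word, score + boosts.get(word, 0)) for word, score in candidates]
    # sorted() is stable, so candidates with equal scores keep their order.
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)

candidates = [("日本", 2), ("日本人", 2), ("日本語", 2)]
print(rank(candidates, {"日本人": 100}))
# prints [('日本人', 102), ('日本', 2), ('日本語', 2)]
```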