You learned how to filter search results and sort ones in the previous sections. Now that you can search as you likes, but how do you summarize the number of records which has specific value in the column?
As you know, there is a naive solution to execute query by every the value of column, then you can get the number of records as a result. It is a simple way, but it is not reasonable to many records.
If you are familiar with SQL, you will doubt with "Is there a similar SQL functionality to GROUP BY in Groonga?".
Of course, Groonga provides such a functionality. It's called as drilldown.
drilldown enables you to get the number of records which belongs to specific the value of column at once.
To illustrate this feature, imagine the case that classification by domain and grouping by country that domain belongs to.
Here is the concrete examples how to use this feature.
In this example, we add two columns to Site table. domain column is used for TLD (top level domain). country column is used for country name. The type of these columns are SiteDomain table which uses domain name as a primary key and SiteCountry table which uses country name as a primary key.
Execution example:
table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]
Here is a example of drilldown with domain column. Three kind of values are used in domain column - ".org", ".net" and ".com".
Execution example:
select --table Site --limit 0 --drilldown domain
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ]
# ]
# ]
Here is a summary of above query.
Group by | The number of group records | Group records means following records |
---|---|---|
.org | 3 | |
.net | 3 | |
.com | 3 |
The value of drilldown are returned as the value of _nsubrecs column. In this case, Site table is grouped by ".org", ".net", ".com" domain. _nsubrecs shows that each three domain has three records.
If you execute drildown to the column which has table as a type, you can get the value of column which is stored in referenced table. _nsubrecs pseudo column is added to the table which is used for drilldown. this pseudo column stores the number of records which is grouped by.
Then, investigate referenced table in detail. As Site table use SiteDomain table as column type of domain, you can use --drilldown_output_columns to know detail of referenced column.
Execution example:
select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 1,
# ".org",
# 3
# ],
# [
# 2,
# ".net",
# 3
# ],
# [
# 3,
# ".com",
# 3
# ]
# ]
# ]
# ]
Now, you can see detail of each grouped domain, drilldown by country column which has ".org" as column value.
Execution example:
select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 1
# ]
# ]
# ]
# ]
Drilldown feature supports multiple column. Use comma separated multiple column names as drildown parameter. You can get the each result of drilldown at once.
Execution example:
select --table Site --limit 0 --drilldown domain,country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 3
# ],
# [
# "brazil",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "korea",
# 1
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
Use --drilldown_sortby if you want to sort the result of drilldown. For example, specify _nsubrecs as ascending order.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "brazil",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ],
# [
# "japan",
# 3
# ]
# ]
# ]
# ]
The number of drilldown results is limited to 10 as a default. Use drilldown_limits and drilldown_offset parameter to customize orilldown results.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
文字列型のカラムに対するドリルダウンは、他の型でのドリルダウンに比べて低速です。文字列でのドリルダウンを行いたい場合には、このチュートリアルのように、文字列型を主キーとするテーブルを別途作成し、そのテーブルを型とするカラムを作成します。