4.5. Drilldown¶
You learned how to search and sort searched results in the previous sections. Now that you can search as you likes, but how do you summarize the number of records which has specific value in the column?
As you know, there is a naive solution to execute query by every the value of column, then you can get the number of records as a result. It is a simple way, but it is not reasonable to many records.
If you are familiar with SQL, you will doubt with "Is there a similar SQL functionality to GROUP BY
in Groonga?".
Of course, Groonga provides such a functionality. It's called as drilldown
.
drilldown
enables you to get the number of records which belongs to specific the value of column at once.
To illustrate this feature, imagine the case that classification by domain and grouping by country that domain belongs to.
Here is the concrete examples how to use this feature.
In this example, we add two columns to Site
table. domain
column is used for TLD (top level domain). country
column is used for country name. The type of these columns are SiteDomain
table which uses domain name as a primary key and SiteCountry
table which uses country name as a primary key.
Execution example:
table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 9]
Here is a example of drilldown with domain
column. Three kind of values are used in domain
column - ".org", ".net" and ".com".
Execution example:
select --table Site --limit 0 --drilldown domain
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ]
# ]
# ]
Here is a summary of above query.
Group by | The number of group records | Group records means following records |
---|---|---|
.org | 3 | |
.net | 3 | |
.com | 3 |
The value of drilldown are returned as the value of _nsubrecs
column. In this case, Site
table is grouped by ".org", ".net", ".com" domain. _nsubrecs
shows that each three domain has three records.
If you execute drildown to the column which has table as a type, you can get the value of column which is stored in referenced table.
_nsubrecs
pseudo column is added to the table which is used for drilldown. this pseudo column stores the number of records which is grouped by.
Then, investigate referenced table in detail. As Site
table use SiteDomain
table as column type of domain
, you can use --drilldown_output_columns
to know detail of referenced column.
Execution example:
select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# 1,
# ".org",
# 3
# ],
# [
# 2,
# ".net",
# 3
# ],
# [
# 3,
# ".com",
# 3
# ]
# ]
# ]
# ]
Now, you can see detail of each grouped domain, drilldown by country
column which has ".org" as column value.
Execution example:
select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 3
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 1
# ]
# ]
# ]
# ]
4.5.1. Drilldown with multiple column¶
Drilldown feature supports multiple column. Use comma separated multiple column names as drildown
parameter.
You can get the each result of drilldown at once.
Execution example:
select --table Site --limit 0 --drilldown domain,country
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 3
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# ".org",
# 3
# ],
# [
# ".net",
# 3
# ],
# [
# ".com",
# 3
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "japan",
# 3
# ],
# [
# "brazil",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "korea",
# 1
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
4.5.2. Sorting drildown results¶
Use --drilldown_sortby
if you want to sort the result of drilldown. For example, specify _nsubrecs
as ascending order.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "brazil",
# 1
# ],
# [
# "korea",
# 1
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ],
# [
# "japan",
# 3
# ]
# ]
# ]
# ]
4.5.3. limits drildown results¶
The number of drilldown results is limited to 10 as a default. Use drilldown_limit
and drilldown_offset
parameter to customize orilldown results.
Execution example:
select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
# [
# [
# 0,
# 1337566253.89858,
# 0.000355720520019531
# ],
# [
# [
# [
# 9
# ],
# [
# [
# "_id",
# "UInt32"
# ],
# [
# "_key",
# "ShortText"
# ],
# [
# "country",
# "SiteCountry"
# ],
# [
# "domain",
# "SiteDomain"
# ],
# [
# "link",
# "Site"
# ],
# [
# "links",
# "Site"
# ],
# [
# "location",
# "WGS84GeoPoint"
# ],
# [
# "title",
# "ShortText"
# ]
# ]
# ],
# [
# [
# 5
# ],
# [
# [
# "_key",
# "ShortText"
# ],
# [
# "_nsubrecs",
# "Int32"
# ]
# ],
# [
# "usa",
# 2
# ],
# [
# "china",
# 2
# ]
# ]
# ]
# ]
Note that drilldown to the column which stores string is slower than the columns which stores the other types. If you drilldown to string type of column, create the table that type of primary key is string, then create the column which refers that table.