4.3. Various data types

Groonga is a full text search engine that also serves as a column-oriented data store. It supports various data types, such as numeric types, string types, a date and time type, and longitude and latitude types. This tutorial lists the data types and explains how to use them.

4.3.1. Overview

The basic data types of Groonga are roughly divided into 5 groups: the boolean type, numeric types, string types, the date/time type and longitude/latitude types. The numeric types are further divided according to whether they are integer or floating point, signed or unsigned, and how many bits are allocated to each integer. The string types are further divided according to the maximum length. The longitude/latitude types are further divided according to the geographic coordinate system. For more details, see Data types.

In addition, Groonga supports reference types and vector types. Reference types are designed for accessing other tables. Vector types are designed for storing a variable number of values in one element.

First, let's create a table for this tutorial.

Execution example:

table_create --name ToyBox --flags TABLE_HASH_KEY --key_type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]

4.3.2. Boolean type

The boolean type is used to store true or false. To create a boolean type column, specify Bool for the type parameter of the column_create command. The default value of the boolean type is false.

The following example creates a boolean type column and adds three records. Note that the third record has the default value because no value is specified.

Execution example:

column_create --table ToyBox --name is_animal --type Bool
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","is_animal":true}
{"_key":"Flower","is_animal":false}
{"_key":"Block"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,is_animal
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "is_animal",
#           "Bool"
#         ]
#       ],
#       [
#         "Monkey",
#         true
#       ],
#       [
#         "Flower",
#         false
#       ],
#       [
#         "Block",
#         false
#       ]
#     ]
#   ]
# ]

4.3.3. Numeric types

The numeric types are divided into integer types and a floating point number type. The integer types are further divided into the signed integer types and unsigned integer types. In addition, you can choose the number of bits allocated to each integer. For more details, see Data types. The default value of the numeric types is 0.

The following example creates an Int8 column and a Float column, and then updates existing records. The load command updates the weight column as expected. On the other hand, the price column values differ from the specified values because 15.9 is not an integer and 200 is outside the Int8 range (-128 to 127). 15.9 is converted to 15 by removing the fractional part. 200 causes an overflow and the result becomes -56. Note that the result of an overflow/underflow is undefined, so do not rely on this value.

Execution example:

column_create --table ToyBox --name price --type Int8
# [[0, 1337566253.89858, 0.000355720520019531], true]
column_create --table ToyBox --name weight --type Float
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","price":15.9}
{"_key":"Flower","price":200,"weight":0.13}
{"_key":"Block","weight":25.7}
]
# [[0, 1337566253.89858, 0.000355720520019531], 3]
select --table ToyBox --output_columns _key,price,weight
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "price",
#           "Int8"
#         ],
#         [
#           "weight",
#           "Float"
#         ]
#       ],
#       [
#         "Monkey",
#         15,
#         0.0
#       ],
#       [
#         "Flower",
#         -56,
#         0.13
#       ],
#       [
#         "Block",
#         0,
#         25.7
#       ]
#     ]
#   ]
# ]
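
The two conversions seen in the price column can be reproduced with a short Python sketch. The helper name to_int8 is hypothetical, and the wraparound is only an assumption that happens to match the output above; as noted, Groonga leaves the result of an overflow undefined.

```python
def to_int8(value):
    """Truncate toward zero, then wrap into the Int8 range (-128..127).

    The wraparound is an assumption that mirrors the tutorial output;
    it is not documented Groonga behavior.
    """
    truncated = int(value)                 # 15.9 -> 15
    return (truncated + 128) % 256 - 128   # 200 -> -56


for raw in (15.9, 200, 0):
    print(raw, "->", to_int8(raw))
```

Choosing a wider type such as Int32 for the column avoids the problem for values like 200.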

4.3.4. String types

The string types are divided according to the maximum length. For more details, see Data types. The default value is the zero-length string.

The following example creates a ShortText column and updates existing records. The third record ("Block" key record) has the default value (zero-length string) because it's not updated.

Execution example:

column_create --table ToyBox --name name --type ShortText
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","name":"Grease"}
{"_key":"Flower","name":"Rose"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,name
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "name",
#           "ShortText"
#         ]
#       ],
#       [
#         "Monkey",
#         "Grease"
#       ],
#       [
#         "Flower",
#         "Rose"
#       ],
#       [
#         "Block",
#         ""
#       ]
#     ]
#   ]
# ]

4.3.5. Date and time type

The date and time type of Groonga is Time. Internally, a Time column stores a date and time as the number of microseconds since the Epoch, 1970-01-01 00:00:00. A Time value can represent a date and time before the Epoch because the actual data type is a signed integer. Note that the load and select commands use a decimal number to represent a date and time in seconds. The default value is 0.0, which means the Epoch.

Note

Groonga internally handles a time value as a pair of integers. The first integer represents the seconds and the second integer represents the microseconds. For this reason, Groonga shows a time value as a floating point number: the integral part is the seconds and the fractional part is the microseconds.

The following example creates a Time column and updates existing records. The first record ("Monkey" key record) has the default value (0.0) because it's not updated.

Execution example:

column_create --table ToyBox --name time --type Time
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Flower","time":1234567890.1234569999}
{"_key":"Block","time":-1234567890}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,time
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "time",
#           "Time"
#         ]
#       ],
#       [
#         "Monkey",
#         0.0
#       ],
#       [
#         "Flower",
#         1234567890.12346
#       ],
#       [
#         "Block",
#         -1234567890.0
#       ]
#     ]
#   ]
# ]
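
The relation between a signed microsecond count and the seconds-based decimal notation can be illustrated in Python. This is a minimal sketch: the helper names are hypothetical, the sample value 1234567890123457 is an assumed microsecond count for the "Flower" record, and the Epoch is taken to be UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Epoch is interpreted as 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_microseconds(total_usec):
    """Split a signed microsecond count into (seconds, microseconds)."""
    return divmod(total_usec, 1_000_000)


def to_datetime(total_usec):
    """Convert a signed microsecond count into a calendar date and time."""
    return EPOCH + timedelta(microseconds=total_usec)


print(split_microseconds(1234567890123457))   # seconds and microseconds parts
print(to_datetime(1234567890123457))          # a date after the Epoch
print(to_datetime(-1234567890 * 1_000_000))   # a date before the Epoch
```

Because the count is signed, the last call yields a date earlier than 1970, which corresponds to the "Block" record above.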

4.3.6. Longitude and latitude types

The longitude and latitude types are divided according to the geographic coordinate system. For more details, see Data types. To represent a longitude and latitude, Groonga uses a string formatted as follows:

  • "latitude x longitude" in milliseconds (e.g.: "128452975x503157902")
  • "latitude x longitude" in degrees (e.g.: "35.6813819x139.7660839")

A number without a decimal point is interpreted as a value in milliseconds, and a number with a decimal point is interpreted as a value in degrees. Note that a combination of a number with a decimal point and a number without a decimal point (e.g. 35.1x139) must not be used. A comma (',') is also available as a delimiter. The default value is "0x0".

The following example creates a WGS84GeoPoint column and updates existing records. The second record ("Flower" key record) has the default value ("0x0") because it's not updated.

Execution example:

column_create --table ToyBox --name location --type WGS84GeoPoint
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table ToyBox
[
{"_key":"Monkey","location":"128452975x503157902"}
{"_key":"Block","location":"35.6813819x139.7660839"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 2]
select --table ToyBox --output_columns _key,location
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         3
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "location",
#           "WGS84GeoPoint"
#         ]
#       ],
#       [
#         "Monkey",
#         "128452975x503157902"
#       ],
#       [
#         "Flower",
#         "0x0"
#       ],
#       [
#         "Block",
#         "128452975x503157902"
#       ]
#     ]
#   ]
# ]
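
In the output above, the "Block" record is shown in milliseconds although it was loaded in degrees. The conversion can be sketched in Python as follows. The function name parse_geo_point is hypothetical, and rounding to the nearest millisecond is an assumption that matches the output above rather than a documented rule.

```python
import re

MSEC_PER_DEGREE = 3600 * 1000  # 60 minutes x 60 seconds x 1000 milliseconds


def parse_geo_point(text):
    """Return both coordinates of a geo point string in milliseconds."""
    first, second = re.split(r"[x,]", text)  # 'x' or ',' as the delimiter
    has_point = ["." in first, "." in second]
    if has_point[0] != has_point[1]:
        raise ValueError("do not mix degrees and milliseconds: " + text)
    if has_point[0]:
        # Degrees: convert to milliseconds (rounding is an assumption).
        return tuple(round(float(v) * MSEC_PER_DEGREE) for v in (first, second))
    return int(first), int(second)


print(parse_geo_point("35.6813819x139.7660839"))
print(parse_geo_point("128452975x503157902"))
```

Both calls return the same pair of integers, which is why the two notations are interchangeable when loading data.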

4.3.7. Reference types

Groonga supports a reference column, which stores references to records in its associated table. In practice, a reference column stores the IDs of the referred records in the associated table and enables access to those records.

You can specify a column of the associated table in the output_columns parameter of a select command. The format is Src.Dest where Src is the name of the reference column and Dest is the name of the target column. If only the reference column is specified, it is handled as Src._key. Note that if a reference does not point to a valid record, a select command outputs the default value of the target column.

The following example adds a reference column to the Site table that was created in Create a table. The new column, named link, is designed for storing links among records in the Site table.

Execution example:

column_create --table Site --name link --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","link":"http://example.net/"}
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,link._key,link.title --query title:@this
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ],
#         [
#           "link._key",
#           "ShortText"
#         ],
#         [
#           "link.title",
#           "ShortText"
#         ]
#       ],
#       [
#         "http://example.org/",
#         "This is test record 1!",
#         "http://example.net/",
#         "test record 2."
#       ]
#     ]
#   ]
# ]

The type parameter of the column_create command specifies the table to be associated with the reference column. In this example, the reference column is associated with its own table. Then, the load command registers a link from "http://example.org/" to "http://example.net/". Note that a reference column requires the primary key, not the ID, of the record to be referred to. After that, the link is confirmed by the select command. In this case, the primary key and the title of the referred record are output because link._key and link.title are specified in the output_columns parameter.
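
The behavior described in this section (load takes a primary key, the column keeps an ID, select follows the ID) can be pictured with a toy Python model. This is only an illustration: the ToyTable class and its methods are hypothetical and say nothing about how Groonga is implemented.

```python
class ToyTable:
    """Toy model of a table whose reference columns store record IDs."""

    def __init__(self):
        self.ids_by_key = {}   # primary key -> record ID
        self.records = {}      # record ID -> column values

    def add(self, key, **columns):
        """Create or update the record for key and return its ID."""
        record_id = self.ids_by_key.setdefault(key, len(self.ids_by_key) + 1)
        self.records.setdefault(record_id, {"_key": key}).update(columns)
        return record_id

    def resolve(self, record_id, column, default=""):
        """Follow a reference; an invalid ID yields the default value."""
        record = self.records.get(record_id)
        return default if record is None else record.get(column, default)


site = ToyTable()
org_id = site.add("http://example.org/", title="This is test record 1!")
net_id = site.add("http://example.net/", title="test record 2.")

# "load" receives the primary key, but the column keeps the record ID.
site.add("http://example.org/", link=site.ids_by_key["http://example.net/"])

link = site.records[org_id]["link"]
print(site.resolve(link, "_key"), site.resolve(link, "title"))
```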

4.3.8. Vector types

Groonga supports a vector column, in which each element can store a variable number of values. To create a vector column, specify the COLUMN_VECTOR flag in the flags parameter of a column_create command. A vector column is useful for representing a many-to-many relationship.

The previous example used a regular column, so each record could have at most one link. This is insufficient because a site usually has more than one link. To solve this problem, the following example uses a vector column.

Execution example:

column_create --table Site --name links --flags COLUMN_VECTOR --type Site
# [[0, 1337566253.89858, 0.000355720520019531], true]
load --table Site
[
{"_key":"http://example.org/","links":["http://example.net/","http://example.org/","http://example.com/"]},
]
# [[0, 1337566253.89858, 0.000355720520019531], 1]
select --table Site --output_columns _key,title,links._key,links.title --query title:@this
# [
#   [
#     0,
#     1337566253.89858,
#     0.000355720520019531
#   ],
#   [
#     [
#       [
#         1
#       ],
#       [
#         [
#           "_key",
#           "ShortText"
#         ],
#         [
#           "title",
#           "ShortText"
#         ],
#         [
#           "links._key",
#           "ShortText"
#         ],
#         [
#           "links.title",
#           "ShortText"
#         ]
#       ],
#       [
#         "http://example.org/",
#         "This is test record 1!",
#         [
#           "http://example.net/",
#           "http://example.org/",
#           "http://example.com/"
#         ],
#         [
#           "test record 2.",
#           "This is test record 1!",
#           "test test record three."
#         ]
#       ]
#     ]
#   ]
# ]

The only difference in the first step is the flags parameter, which specifies that a vector column is to be created. The type parameter of the column_create command is the same as in the previous example. Then, the load command registers three links from "http://example.org/" to "http://example.net/", "http://example.org/" and "http://example.com/". After that, the links are confirmed by the select command. In this case, the primary keys and the titles are output as arrays because links._key and links.title are specified in the output_columns parameter.
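
As a rough illustration of why the output contains arrays, the following Python sketch models a vector reference column as a list of record IDs. The record IDs 1 to 3 and the helper names load_links and select_links are hypothetical; only the keys and titles come from the example above.

```python
# Hypothetical record IDs for the three Site records used in the example.
site = {
    1: {"_key": "http://example.org/", "title": "This is test record 1!"},
    2: {"_key": "http://example.net/", "title": "test record 2."},
    3: {"_key": "http://example.com/", "title": "test test record three."},
}
ids_by_key = {record["_key"]: record_id for record_id, record in site.items()}


def load_links(key, link_keys):
    """Store a vector of record IDs resolved from primary keys."""
    site[ids_by_key[key]]["links"] = [ids_by_key[k] for k in link_keys]


def select_links(key, column):
    """Resolve links.<column> for one record; the result is a list."""
    return [site[i][column] for i in site[ids_by_key[key]].get("links", [])]


load_links(
    "http://example.org/",
    ["http://example.net/", "http://example.org/", "http://example.com/"],
)
print(select_links("http://example.org/", "_key"))
print(select_links("http://example.org/", "title"))
```

A record without any links yields an empty list in this model, which is one way to picture a variable number of values per element.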