Table Object and Data Type
BigObject analytics is designed to analyze multi-dimensional data organized in a star or snowflake schema. Multi-dimensional data is organized into tables. There are two types of tables in BigObject: the dimension table and the fact table.
Dimension Table
A dimension consists of a set of items with descriptive properties called attributes. Business data, for example, is multi-dimensional: it can contain information such as product, channel, and price. When such information is organized and stored in a table object, it is referred to as a dimension table in BigObject.
Consider the ABC company example. As ABC company starts its business, its staff first want to create a database for future analytics. The first thing they may want to store is the list of all of its members, who happen to be customers of ABC company's clients. A dimension table called "Customer" may be created to look like this:
id | name | language | state | company | gender | age |
---|---|---|---|---|---|---|
1 | Ryan Mills | Korean | South Carolina | Tazz | Male | 55 |
2 | Christopher Thompson | Māori | District of Columbia | Thoughtblab | Male | 13 |
3 | Shawn Rose | Hindi | District of Columbia | Livetube | Male | 33 |
4 | Julia Kennedy | Italian | Florida | Fivespan | Female | 30 |
5 | Virginia Reynolds | Persian | North Carolina | Dabfeed | Female | 42 |
To refer to an attribute a in a dimension table D, specify it as D.a. For example, Customer.gender refers to the gender attribute in the Customer dimension table.
Table names and attribute names are case-sensitive in BigObject.
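The D.a convention and its case sensitivity can be illustrated with a small Python sketch. This is a hypothetical in-memory model, not BigObject's API: the `tables` dictionary and the `resolve` function are names invented for illustration, and the rows are a subset of the Customer sample above.

```python
# Hypothetical in-memory model of a dimension table (not BigObject's API).
tables = {
    "Customer": [
        {"id": 1, "name": "Ryan Mills", "gender": "Male", "age": 55},
        {"id": 4, "name": "Julia Kennedy", "gender": "Female", "age": 30},
    ]
}


def resolve(qualified_name, tables):
    """Return all values of attribute a in table D for a reference 'D.a'."""
    table_name, attribute = qualified_name.split(".", 1)
    # Dictionary lookups are case-sensitive, mirroring the rule stated above.
    if table_name not in tables:
        raise KeyError(f"unknown table: {table_name}")
    rows = tables[table_name]
    if rows and attribute not in rows[0]:
        raise KeyError(f"unknown attribute: {table_name}.{attribute}")
    return [row[attribute] for row in rows]


genders = resolve("Customer.gender", tables)
```

In this sketch, `Customer.gender` resolves because it matches the column name exactly, while `Customer.Gender` or `customer.gender` would raise a `KeyError`.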
Fact Table
A fact is the data to be analyzed. When fact data is stored in a table object, it is referred to as a fact table in BigObject.
When the ABC company staff create the database, they also store the transaction records in a table for future analysis. A fact table called "sales" is created that looks like the following:
order_id | Customer.id | Product.id | channel_name | Date | qty | total_price |
---|---|---|---|---|---|---|
1 | 3226 | 2557 | am/pm | 2013-01-01 00:04:05 | 8 | 52.24 |
2 | 6691 | 2631 | am/pm | 2013-01-01 00:11:27 | 4 | 39.72 |
2 | 6691 | 1833 | am/pm | 2013-01-01 00:21:03 | 1 | 6.9 |
3 | 4138 | 1626 | am/pm | 2013-01-01 00:30:22 | 5 | 42.1 |
3 | 4138 | 375 | am/pm | 2013-01-01 00:35:44 | 6 | 67.26 |
3 | 4138 | 3336 | am/pm | 2013-01-01 00:45:12 | 8 | 41.68 |
3 | 4138 | 736 | CVS | 2013-01-01 00:55:34 | 6 | 56.4 |
4 | 1292 | 4434 | 7-11 | 2013-01-01 01:06:00 | 6 | 86.64 |
Note that the columns Customer.id and Product.id refer to the id column in the Customer dimension table and the id column in the Product dimension table, respectively.
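To show how a fact row points into a dimension table, the following Python sketch follows each sale's Customer.id to its customer and sums the quantity per state. It is only an illustration of the star-schema relationship, not how BigObject executes queries. The customer entries come from the Customer sample above; the sales rows and the `qty_by_state` function are hypothetical.

```python
# Dimension table keyed by id (entries taken from the Customer sample).
customers = {
    3: {"name": "Shawn Rose", "state": "District of Columbia"},
    4: {"name": "Julia Kennedy", "state": "Florida"},
}

# Hypothetical fact rows; "Customer.id" holds an id from the Customer table.
sales = [
    {"order_id": 1, "Customer.id": 3, "qty": 8},
    {"order_id": 2, "Customer.id": 4, "qty": 4},
    {"order_id": 2, "Customer.id": 4, "qty": 1},
]


def qty_by_state(sales, customers):
    """Sum qty per customer state by following each row's Customer.id."""
    totals = {}
    for row in sales:
        customer = customers[row["Customer.id"]]  # fact -> dimension lookup
        state = customer["state"]
        totals[state] = totals.get(state, 0) + row["qty"]
    return totals


totals = qty_by_state(sales, customers)
```

The fact table stores only the id, so descriptive attributes such as state stay in one place, the dimension table, and are looked up when needed.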
Data Types
BigObject supports the following data types:
Type | Description |
---|---|
STRING | Encoded string terminated by a NULL (0) character |
CHAR | Fixed-length string |
VARSTRING | Variable-length string; suitable for non-repeatable strings to save space |
BYTE | Single character (ASCII range 32-126) |
INT8 | 8-bit integer |
INT16 | 16-bit integer |
INT32 | 32-bit integer |
INT64 | 64-bit integer |
FLOAT | 4-byte floating point |
DOUBLE | 8-byte double precision floating point |
DATE32 | Year, month, and day of month (4 bytes long) |
DATETIME32 | Date and time (4 bytes long) |
DATETIME64 | Date and time (8 bytes long) |
TIMESTAMP | Timestamp with sub-second precision (8 bytes long) |
IPv4 | Internet Protocol version 4 |
IPv6 | Internet Protocol version 6 |
BINARY | Fixed length binary |
VARBINARY | Variable-length binary |
POINT | A geometry location with coordinate X and Y |
LINESTRING | A sequence of points representing connected line segments |
POLYGON | A sequence of points representing an exterior bounding ring and zero or more interior rings |
MULTIPOLYGON | A collection of zero or more polygons |
The default and maximum lengths of each string type are listed below:
Type | default length | maximum length |
---|---|---|
STRING | 63 | 1023 |
CHAR | 32 | 786432 |
VARSTRING | 255 | 786432 |
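The defaults and limits in this table can be captured in a small Python sketch that picks the length for a string column. The `effective_length` helper is hypothetical, and rejecting an out-of-range length with an error is an assumption; the text here does not say how BigObject itself reacts to an over-long declaration.

```python
# (default length, maximum length) per string type, copied from the table.
STRING_LENGTHS = {
    "STRING": (63, 1023),
    "CHAR": (32, 786432),
    "VARSTRING": (255, 786432),
}


def effective_length(type_name, requested=None):
    """Return the length to use: the default if none is requested,
    otherwise the requested length if it is within the maximum."""
    default, maximum = STRING_LENGTHS[type_name]
    if requested is None:
        return default
    if not 1 <= requested <= maximum:
        # Assumption: treat an out-of-range length as an error.
        raise ValueError(
            f"{type_name} length must be between 1 and {maximum}, got {requested}"
        )
    return requested


default_string = effective_length("STRING")
long_varstring = effective_length("VARSTRING", 1000)
```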
The maximum number of points allowed in a LINESTRING, POLYGON, or MULTIPOLYGON is around 48000.
The ranges for the date-related data types are listed below:
unit | DATE32 | DATETIME32 | DATETIME64 |
---|---|---|---|
year | -32768 - 32767 | 2000 - 2063 | -32768 - 32767 |
month | 1 - 12 | 1 - 12 | 1 - 12 |
day of month | 1 - 31 | 1 - 31 | 1 - 31 |
hours | N.A. | 0 - 23 | 0 - 23 |
minutes | N.A. | 0 - 59 | 0 - 59 |
seconds | N.A. | 0 - 59 | 0 - 59 |
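The ranges above can be expressed as a Python sketch that checks whether a set of date components fits a given type. The `fits` and `min_bits` helpers are hypothetical. `fits` checks each component independently and does not validate the calendar (for example, it would accept February 31). `min_bits` only counts the bits needed to tell the listed values apart; it says nothing about BigObject's actual internal layout, which is not described here.

```python
FULL_YEAR = (-32768, 32767)
TIME_OF_DAY = {"hours": (0, 23), "minutes": (0, 59), "seconds": (0, 59)}

# Inclusive (low, high) range per unit, copied from the table above.
RANGES = {
    "DATE32": {"year": FULL_YEAR, "month": (1, 12), "day": (1, 31)},
    "DATETIME32": {"year": (2000, 2063), "month": (1, 12), "day": (1, 31),
                   **TIME_OF_DAY},
    "DATETIME64": {"year": FULL_YEAR, "month": (1, 12), "day": (1, 31),
                   **TIME_OF_DAY},
}


def fits(type_name, **components):
    """Return True if every given component is within the type's range."""
    ranges = RANGES[type_name]
    for unit, value in components.items():
        if unit not in ranges:  # e.g. hours are N.A. for DATE32
            return False
        low, high = ranges[unit]
        if not low <= value <= high:
            return False
    return True


def min_bits(type_name):
    """Bits needed to distinguish every value of every unit of the type."""
    return sum((high - low).bit_length() for low, high in RANGES[type_name].values())


ok_64 = fits("DATETIME64", year=2064, month=1, day=1)
ok_32 = fits("DATETIME32", year=2064, month=1, day=1)
```

Under this counting, the DATETIME32 ranges need 6 + 4 + 5 + 5 + 6 + 6 = 32 bits, which matches its 4-byte size and suggests why its year range is limited to 64 values, while DATE32 and DATETIME64 have room for the full 16-bit year range.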