Besides analysis based on table objects, BigObject solution also offers tree objects as the basis for analytics via tree analysis method. A tree object has a hierarchical structure that offers a more efficient way of computation than using a table object in some data analytics situations. BigObject provides a transformation mechanism called trans-join (transformative join) to construct a tree object from a table object. In general, there are three main procedures and one optional step involved in tree analysis. They are building the tree object, applying the function on the tree object, getting the data from the tree and store the result in a table.

The general syntax of creating a tree object is following:

CREATE TREE tree_name (data fields declaration on each node) FROM table_name GROUP BY each level specified
Note
Columns with data type VARSTRING or CHAR cannot be used in GROUP BY attributes when creating a tree.

In the example of ABC Company case, if the staff wants to create a tree object named T1 with two levels, Customer.gender and Product.brand. He can build such tree with following command:

CREATE TREE T1 (SUM(qty), tmp_d DOUBLE) FROM sales GROUP BY Customer.gender, Product.brand

A tree T1 that looks like following in structure is created.

Each node in tree T1 holds two data fields, SUM(qty) and tmp_d. The value for SUM(qty) in each node is aggregated upward automatically through trans-join. The value of tmp_d is 0 for now, but it can be used to store the result of user-defined function.

Alternatively, the staff can also create a tree with same levels of declarations but contain more information and is more flexible for future use with CREATE ALL statement.

CREATE ALL TREE T2 (SUM(qty), tmp_d DOUBLE) FROM sales GROUP BY Customer.gender, Product.brand

With above command, a tree structure that looks like following is created.

Build ALL will retain information for each transaction. The bottom level, commonly referred as "#", corresponds to each transaction in the sales table. It (Level #) contains the information of each transaction with its order ID (ID on each node) and transaction quantity (qty). Since the original qty data is preserved at leaf level #, it is possible to perform further analysis on TREE T2.

Once the tree is built, the user can apply function on the tree object to do the analysis.
The general syntax for this is

APPLY function() TO tree_name AT specific level or tree path

A tree path can be a node name, star, index (-1 = last node) or a group of names.

For example:

'/Fanta' ----> Level 1 'Fanta' node

'/*/Iowa' ----->Level 1 all nodes with Level 2 'Iowa' node

'/Fanta/[-1] ----> Level 1 'Fanta node' and last node of Level 2

'/*/Powerade|Fanta' ----->Level 1 all nodes with Level 2 'Powerade' and 'Fanta' nodes

Besides the functions list in FIND by pattern that can be applied, there are two additional built-in functions available for tree analysis shown in table below:

Additional Functions for Tree Analysis

Function Syntax Example Description of the Function based on the Example
put APPLY put(tmp_d, 15) TO T1 AT '/Fanta' Set tmp_d value to 15 in the '/Fanta' node of tree T1

To query the data within the tree, a GET statement is required.
The general syntax for such GET statement is

GET tree path, data field FROM tree_name

Finally, to store the result in another table, the general syntax is following:

CREATE TABLE table_name AS (GET statement for tree)

Let's assume the staff of ABC company is interested to know which brand's 55% of quantity is coming from the top 20% of states (in quantity sold) using tree analysis.

The staff can first tran-join a tree from the sales table with the command:

CREATE TREE testtree1 (v1 DOUBLE, SUM(qty)) FROM sales GROUP BY Product.brand, Customer.state

Then apply pareto(55,20) function on the tree:

APPLY pareto(55,20) TO testtree1 AT Product.brand

Query the result:

GET '/*', v1, SUM(qty) FROM testtree1

A table that looks similar to following will be shown:

Those values that are greater than or equal to 0.55 under v1 column are the brands that have at least 55% of quantity sold from top 20 states.

Alternatively, the staff of ABC company can use following command to directly find those brands who qualify the criteria with following command:

APPLY pareto(55,20) TO testtree1 AT Product.brand {returnTable: true}

The return table will only show those brands with v1 greater than or equal to 0.55:

Finally, ABC company staff can save the result into a table with following command:

BUILD TABLE saved_tree_result AS (GET '/*', v1, SUM(qty) FROM testtree1)

Auxiliary Clause:

The user can specify the apply function with a return table in the auxiliary clause.

Example:

APPLY pareto(55,20) TO testtree1 AT Product.brand {returnTable: true}

The user can also place a filter condition in the GET statement.

Example:

GET '/*', v1, SUM(qty) FROM testtree1 {filter: v1 > 0.55}

The user can also arrange the order shown in the GET statement with sortby command.

Example:

GET '/*', v1, SUM(qty) FROM testtree1 {sortby: v1}