The user can query information in a BigObject cluster with CLUSTER SELECT statement. The syntax for the CLUSTER SELECT statement is the same as the SELECT statement except the extra CLUSTER keyword.

A SQL cluster query issued at a BigObject master will be distributed and processed at all applicable databases known to the master. It is assumed that a table spread among different databases have the same table name.

Note
COUNT(DISTINCT) is approximated in a CLUSTER SELECT query.

An Introductory Example

We will use a simple example to demonstrate how to setup a BigObject cluster and how to use CLUSTER SELECT.

Run the following script to creates a "log" table partitioned in two databases "log1" and "log2" and add both databases to its cluster by setting IP to local address 127.0.0.1.

CREATE DATABASE log1;
CREATE DATABASE log2;
CREATE TABLE log1.log (page CHAR(64), uid INT64, dur INT32);
CREATE TABLE log2.log (page CHAR(64), uid INT64, dur INT32);
INSERT INTO log1.log VALUES ("index", 100, 30),("page1", 100, 120), ("index", 200, 50);
INSERT INTO log2.log VALUES ("page2", 200, 90),("index", 300, 10), ("page2", 100, 50);
ADD CLIENT ip="127.0.0.1" db="log1";
ADD CLIENT ip="127.0.0.1" db="log2";

In other words, a BigObject cluster is formed such that the same BigObject instance serves not only as a master but also as its own clients.


At first, use the following SELECT to check the table "log" in database "log1":

SELECT * FROM log1.log;

And the result shall be:

+-------+-----+-----+
| page  | uid | dur |
+-------+-----+-----+
| index | 100 |  30 |
| page1 | 100 | 120 |
| index | 200 |  50 |
+-------+-----+-----+

To take a look at the whole "log" table, use CLUSTER SELECT * statement:

CLUSTER SELECT * FROM log;

The result will look like:

+-------+-----+-----+
| page  | uid | dur |
+-------+-----+-----+
| index | 100 |  30 |
| page1 | 100 | 120 |
| index | 200 |  50 |
| page2 | 200 |  90 |
| index | 300 |  10 |
| page2 | 100 |  50 |
+-------+-----+-----+

It can be seen that table "log" in the cluster is deemed as one single table although which is actually spread in two different databases.


Next, use the following SELECT statement to obtain the page view count for each page in database "log1":

SELECT page, COUNT(*) as pv from log1.log group by page

And the result shall be:

+-------+----+
| page  | pv |
+-------+----+
| index |  2 |
| page1 |  1 |
+-------+----+

Now you can use the following CLUSTER SELECT statement to obtain the page view count for the whole "log" data:

CLUSTER SELECT page, COUNT(*) as pv from log group by page

You should obtain the following result:

+-------+----+
| page  | pv |
+-------+----+
| index |  3 |
| page1 |  1 |
| page2 |  2 |
+-------+----+

Note the query results from database "log1" and "log2" are merged in the final query result for the CLUSTER SELECT statement.