Utility Functions
The following list shows a set of handy functions:
Name | Description |
---|---|
AES_DECRYPT() | AES decryption |
AES_ENCRYPT() | AES encryption |
CAST() | Convert data to other type. |
CUMSUM() | Calculate the cumulative sum of data in each group. |
CUT() | Convert values into different intervals. |
DIFF() | Calculate the difference of specified data between current row and previous row. |
GROUP_RANK() | Return rank for each (grouped) row. |
GROUP_ROW_NUMBER() | Return row number in each group. |
AES_ENCRYPT
aes_decrypt(data, key)
aes_encrypt(data, key)
The AES_ENCRYPT function encrypts data using AES algorithm. The return type of AES encryption functions is the same as the data type except for STRING, VARSTRING, and constant string, whose result is VARBINARY.
CAST
cast(expr AS new_type)
The CAST function converts expression to new_type of data. The following table shows the available conversion between types:
source type | new type |
---|---|
INT8 | INT64, DOUBLE |
INT16 | INT32, INT64, DOUBLE |
INT32 | INT64, DOUBLE |
INT64 | DOUBLE |
FLOAT | INT64, DOUBLE |
DOUBLE | INT64 |
STRING | INT8, INT16, INT32, INT64, FLOAT, DOUBLE |
CHAR | INT8, INT16, INT32, INT64, FLOAT, DOUBLE, CHAR, STRING |
VARSTRING | INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING |
DATETIME32 | DATE32, DATETIME64 |
DATETIME64 | DATE32, DATETIME32 |
TIMESTAMP | DATE32, DATETIME32, DATETIME64 |
HYPERLOGLOG | INT64 |
CUT
cut(col_name, cut_point, interval, span_num)
The CUT function converts values in col_name into different intervals.
- cut_point specifies the break point where the intervals are spanned into both sides from the break point.
- interval denotes the size of each interval.
- span_num specifies the number of intervals for each side from the cut point. The default value is 3.
For example, cut(data, 5, 2, 2) will convert value in data column into intervals (-,1], (1,3], (3,5], (5,7], (7,9], (9,-].
CUMSUM
cumsum(data_col, [group_col])
The CUMSUM function returns the cumulative sum of data for each group in group_col. Note that it is assumed that table is in order by the group column. If group is not given, the whole table is deemed as one group.
DIFF
diff(data_col, [group_col])
Returns the difference of data between current row and previous row. The diff value for the first row within the same group is set to 0. If group_col is not given, the whole table is deemed as one single group.
GEOHASH
geohash(longitude, latitude, max_len)
Returns an INT64 number representing the Geohash with maximum max_len characters. If no max_len is specified, the default value is 12.
GROUP_RANK
GROUP_RANK([group_col1, ..., group_coln])
Returns the starting rank for each row within the same group in the table. Note that each row in the same group shall have the same rank. If group columns are not given, the whole table is treated as one single group.
GROUP_ROW_NUMBER
GROUP_ROW_NUMBER([group_col1, ..., group_coln])
The GROUP_ROW_NUMBER returns row number within the samp group of group_col1, ... group_coln. If group columns are not given, the whole table is treated as one single group.
Examples
SELECT g, v, cumsum(v, g), diff(v, g), group_rank(g), group_row_number(g) FROM t
+---+----+-------------+-----------+---------------+---------------------+
| g | v | cumsum(v,g) | diff(v,g) | group_rank(g) | group_row_number(g) |
+---+----+-------------+-----------+---------------+---------------------+
| A | 1 | 1 | 0 | 1 | 1 |
| A | 2 | 3 | 1 | 1 | 2 |
| B | 10 | 10 | 0 | 3 | 1 |
| B | 6 | 16 | -4 | 3 | 2 |
| B | 8 | 24 | 2 | 3 | 3 |
| C | 2 | 2 | 0 | 6 | 1 |
| C | 3 | 5 | 1 | 6 | 2 |
+---+----+-------------+-----------+---------------+---------------------+