BigObject allows user-defined functions to be used within a query statement on table objects as well. Examples will be used to illustrate how it works.

IN GROUP BY


Assuming that a user would like to compare weekday and weekend sales, this question can be easily answered by the following SELECT statement:

 SELECT wkd(Date), SUM(total_price) FROM sales GROUP BY wkd(Date)

An example result table is as shown below:

wkd(Date) SUM(total_price)
0 3.90583e+06
1 1.5501e+06

As can be seen from the result, the weekday and weekend sales is about 3.9M and 1.55M respectively. The query is enabled by applying function wkd() to determine whether it is weekday (denoted by 0) or weekend (denoted by 1) for each transaction.

The function wkd() is quite simple:

function wkd(d)
  wd = d:weekday()
  if (wd >= 1 and wd <= 5) then
    return 0
  else 
    return 1
  end
end

The argument d is a DateTime type variable. We use built-in function weekday() to obtain its week day. It returns 0 if it is weekday (1 to 5) or 1 if it is weekend (0 or 6).

User-defined functions can be used with other attributes in the GROUP BY clause, such as:

SELECT Product.brand, wkd(Date), SUM(total_price) FROM sales GROUP BY Product.brand, wkd(Date)

The above SELECT statement returns the weekday and weekend sales for each product brand.

IN WHERE


User-defined functions can also be used in WHERE clause as a filter. For example, if the user only wants to obtain weekend sales, he can do:

SELECT SUM(total_price) FROM sales WHERE wkd(Date) = 1

IN SELECT


BigObject does not support expressions as selected values. However, it can be achieved by table programming. Assuming that the user would like to calculate the unit price for each transaction, he can define function unit_price() first:

function unit_price(qty, total_price)
   return qty / total_price;
end

Then the following SELECT can be issued together with function unit_price() to obtain the unit price for each transaction.

SELECT unit_price(qty, total_price) from sales 

Builtin Functions


eval()

Run a code script (for advanced user).

The variable of the return value is "_ret", and the other arguments will be arg[1], arg[2],... e.g. performing a user-defined equation based on qty and total_price columns

SELECT eval("_ret = (arg[1]+arg[2])/arg[2]" , qty , total_price) FROM sales LIMIT 10
SELECT * FROM sales WHERE eval("_ret = (arg[1]*0.8 +arg[2]*0.2)/arg[1]" , qty , total_price) > 3 LIMIT 10

strinter()

Compute the intersection between two strings.

The basic concept is to make two histograms (bins:0~255) from two strings which seen as byte arrays Then applying a measurement method to these histograms.

The strinter function includes three argument. The first argument specifies a string column for comparing. The second one is a comparing string. The third one is used to set a comparison method.

We implement three methods as follows:

count : return the intersection number between two histograms. (default)

norm1 : count / length of the comparing string

norm2 : count / (length of the first string * length of the second string)

e.g. list all customer names and the intersection of the string "Amy Mil"

SELECT name ,strinter(name , "Amy Mil") FROM Customer

e.g. list top 5 customer names based on the intersection of the string "Amy Mills sadas" (using "norm2" method)

SELECT name ,strinter(name , "Amy Mills sadas", "norm2") FROM Customer ORDER BY strinter(name , "Amy Mills sadas", "norm2") DESC LIMIT 5