Advanced Features

Custom Attributes


DataMine query results can be used to populate custom user attributes, either that are new or for updating those that already exist. This can be especially valuable in the case where the SDK instrumentation for the attribute was not completed, and captured analytic data suitable for segmentation is available. In order to keep the attribute values current, the query can be scheduled and the results will be automatically applied to the user records.

Push Campaigns


The results of a DataMine query can be used to populate the list of tokens for a push notification campaign. Campaigns that are intended to send on a recurring schedule can be updated with the latest collection of tokens.

Template Functions


%upsight_time%

Let’s say you have a query that you run daily. Rather than adjusting the HQL each time, take advantage of the %upsight_time% function that allows for dynamic dates to populate the query.

    %upsight_time(format[, offset])%

Description

This function will output a date in the given format. When running a query in DataMine, the base date will be either Today, or the specified Preview Day, in the Options pane. When the query is running as part of a scheduled job, the base date will be the date/time of the job execution. Recommended use is for scheduled queries where the datetime value will always be variable.

Format

A string value, for example:

  • To compare months: 'yyyyMM'
  • To compare days: 'yyyyMMdd'
  • To compare hours: 'yyyyMMddhh'

Offset

In order to express an offset of time from a base date time, the following expression pattern can be used:

-1d -> 1 day previous
-7d -> 7 days previous
-1h -> 1 hour previous

Examples

To get installs from the current day:

SELECT COUNT(1) as installs 
FROM install 
WHERE day = %upsight_time('yyyyMMdd')%

To get data from the 7 days before today:

SELECT day, COUNT(1) as installs 
FROM install 
WHERE day >= %upsight_time('yyyyMMdd', '-7d')% AND day <=  %upsight_time('yyyyMMdd', '-1d')%


Note When templating is enabled, % characters identify custom keywords requiring any existing % characters to be escaped in the query with another %. For example:

type LIKE 'pub.%%'

or, for modulo operations:

pub_data['number'] %% 2 == 0

Custom Table Creation


DataMine allows you to save the results of your query to a table that can then be queried on.

  1. Log into your dashboard and navigate to EXPLORATION > DATAMINE and select your query.
  2. Hit RUN if your query has not run or is no longer displaying results.
  3. Select DOWNLOAD and click on Save Results to Table.
  4. Give your custom table a name.
  5. You will see your new table in the table dropdown list. You may now query and JOIN on your custom table.

Upsight User Defined Functions (UDFs)


In Hive, almost everything that manipulates data is a function. Hive can be extended to add more custom defined functions to increase its flexibility. Upsight has defined a set of its own User Defined Functions (UDFs) which allow for easier manipulation of data.

UP_FIRST_ROW() and UP_LAST_ROW()

In many use cases, you want to drop unnecessary rows in your query. For example, when looking at unique users or a user’s most recent activity. The UP_FIRST_ROW() and UP_LAST_ROW() functions use a boolean true / false indicator to denote the first and last row of data and may be used in conjunction with OVER(PARTITION BY sid ) to reduce data down to one row per user.

The query below ranks the events a user has sent by timestamp, and marks their first message true.

SELECT sid, type,
UP_FIRST_ROW(ts)OVER( PARTITION BY sid) as first_row
FROM event

By adding a WHERE clause in the next select statement, you can successfully unique users.

Unique Users on Nov 1st, 2017

SELECT sid
FROM
(
    SELECT sid, 
      UP_FIRST_ROW(ts)OVER( PARTITION BY sid) as first_row
      FROM event
      WHERE day = 20171101
) t1
WHERE first_row = true

Most recent action of users on Nov 1st, 2017

In this case, we want to rank by timestamp of events but select the last row (most recent) event.

SELECT sid, type
FROM
(
    SELECT sid, type
      UP_LAST_ROW(ts)OVER( PARTITION BY sid) as last,row
      FROM event
      WHERE day = 20171101
) t1
WHERE last_row = true

Our example queries use ROW_NUMBER() OVER(PARTITION BY … ) to achieve the same effect. UP_FIRST_ROW() and UP_LAST_ROW() is designed to be more efficient because rather than reading through and numbering all rows, it only needs to find one row.

UP_DAYS_DIFF()

The UP_DAYS_DIFF() function returns the difference between two dates in days. It accepts many common time formats including the Upsight day field (YYYYMMdd) and ts (Unixtime timestamp).

UP_DAYS_DIFF(max_date,min_date)

Examples

UP_DAYS_DIFF('2017-11-15', '2017-11-01' )
  14
UP_DAYS_DIFF('2017-11-01', day ) --where day is Oct-01
  31

UP_DATE_FORMAT()

This function allows for flexible manipulation of dates. It accepts many common time formats including the Upsight day field (YYYYMMdd) and ts (Unixtime timestamp). The function takes the inputted time and converts it to the specified date format.

UP_DATE_FORMAT(input_date,”date format”)

Examples

UP_DATE_FORMAT( "2017-07-01 10:20:30", "YYYYMMdd")
  '20170701'

UP_DATE_FORMAT( "20120701102030", "YYYY-MM-dd HH:mm:ss") 
 '2017-07-01 10:20:30'

UP_DATE_FORMAT( "2017/07/01 10:20:30", "YYYYMMddHH")
  '2017070110'

UP_DATE_FORMAT( ts, "YYYY-MM-dd") --timestamp in Unixtime
  '2017-08-10'

UP_MAPSUM_BY_KEYS()

This function allows you to perform sums on values in your pub_data and user_attributes field.

UP_MAPSUM_BY_KEYS(mapped_field, array(“key1”, “key2” ))

Examples

SELECT UP_MAP_SUM_BY_KEYS(pub_data, array("amount"))
FROM event
WHERE type = 'pub.economy.earn.coin'
{ “amount”, 145000 }


SELECT UP_MAP_SUM_BY_KEYS(user_attributes, array("completed_tutorial","is_spender"))
FROM event
WHERE type = 'pub.arena.battle.win'
{ “completed_tutorial”: 503, “is_spender”: 212 }

Hive UDFs

Hive itself also implements UDFs which you can see here.

UDFs


Function Return Type Description
from_unixtime(bigint unixtime[, string format])string Changes the unix timestamp into various formats. For more information, refer to this page.
datediff(string enddate, string startdate)int Return the difference in days between startdate to enddate:
datediff('2009-03-01', '2009-02-27') = 2
datediff('2009-02-27', '2009-03-01') = -2

Examples

Function Call Description Return Value
FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') Changes the unix timestamp into year-month-day numerical format. 2013-09-17
FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd') Returns the current date in year-month-day numerical format. This is useful for scheduled queries. 2014-09-17
FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd') Casts the string ts to a BIGINT before passing it to FROM_UNIXTIME since it only accepts BIGINT types. Note Since the ts parameter is a string, you will need to do this every time you pass it to FROM_UNIXTIME. 2013-09-17
FROM_UTC_TIMESTAMP((utc_timestamp*1000), "CET") Changes the timezone from GMT to GMT+1. There are 2 things to note. First we have to transform the timestamp into milliseconds, second for other possible timezones, you can refer to this page 2013-09-17 20:48:35
FROM_UNIXTIME((utc_timestamp+3600), 'yyyy-MM-dd HH:mm') Changes the timezone from GMT to GMT+1. Since GMT+1 is one hour ahead of GMT, we simply add 3600 seconds to the timestamp. 2013-09-17 20:48
FROM_UNIXTIME(utc_timestamp) Changes the unix timestamp into year-month-day numerical format with the hours, minutes, and seconds from the time appended in 24 hour format. 2013-09-17 20:48:35
MONTH(FROM_UNIXTIME(utc_timestamp)) Retrieves the month number from a unix time stamp. 9
FROM_UNIXTIME(utc_timestamp,'HH:mm') Retrieves the hours and minutes of the time from a unix timestamp in 24 hour format. 20:48
FROM_UNIXTIME(utc_timestamp,'hh:mm') Retrieves the hours and minutes of the time from a unix timestamp in 12 hour format. 08:48
FROM_UNIXTIME(utc_timestamp , 'EEEE') Retrieves the day of the week from a unix timestamp. Monday