UDFs and Macros

Overview


Function Return Type Description
kt_date(
  bigint unixtime,
  [string pattern])
string kt_date has been deprecated. For example of using the UDF from_unixtime(), please refer to this page.
kt_session(
  string,
  array,
  [int timeout],
  ['include-start-end'])
tuple(
  string s,
  int label,
  int length,
  'include-start-end')

To calculate the number of sessions and the duration of each session.

  • Input timeout is measured in minutes, which is by default 30.
  • Output length is measured in seconds.
  • The parameter 'include-start-end' is optional; it will give two additional columns displaying the start and end time of each session.

Sessions with only a start time and no end time (only 1 API is sent), will count as 1 session with a session length of 0.

kt_ipcountry(string ip)string Returns a two letter ISO country code for the given ip address
kt_collectall(column foo)array Transposes all of the values of the specified column into an array. The order of the rows are maintained.
$(months_ago(INT)) N/A

Dynamically select the month partition in your WHERE clause. For more information, see example below.

Use kt_session to Return the Number of Sessions for Each User


The kt_session call will calculate the length of each session for each user. This example query is composed of three parts. The query in its entirety is shown as follows:

SELECT kt_session(calc_session.s, calc_session.evt_lst,2) AS (s, session_number, session_length)
FROM (
   SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
   FROM (
      SELECT total_list.s, total_list.utc_timestamp 
      FROM (
         SELECT pgr.s, pgr.utc_timestamp
         FROM XYZ_pgr pgr
         WHERE pgr.month = 201205 AND pgr.s != ""

         UNION ALL

         SELECT evt.s, evt.utc_timestamp
         FROM XYZ_evt evt
         WHERE evt.month = 201205 and evt.s != ""
      ) total_list 
   )session_set
   GROUP BY session_set.s
) calc_session 
ORDER BY s, session_number DESC

The above query can be broken down into three distinct queries; two inner queries and one outter query:

Inner Query #1

SELECT total_list.s, total_list.utc_timestamp 
FROM (
   SELECT pgr.s, pgr.utc_timestamp
   FROM XYZ_pgr pgr
   WHERE pgr.month = 201205 AND pgr.s != '' ''

   UNION ALL

   SELECT evt.s, evt.utc_timestamp
   FROM XYZ_evt evt
   WHERE evt.month = 201205 and evt.s != '' ''
) total_list

As we know that custom events, evt, and page requests, pgr, are the two messages types that contribute to a user's session, we must combine both of these tables into 1 master table. The information we need is the user ID and timestamp:

#sutc_timestamp
11234561307321107
21234561307321105
39876541307551408
49876541307551431

Inner Query #2

SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
FROM (
   SELECT total_list.s, total_list.utc_timestamp 
   FROM (
      SELECT pgr.s, pgr.utc_timestamp
      FROM XYZ_pgr pgr
      WHERE pgr.month = 201205
      AND pgr.s != '' ''

      UNION ALL

      SELECT evt.s, evt.utc_timestamp
      FROM XYZ_evt evt
      WHERE evt.month = 201205 and evt.s != '' ''
   ) total_list
   ORDER BY utc_timestamp DESC
) session_set
GROUP BY session_set.s

Using COLLECT_SET() and GROUP BY, we will collect the unique timestamps in an array for each user.

#sutc_timestamp
1123456[1307321117,1307321099,1307321097,1306928320,1307321110,1307321111, 1306928314,1306928315,1307321107,1307321105]
2987654[1307551408,1307551431,1307916523,1308342427,1307551391,1307916518, 1307916693]

Outer Query

Now, using the Upsight Analytics UDF kt_session, we can find out the session length for each session of each user, assuming that for a single session the interval between messages received is less than 2 minutes.

#ssession_numbersession_length
11234563234
2123456230
31234561845
4154532156

Get "Total sessions / Total time spent per user"


Sample Result:

stotal_sessionstotal_timespent_minute
1005526893753430000015
1006536536664320000210.4
100742958637631000001861.9
1009027302448410000026

In order to get the total number of sessions per user, or the total time spent per user, we must wrap around the original kt_sessions example with an aggregate MAX() or SUM(). Here is an example query:

-- Total sessions/ Total time spent per user 
-- Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:
-- Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt
-- Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303
-- Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end
-- For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.

SELECT session_data.s, MAX(session_data.session_number) AS total_sessions, ROUND(SUM(session_data.session_length)/60,1) AS total_timespent_minute
FROM
(
   SELECT kt_session(calc_session.s, calc_session.evt_lst,##timeout_in_minutes_integer##) AS (s, session_number, session_length)
   FROM (
      SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
      FROM (
         SELECT total_list.s, total_list.utc_timestamp 
         FROM (
            SELECT pgr.s, pgr.utc_timestamp
            FROM ##APP1_pgr## pgr
            WHERE pgr.month = ##month_in_YYYYMM_format##
            AND pgr.s != ""

            UNION ALL

            SELECT evt.s, evt.utc_timestamp
            FROM ##APP1_evt## evt
            WHERE evt.month = ##month_in_YYYYMM_format## and evt.s != ""
         ) total_list 
      )session_set
      GROUP BY session_set.s
   ) calc_session 
   ORDER BY s, session_number DESC
)session_data
GROUP BY session_data.s

Get "Total sessions / Total time spent per user per day"


Sample Result:

symdtotal_sessionstotal_timespent_minute
10065365366643200002013-08-1813.5
10065365366643200002013-08-2216.9
100742958637631000002013-08-24839.7
100742958637631000002013-08-251022.2
101409413977837000002013-08-0559.9
101409413977837000002013-08-2012.5
101409413977837000002013-08-2723

There are two things to keep in mind when writing this query.

  1. kt_sessions() returns session data per session, so we need to wrap around the final query with a SUM() and a GROUP BY s to get the total time spent per user and a MAX() to get the total number of sessions per user.

  2. The other thing that we have to think about is grouping it by user and day. Since kt_session() UDF doesn't take day as a bucket, we will need to rework our inner queries a bit.

Right before we apply the kt_session UDF, we would usually have the timestamps for each user organized in an array per user. At this stage, we should organize the timestamps by user and day (instead of just users), like so:

symdtimestamps
12342013-08-01[3456464,686586,234545,6786786]
12342013-08-02[5756756,658657876,3453456,86586867]
23452013-08-01[12325435]

In the outer query, instead of doing the normal kt_session(), you can then use the function CONCAT_WS to combine the ymd and s into 1 bucket and later use the SPLIT() function to separate the line, like so:

SELECT SPLIT(session_data.ymd__s, '__')[0] AS s,
   SPLIT(session_data.ymd__s, '__')[1] AS ymd,
   MAX(session_data.session_number) AS total_sessions,
   ROUND(SUM(session_data.session_length/60),1) AS total_timespent_minute
FROM (
   SELECT kt_session(CONCAT_WS('__', calc_session.s,calc_session.ymd),
      calc_session.evt_lst, 2) AS (ymd__s, session_number,
      session_length
)

Example Query:

-- Total sessions/ Total time spent per user per day
-- Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:
-- Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt
-- Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303
-- Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end
-- For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.

SELECT SPLIT(session_data.ymd__s, '__')[0] AS s,
   SPLIT(session_data.ymd__s, '__')[1] AS ymd,
   MAX(session_data.session_number) AS total_sessions,
   ROUND(SUM(session_data.session_length/60),1) AS total_timespent_minute
FROM (
   SELECT kt_session(CONCAT_WS('__', calc_session.s, calc_session.ymd), calc_session.evt_lst, ##timeout_in_minutes_integer##) AS (ymd__s, session_number, session_length)
   FROM (
      SELECT session_set.s, FROM_UNIXTIME(CAST(session_set.ts AS BIGINT), 'yyyy-MM-dd') ymd, collect_set(session_set.ts) evt_lst
      FROM (
         SELECT total_list.s, total_list.ts
         FROM (
            SELECT pgr.s, pgr.ts
            FROM ##APP1_pgr## pgr
            WHERE pgr.month = ##month_in_YYYYMM_format##
            AND pgr.s != ""

            UNION ALL

            SELECT evt.s, evt.ts
            FROM ##APP1_evt## evt
            WHERE evt.month = ##month_in_YYYYMM_format##
            AND evt.s != ""
         ) total_list
      ) session_set
      GROUP BY session_set.s, FROM_UNIXTIME(CAST(session_set.ts AS BIGINT), 'yyyy-MM-dd')
   ) calc_session
) session_data
GROUP BY SPLIT(session_data.ymd__s, '__')[0],SPLIT(session_data.ymd__s, '__')[1]
ORDER BY s

Use the "Dynamic Month" macro to select a specific time period


Dynamically select the month partition in your WHERE clause. This will increase the speed and performance of your query. The macro is especially useful if you have scheduled queries where you are only interested in metrics from a specific time period (for example, the most recent month).

Select data from 2 months ago:

WHERE month = $months_ago(2))

Select data from this month, 2 months ago, and 3 months ago:

WHERE month in ( $(months_ago(0)), $(months_ago(2)), $(months_ago(3)) )