Function | Return Type | Description |
---|---|---|
kt_date(
|
string
| kt_date has been deprecated. For example of using the UDF from_unixtime() , please refer to this page.
|
kt_session(
| tuple(
| To calculate the number of sessions and the duration of each session.
Sessions with only a start time and no end time (only 1 API is sent), will count as 1 session with a session length of 0. |
kt_ipcountry(string ip) | string | Returns a two letter ISO country code for the given ip address |
kt_collectall(column foo) | array | Transposes all of the values of the specified column into an array. The order of the rows are maintained. |
$(months_ago(INT)) | N/A | Dynamically select the month partition in your WHERE clause. For more information, see example below. |
The kt_session
call will calculate the length of each session for each user. This example query is composed of three parts. The query in its entirety is shown as follows:
SELECT kt_session(calc_session.s, calc_session.evt_lst,2) AS (s, session_number, session_length)
FROM (
SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
FROM (
SELECT total_list.s, total_list.utc_timestamp
FROM (
SELECT pgr.s, pgr.utc_timestamp
FROM XYZ_pgr pgr
WHERE pgr.month = 201205 AND pgr.s != ""
UNION ALL
SELECT evt.s, evt.utc_timestamp
FROM XYZ_evt evt
WHERE evt.month = 201205 and evt.s != ""
) total_list
)session_set
GROUP BY session_set.s
) calc_session
ORDER BY s, session_number DESC
The above query can be broken down into three distinct queries; two inner queries and one outter query:
SELECT total_list.s, total_list.utc_timestamp
FROM (
SELECT pgr.s, pgr.utc_timestamp
FROM XYZ_pgr pgr
WHERE pgr.month = 201205 AND pgr.s != '' ''
UNION ALL
SELECT evt.s, evt.utc_timestamp
FROM XYZ_evt evt
WHERE evt.month = 201205 and evt.s != '' ''
) total_list
As we know that custom events, evt, and page requests, pgr, are the two messages types that contribute to a user's session, we must combine both of these tables into 1 master table. The information we need is the user ID and timestamp:
# | s | utc_timestamp |
---|---|---|
1 | 123456 | 1307321107 |
2 | 123456 | 1307321105 |
3 | 987654 | 1307551408 |
4 | 987654 | 1307551431 |
SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
FROM (
SELECT total_list.s, total_list.utc_timestamp
FROM (
SELECT pgr.s, pgr.utc_timestamp
FROM XYZ_pgr pgr
WHERE pgr.month = 201205
AND pgr.s != '' ''
UNION ALL
SELECT evt.s, evt.utc_timestamp
FROM XYZ_evt evt
WHERE evt.month = 201205 and evt.s != '' ''
) total_list
ORDER BY utc_timestamp DESC
) session_set
GROUP BY session_set.s
Using COLLECT_SET()
and GROUP BY
, we will collect the unique timestamps in an array for each user.
# | s | utc_timestamp |
---|---|---|
1 | 123456 | [1307321117,1307321099,1307321097,1306928320,1307321110,1307321111, 1306928314,1306928315,1307321107,1307321105] |
2 | 987654 | [1307551408,1307551431,1307916523,1308342427,1307551391,1307916518, 1307916693] |
Now, using the Upsight Analytics UDF kt_session
, we can find out the session length for each session of each user, assuming that for a single session the interval between messages received is less than 2 minutes.
# | s | session_number | session_length |
---|---|---|---|
1 | 123456 | 3 | 234 |
2 | 123456 | 2 | 30 |
3 | 123456 | 1 | 845 |
4 | 154532 | 1 | 56 |
Sample Result:
s | total_sessions | total_timespent_minute |
---|---|---|
10055268937534300000 | 1 | 5 |
1006536536664320000 | 2 | 10.4 |
10074295863763100000 | 18 | 61.9 |
10090273024484100000 | 2 | 6 |
In order to get the total number of sessions per user, or the total time spent per user, we must wrap around the original kt_sessions
example with an aggregate MAX()
or SUM()
. Here is an example query:
-- Total sessions/ Total time spent per user
-- Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:
-- Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt
-- Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303
-- Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end
-- For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.
SELECT session_data.s, MAX(session_data.session_number) AS total_sessions, ROUND(SUM(session_data.session_length)/60,1) AS total_timespent_minute
FROM
(
SELECT kt_session(calc_session.s, calc_session.evt_lst,##timeout_in_minutes_integer##) AS (s, session_number, session_length)
FROM (
SELECT session_set.s, collect_set(session_set.utc_timestamp) evt_lst
FROM (
SELECT total_list.s, total_list.utc_timestamp
FROM (
SELECT pgr.s, pgr.utc_timestamp
FROM ##APP1_pgr## pgr
WHERE pgr.month = ##month_in_YYYYMM_format##
AND pgr.s != ""
UNION ALL
SELECT evt.s, evt.utc_timestamp
FROM ##APP1_evt## evt
WHERE evt.month = ##month_in_YYYYMM_format## and evt.s != ""
) total_list
)session_set
GROUP BY session_set.s
) calc_session
ORDER BY s, session_number DESC
)session_data
GROUP BY session_data.s
Sample Result:
s | ymd | total_sessions | total_timespent_minute |
---|---|---|---|
1006536536664320000 | 2013-08-18 | 1 | 3.5 |
1006536536664320000 | 2013-08-22 | 1 | 6.9 |
10074295863763100000 | 2013-08-24 | 8 | 39.7 |
10074295863763100000 | 2013-08-25 | 10 | 22.2 |
10140941397783700000 | 2013-08-05 | 5 | 9.9 |
10140941397783700000 | 2013-08-20 | 1 | 2.5 |
10140941397783700000 | 2013-08-27 | 2 | 3 |
There are two things to keep in mind when writing this query.
kt_sessions()
returns session data per session, so we need to wrap around the final query with a SUM()
and a GROUP BY s
to get the total time spent per user and a MAX()
to get the total number of sessions per user.
The other thing that we have to think about is grouping it by user and day. Since kt_session()
UDF doesn't take day as a bucket, we will need to rework our inner queries a bit.
Right before we apply the kt_session UDF
, we would usually have the timestamps for each user organized in an array per user. At this stage, we should organize the timestamps by user and day (instead of just users), like so:
s | ymd | timestamps |
---|---|---|
1234 | 2013-08-01 | [3456464,686586,234545,6786786] |
1234 | 2013-08-02 | [5756756,658657876,3453456,86586867] |
2345 | 2013-08-01 | [12325435] |
In the outer query, instead of doing the normal kt_session()
, you can then use the function CONCAT_WS
to combine the ymd
and s
into 1 bucket and later use the SPLIT()
function to separate the line, like so:
SELECT SPLIT(session_data.ymd__s, '__')[0] AS s,
SPLIT(session_data.ymd__s, '__')[1] AS ymd,
MAX(session_data.session_number) AS total_sessions,
ROUND(SUM(session_data.session_length/60),1) AS total_timespent_minute
FROM (
SELECT kt_session(CONCAT_WS('__', calc_session.s,calc_session.ymd),
calc_session.evt_lst, 2) AS (ymd__s, session_number,
session_length
)
Example Query:
-- Total sessions/ Total time spent per user per day
-- Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:
-- Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt
-- Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303
-- Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end
-- For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.
SELECT SPLIT(session_data.ymd__s, '__')[0] AS s,
SPLIT(session_data.ymd__s, '__')[1] AS ymd,
MAX(session_data.session_number) AS total_sessions,
ROUND(SUM(session_data.session_length/60),1) AS total_timespent_minute
FROM (
SELECT kt_session(CONCAT_WS('__', calc_session.s, calc_session.ymd), calc_session.evt_lst, ##timeout_in_minutes_integer##) AS (ymd__s, session_number, session_length)
FROM (
SELECT session_set.s, FROM_UNIXTIME(CAST(session_set.ts AS BIGINT), 'yyyy-MM-dd') ymd, collect_set(session_set.ts) evt_lst
FROM (
SELECT total_list.s, total_list.ts
FROM (
SELECT pgr.s, pgr.ts
FROM ##APP1_pgr## pgr
WHERE pgr.month = ##month_in_YYYYMM_format##
AND pgr.s != ""
UNION ALL
SELECT evt.s, evt.ts
FROM ##APP1_evt## evt
WHERE evt.month = ##month_in_YYYYMM_format##
AND evt.s != ""
) total_list
) session_set
GROUP BY session_set.s, FROM_UNIXTIME(CAST(session_set.ts AS BIGINT), 'yyyy-MM-dd')
) calc_session
) session_data
GROUP BY SPLIT(session_data.ymd__s, '__')[0],SPLIT(session_data.ymd__s, '__')[1]
ORDER BY s
Dynamically select the month partition in your WHERE
clause. This will increase the speed and performance of your query. The macro is especially useful if you have scheduled queries where you are only interested in metrics from a specific time period (for example, the most recent month).
Select data from 2 months ago:
WHERE month = $months_ago(2))
Select data from this month, 2 months ago, and 3 months ago:
WHERE month in ( $(months_ago(0)), $(months_ago(2)), $(months_ago(3)) )