Upsight Analytics Shared Queries

Standard Queries

Count of Users by Gender


Description

This short query will give you a break down of the application’s users based on gender. Below is the cpu (User Information) table which we will be working with.

#utc_time_stamptypesbglclsfjson_datamonth
11343455cpu123451988mUSFL345201108
21364856cpu234561990mUSCA65201110
31454645cpu543531989fCAON657201201

Query

SELECT g gender, COUNT (DISTINCT s) total
FROM demo_cpu  
WHERE month = 201106
GROUP BY g

Sample Return

#gendertotal
1f85353
2m94645
3u2341

Best Practices Observed In This Query

  1. Date restriction on the month column. This will speed up the query by making sure that it is not looking at the entire dataset.
  2. Good naming convention. The column gender and total clearly describes the content of the column.
  3. De-duping of the s parameter. Since a single user may send multiple cpu messages to Kontagent, it is not guaranteed that the same user will only appear in the cpu table only once. Using the DISTINCT command ensures that we are only counting unique users.

Standard Queries

Daily Revenue


Description

This query will return the total revenue generated by day for a given month.

Query

SELECT FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') AS day, SUM(v) AS revenue_per_day_cents
FROM ##APP1_mtu##
WHERE month = ##month_in_YYYYMM_format##
GROUP BY FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd')

Sample Return

dayrevenue_per_day_cents
2013-03-22400
2013-03-25300
2013-03-27100

What does this mean?

  • The SUM function returns the total of all the values in the v field of whatever table you replace ##APP1_mtu## with.
  • It only counts the records that have the value of you choose in the month field. For example, 201303 would return records from March 2013
  • The GROUP BY command assures that the values are only added together when they share the same date.
  • The date is determined using the UDF FROM_UNIXTIME.
  • This function converts a Unix Epoch Timestamp to a date in the format YYYY-MM-DD.

Standard Queries

30 Day Retention Return Rate


Description

This query answers the question: For a given period, what is retention % of installing users over the next 30 days?

Note The more months we include, the slower the query will be.

Sample Visualization

30 Day Retention Return Rate

Definitions

  • We are assuming a standard instrumentation where the application sends some sort of a heartbeat message when the user is using the application with a PGR message and the first APA message is the install date of the user.
  • In the final result, retention is displayed as a percentage as an integer.
  • We are including Day 0 in our result (which should always be 100%) because it sets the scale of the retention chart from 0% - 100%.

Instructions

Replace all text enclosed with ## ## (for example ##APP1_pgr## could be myapp_pgr) using the following instructions:

  • Replace ##APP1_apa##, ##APP1_pgr## with the name of your applications you are interested in reviewing, for example: demogame_apa, demogame_pgr
  • Keep either line 25 or line 26 (the 2 HAVING clause), do the same for line 48 and 49. They must be identical.
  • Keep line 25 if you have a specific time period (of users that installed) that you would like to investigate. Note you might also have to change line 32 where it is selecting the engagement messages to select the appropriate month WHERE month IN ($(months_ago(0)),$(months_ago(1))).
  • Keep line 26 if you would like to schedule this query / pin the query on a custom dashboard. By default, the cohort of users are those that installed 30 - 40 days ago from today.

Query

SELECT CONCAT("Day " , retentionchart.retainedday) AS ktlabel_string, round((retentionchart.retainedusers/ installcount.installedusers)*100,0) AS retention_percentage
FROM
(
SELECT retentionset.retainedday, COUNT(*) AS retainedusers  
FROM
        (
        SELECT install.s,install.joinday, DATEDIFF(usage.playday, install.joinday) AS retainedday
        FROM
            (
            SELECT s,MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) AS joinday
            FROM ##APP1_apa##
            GROUP BY s          
            --HAVING MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) BETWEEN '2014-06-01' AND '2014-06-07'
            HAVING DATEDIFF(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd'))) BETWEEN 30 AND 40
            )install
        LEFT OUTER JOIN
            (
                SELECT s,(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) AS playday
                FROM ##APP1_pgr##
                WHERE month IN ($(months_ago(0)),$(months_ago(1))) 
                GROUP BY s, (FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd'))
            )usage
        -- This line gives us retention from day 0 to day 30. If you adjust this, make sure to change the "WHERE month IN" above to include the approriate month as well. Currently we have data fom 'this' and 'last' month, which should cover 30 days of engagement data     
        ON install.s=usage.s WHERE DATEDIFF(usage.playday, install.joinday) BETWEEN 0 AND 30
        )retentionset
GROUP BY retentionset.retainedday
)retentionchart
JOIN
(
    SELECT COUNT(*) AS installedusers
    FROM
    (
    SELECT s,MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) AS joinday
    FROM ##APP1_apa##
    GROUP BY s
    --HAVING MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) BETWEEN '2014-06-01' AND '2014-06-07'
    HAVING DATEDIFF(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd'))) BETWEEN 30 AND 40
    )installs
)installcount
ON 1=1

Sample Return

ktlabel_stringretention_percentage
Day 0100
Day 117
Day 213
Day 312
Day 411
Day 510
Day 610
Day 710
Day 89
Day 98
Day 108
Day 117
Day 127
Day 137
Day 147
Day 157
Day 166
Day 176
Day 186
Day 196
Day 206
Day 216
Day 225
Day 235
Day 245
Day 255
Day 265
Day 275
Day 285
Day 295
Day 304

What does this mean?

  • In the sample return we see that of all the users that have installed 30 to 40 ago. 17% came back on day 1, 13 % came back on day 2, 5 % came back on day 25 etc.

Standard Queries

Day x Retention Approximation


Description

This query will approximate the retention for a particular app on a specified day.

Query

SELECT COUNT(dayx.s) AS installed, COUNT(dayx1.s) as retainedusers
FROM (
  SELECT s, MIN(utc_timestamp) as firstinstall
  FROM ##APP1_apa## apa
  GROUP BY s
  HAVING FROM_UNIXTIME(firstinstall, 'yyyy-MM-dd') =  ##"YYYY-MM-D1"##
) dayx
LEFT OUTER JOIN (
  SELECT DISTINCT(usageset.s)
  FROM (
    SELECT DISTINCT(s)
    FROM ##APP1_pgr##
    WHERE month >= ##month_in_YYYYMM_format##
    AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') = ##"YYYY-MM-D2"##
    UNION ALL 
    SELECT DISTINCT(s)
    FROM ##APP1_evt##
    WHERE month >= ##month_in_YYYYMM_format##
    AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') = ##"YYYY-MM-D2"##
  ) usageset
) dayx1
ON dayx.s = dayx1.s

Sample Return

installedretainedusers
352

What does this mean?

  • The subquery named dayx returns the number of users that installed on the date specified in ##"YYYY-MM-D1"##.
  • The subquery named dayx1 used your app again on the date specified in ##"YYYY-MM-D2"##.
  • The LEFT OUTER JOIN returns only the users from the dayx1 subquery that are present in the dayx subquery.

Standard Queries

Hourly Revenue


Description

This query will return the total revenue generated by hour for a given day.

Query

SELECT FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') AS day, HOUR(FROM_UNIXTIME(utc_timestamp)) AS hour_of_day, SUM(v)
AS revenue_per_hour_cents
FROM ##APP1_mtu##
WHERE month = ##month_in_YYYYMM_format##
AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') = ##"day_in_YYYY_MM-DD-format"##
GROUP BY FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd'), HOUR(FROM_UNIXTIME(utc_timestamp))

Sample Return

dayhour_of_dayrevenue_per_hour_cents
2013-03-0518100
2013-03-0519210
2013-03-0520230
2013-03-0521280

What does this mean?

  • The SUM function returns the total of all the values in the v field of whatever table you replace ##APP1_mtu## with.
  • The WHERE clause limits the records being counted to whatever month you replace ##month_in_YYYYMM_format## with. For example, 201303 would return the sum of values from March 2013.
  • The GROUP BY command assures that the values are only added together when they share the same date and hour. The date is determined using the UDF FROM_UNIXTIME. The combination of HOUR() and FROM_UNIXTIME() converts Epoch Timestamp to an hour format.

Standard Queries

Install Date By User


Description

This query will return the install date for every user. This is determined by the first APA received from each user.

Query

SELECT s AS user_id, MIN(FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd')) AS first_install
FROM ##APP1_apa##
GROUP BY s
ORDER BY first_install ASC

Sample Return

user_idfirst_install
1459672012-01-22
1458292012-01-22
1458972012-01-22
1460382012-01-22
1460292012-01-22

What does this mean?

  • The query returns all the User IDs in the field s.
  • The GROUP BY command groups by unique values in the s field.
  • The MIN function returns the smallest value in the utc_timestamp field.
  • This returns the smallest, and therefore oldest, timestamp in the utc_timestamp column.

This is useful for determining when a user first installed your application. It prevents unexpected results due to users installing your application multiple times.

Standard Queries

Joining APA and UCC For Mobile Applications


Description

There is a potential for mobile applications in Kontagent to have multiple su tags in each APA message. These su tags are used to do acquisition attribution. Below we have some sample output of what your mobile APA table might look like in Datamine:

Query

SELECT *
FROM testapp_apa
WHERE month = 201304
Sample Return
#utc_timestamptypesusujson_datatsmonth
11366084373apa1779574484d821eb30,5ad9ec84,4a44b1451366084373201304
21366090038apa1762741785bf202bf7,fd55006b,8e9b773f1366090038201304
31366077734apa7954089243a485ae72,83cc82cc,1c5fe5b81366077733201304

Query Use Case

To pair these users to the ad channel that they came from we want to join the APA table and the UCC table on the su column. As shown above, the su tags are stored in a list in the APA table. However, there is only one su tag in each UCC message. In order for us to match the su tags between the APA and UCC table, we must separate the su tags in APA which can be done with LATERAL VIEW. Below is an example.

SELECT s,split_su
FROM testapp_apa
LATERAL VIEW explode(split(su, ','))t1 as split_su
WHERE month = 201304
#ssplit_su
11779574484d821eb30
217795744845ad9ec84
317795744844a44b145
41762741785bf202bf7
51762741785fd55006b
617627417858e9b773f

This means we have to adjust the way we look at acquisition attribution in Datamine to the following query:

-- For every user that installed on April 22, it will give us his/her corresponding ad click information (if applicable)
SELECT apa.s,apa.split_su,ucc.st1,ucc.st2,ucc.st3
FROM
    (SELECT s,split_su
    FROM testapp_apa 
    LATERAL VIEW explode(split(su, ','))t1 as split_su
    WHERE month = 201304 and FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') = "2013-04-22")apa
LEFT OUTER JOIN
    (SELECT *
    FROM testapp_ucc 
    WHERE month = 201304
    )ucc
ON apa.split_su=ucc.su
WHERE ucc.su IS NOT NULL

The above query will attempt to match the su tags in our APA table (user installs) to the su tags in our UCC table. When it finds a match, it will give us the specific user id, the matching su tag and the subtypes of the ad that was clicked.

#ssplit_sust1st2st3
117795744845ad9ec84CampaignACategory3Age24to40
21762741785bf202bf7CampaignACategory2Age40to45
379540892431c5fe5b8CampaignCCategory2Age24to40

In the above result we have: For every user that install through an ad source, the matching su tag and information about the ad click.

Standard Queries

Last 3 Events Before Users Leave The Application - Mobile


Description

This query will return the last 3 actions users take before leaving the application. The intent of analyzing this data is to try predict which flows of actions lead to loss of engagement with users. This query will return a table with 4 fields:

  • User ID
  • Last action before leaving
  • Second last action before leaving -Third last action before leaving

Definitions and settings

  • A user that has left the application is defined as someone whose last interaction with the application is 14 days or more before today, but have interacted with the application in the last month (this month and last month). You may change these variables from the query.

    Note The query will become slower when more months are added.

  • We are assuming a standard where the application sends some sort of heartbeat message when the user is using the application with a pgr message. If you have a custom instrumentation, this may not work.

  • We are extracting the last three events for each user. The first line of the query can be modified to extract more or events as necessary.

Query

SELECT getLastThree.s, 
getLastThree.evtList[0] AS last_event, 
getLastThree.evtList[1] AS second_last_event, 
getLastThree.evtList[2] AS third_last_event
FROM 
   (
    SELECT getEvtList.s, kt_collectall(getEvtList.event) AS evtList
    FROM 
       (
       SELECT apr.s, evt.ts, CONCAT_WS("/", evt.st1, evt.st2, evt.st3, evt.n) AS event
       FROM 
          (
          -- This inner query gives us the user id as well as the time stamp of users that have used the application this and last month
          -- But have not used the application in the past 14 days
          SELECT s, MAX(ts) AS ts
          FROM ##APP1_pgr##
          -- This line picks events and users who have used the app this month and last month
          WHERE month IN($(months_ago(0)),$(months_ago(1))) 
          -- This line keeps messages with correct timestamps, where the time the event occured is not ahead of the time we received the message
          AND CAST(ts AS bigint) <= utc_timestamp
          GROUP BY s
          -- This line denotes someone who has left the application as: at least 14 days of inactivity from today.
          HAVING DATEDIFF(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) >= 14
         ) apr
      JOIN 
         (    
          -- This query grabs the last 10 events for each user.
          -- We are grabbing 10 because some may not fit the JOIN requirements later on.
          SELECT innerevt.*
          FROM
             (
             SELECT s, ts, st1, st2, st3, n, ROW_NUMBER () OVER (PARTITION BY s ORDER BY ts DESC) AS ranks
             FROM ##APP1_evt##
             -- This line picks events that have happened this month and last month
             WHERE month IN($(months_ago(0)),$(months_ago(1)))
             AND CAST(ts AS bigint) <= utc_timestamp
             )innerevt
          WHERE ranks <= 10 
         ) evt
      ON apr.s = evt.s
      WHERE evt.ts < apr.ts
      ORDER BY evt.ts DESC
      ) getEvtList
   GROUP BY getEvtList.s
   ) getLastThree

Sample Return

slast_eventsecond_last_eventthird_last_event
4645654138747/cmd/Building/Farm/Off130036/cmd/Building/Food/Fabricator132577/cmd/Building/Fusion/Generator
756756987135051/cmd/Building/Space/Cafe131610/cmd/Building/Space/Washroom136700/cmd/Building/Space/Bar
234345345137125/cmd/Building/Food/Fabricator132412/cmd/Building/Update/Port133134/cmd/Survey/New/3

Additionally, we can alter the above query to find out which three sets of events causes more users to leave the application.

SELECT getLastThree.evtList[0] AS last_event, 
getLastThree.evtList[1] AS second_last_event, 
getLastThree.evtList[2] AS third_last_event, 
COUNT (getLastThree.s) AS total_users
FROM 
   (
    SELECT getEvtList.s, kt_collectall(getEvtList.event) AS evtList
    FROM 
       (
       SELECT apr.s, evt.ts, CONCAT_WS("/", evt.st1, evt.st2, evt.st3, evt.n) AS event
       FROM 
          (
          -- This inner query gives us the user id as well as the time stamp of users that have used the application this and last month
          -- But have not used the application in the past 14 days
          SELECT s, MAX(ts) AS ts
          FROM ##APP1_pgr##
          -- This line picks events and users who have used the app this month and last month
          WHERE month IN($(months_ago(0)),$(months_ago(1))) 
          -- This line keeps messages with correct timestamps, where the time the event occured is not ahead of the time we received the message
          AND ABS(utc_timestamp - CAST(ts AS bigint)) <= 604800
          AND CAST(ts AS bigint) <= utc_timestamp
          GROUP BY s
          -- This line denotes someone who has left the application as: at least 14 days of inactivity from today.
          HAVING DATEDIFF(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), FROM_UNIXTIME(CAST(ts AS bigint), 'yyyy-MM-dd')) >= 14
         ) apr
      JOIN 
         (    
          -- This query grabs the last 10 events for each user.
          -- We are grabbing 10 because some may not fit the JOIN requirements later on.
          SELECT innerevt.*
          FROM
             (
             SELECT s, ts, st1, st2, st3, n, ROW_NUMBER () OVER (PARTITION BY s ORDER BY ts DESC) AS ranks
             FROM ##APP1_evt##
             -- This line picks events that have happened this month and last month
             WHERE month IN($(months_ago(0)),$(months_ago(1)))
             AND ABS(utc_timestamp-CAST(ts AS bigint)) <= 604800
             AND CAST(ts AS bigint) <= utc_timestamp
             )innerevt
          WHERE ranks <= 10 
         ) evt
      ON apr.s = evt.s
      WHERE evt.ts < apr.ts
      ORDER BY evt.ts DESC
      ) getEvtList
   GROUP BY getEvtList.s
   ) getLastThree
GROUP BY getLastThree.evtList[0], 
getLastThree.evtList[1], 
getLastThree.evtList[2]
ORDER BY total_users DESC
last_eventsecond_last_eventthird_last_eventtotal_users
138747/cmd/Building/Farm/Off130036/cmd/Building/Food/Fabricator132577/cmd/Building/Fusion/Generator3567
135051/cmd/Building/Space/Cafe131610/cmd/Building/Space/Washroom136700/cmd/Building/Space/Bar456
137125/cmd/Building/Food/Fabricator132412/cmd/Building/Update/Port133134/cmd/Survey/New/387

Standard Queries

Last 3 Events Before Uninstall Event From User - Social


Description

This query will show you the last 3 actions a user took before sending an application remove message. The intent of analysing this data is to try to predict which flows of actions lead to loss of engagement with users. This query will return a table with 5 columns:

  1. Row Number
  2. User ID
  3. Last action before leaving
  4. Second last action before leaving
  5. Third last action before leaving

Note that in this query, we have added a limitation that the last 3 events before uninstall and the uninstall must have happened on the same day. This increases the efficiency of the query dramatically for large applications without compromising much data, because we usually expect players to interactive with the application in someway before uninstalling.

For smaller applications, you may remove the (FROM_UNIXTIME(apr.utc_timestamp, 'yyyy-MM-dd') = FROM_UNIXTIME(evt.utc_timestamp, 'yyyy-MM-dd')) of the JOIN CLAUSE.

Query

SELECT getLastThree.s, getLastThree.evtList[0], getLastThree.evtList[1], getLastThree.evtList[2]
FROM (
      SELECT getEvtList.s, kt_collectall(getEvtList.event) AS evtList
      FROM (
           SELECT apr.s, evt.utc_timestamp, CONCAT_WS("/", CAST(evt.utc_timestamp AS string), evt.st1, evt.st2, evt.st3, evt.n) AS event
           FROM (
                SELECT s,max(utc_timestamp) AS utc_timestamp
                FROM demo_apr
                WHERE month = 201210
                GROUP BY s
                ) apr

           JOIN (
                SELECT s,utc_timestamp,st1,st2,st3,n
                FROM demo_evt
                WHERE month = 201210
                ) evt
                ON (apr.s = evt.s) AND (FROM_UNIXTIME(apr.utc_timestamp, 'yyyy-MM-dd') = FROM_UNIXTIME(evt.utc_timestamp, 'yyyy-MM-dd'))
           WHERE evt.utc_timestamp < apr.utc_timestamp
           ORDER BY evt.utc_timestamp DESC
           ) getEvtList
      GROUP BY getEvtList.s
     )getLastThree

Sample Return

#sevent1event2event3
146456541309238747/command/Building/Farm/Off1307840036/command/Building/Food/Fabricator1309472577/command/Building/Fusion/Generator
27567569871309055051/command/Building/Space/Cafe1309141610/command/Building/Space/Washroom1309496700/command/Building/Space/Bar
32343453451311297125/command/Building/Food/Fabricator1308102412/command/Building/Update/Port1308973134/command/Survey/New/3

Query Use Case

This query can show you where users are leaving the application and why. Using this information, it may be possible to hypothesize on ways to mitigate the user loss if they take a certain path, or prevent the user from going down paths that are known to lead to leaving the application entirely. This query requires moderate-advanced knowledge of SQL and moderate knowledge of a spreadsheet program, such as Excel.

The results of this query are best suited to visualization through a PivotTable, such as those found in Excel. For an example of this visualization, please view the webinar on kSuite DataMine here.

Best Practices Observed In This Query

  1. Strong naming convention present. Every column is given an explicit name and will be easily understood even after exporting.
  2. There is also an explicit JOIN...ON in the query, with parentheses to show order of joining. This helps to both speed up the query and ensure that only the correct results are returned.

Standard Queries

Monthly Active Users Approximation


Description

This query will approximate the MAU (Monthly Active Users) for an application for a given month.

Query

SELECT COUNT (DISTINCT(mauset.s))
FROM
    (SELECT DISTINCT(s)
    FROM ##APP1_pgr##
    WHERE month = ##month_in_YYYYMM_format##
    union all
    SELECT DISTINCT(s)
    FROM ##APP1_evt##
    WHERE month = ##month_in_YYYYMM_format##
    union all
    SELECT DISTINCT(s)
    FROM ##APP1_apa##
    WHERE month = ##month_in_YYYYMM_format##)mauset

Sample Return

_c0
100

What does this mean?

  • The statement queries three tables (##APP1_pgr##, ##APP1_evt##, and ##APP1_apa##).
  • The WHERE clause limits the records being counted to whatever month you replace ##month_in_YYYYMM_format## with. For example 201303 would return the sum of values from March 2013.
  • The DISTINCT keyword filters out any records that duplicate the s field.
  • The union all command combines the 3 queries into one result.
  • The outermost DISTINCT command removes User IDs that are in more than one table.
  • The COUNT function returns the number of results.
  • The value returned is the number of User IDs that are in any of the pgr, evt, or apa tables. These tables were chosen because they are the most common messages so if a user were to use your application, they would most likely send at least one of these messages. This is a quick way to approximate the monthly active uniques for a given month.

Standard Queries

Monthly Revenue


Description

This query will return the total revenue generated for a given month.

Query

SELECT SUM(v) AS monthly_revenue_cents
FROM ##APP1_mtu##
WHERE month = ##month_in_YYYYMM_format##

Sample Return

monthly_revenue_cents
20000

What does this mean?

  • The SUM function returns the total of all the values in the v field of whatever table you replace ##APP1_mtu## with.
  • It only counts the records that have the value you choose in the in the month field. For example, 201303 would return records from March 2013.
  • This query returns the revenue in cents. To display this number in dollar amount, the SELECT statement can be replaced with SELECT SUM(v) / 100 AS monthly_revenue_dollars to divide the value it returns by 100.

Standard Queries

Session Frequency by Users


Description

This query will return the number of users that had exactly 1 session, 2 sessions, 3 sessions, etc. in the specified month.

The result can be visualized on a pie chart to give us the % of users that had exactly x sessions.

Query

  • Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt.
  • Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303. For a dynamic month use the following syntax: month = from_unixtime(unix_timestamp(),'yyyyMM') to look at the current month.
  • Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end.
  • For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.
    SELECT session_aggregate.app_frequency AS number_sessions, COUNT(session_aggregate.s) AS users
        FROM
            (
            SELECT session_info.s, MAX(session_info.session_number) AS app_frequency
            FROM
                (
                SELECT kt_session(calc_session.s, calc_session.evt_lst,##timeout_in_minutes_integer##) AS (s, session_number, session_length)
                FROM 
                    (
                    SELECT session_set.s, collect_set(session_set.timestamps) evt_lst
                    FROM 
                        (
                        SELECT total_list.s, total_list.timestamps
                        FROM 
                            (
                            SELECT pgr.s, pgr.utc_timestamp AS timestamps
                            FROM ##APP1_pgr## pgr
                            WHERE month= ##month_in_YYYYMM_format##
                            and s is not null
                            UNION ALL
                            SELECT evt.s, evt.utc_timestamp AS timestamps
                            FROM ##APP1_evt## evt
                            WHERE month= ##month_in_YYYYMM_format##
                            and s is not null
                            ) total_list 
                        )session_set
                    GROUP BY session_set.s
                    ) calc_session 
                ORDER BY s,session_number DESC
                )session_info
            GROUP BY session_info.s
            )session_aggregate
        GROUP BY session_aggregate.app_frequency
    

Sample Return

number_sessionsusers
1219852
276402
3129867
414426
58116
65139
73592
82538
92082
101620
111412
121210
13960
14832
15776
16658
17601
18520
19513
20445
..
..
..

What does this mean?

We can then use a graphing program such as Microsoft EXCEL to visualize the results in a pie chart. In this example, we can clearly see that majority of our active users have spent only 1 session using the app this month. While 20% of our active users spent exactly 2 sessions.

After we bucket the raw data we have this table:

number_sessionsuserspercentage
121985258%
27640220%
3 - 44429312%
5 - 7168474%
7 - 1098323%
11 - 2079272%
20 +66122%

Session Frequency User Distribution Pie Chart

Standard Queries

Time to First Monetization


Description

This query will look at paying users from a certain date and return a distribution of users and the time it took for them to monetize for the first time since installing the application.

  • Distribution of users and the time it took for them to monetization for the first time since installing the application, this query will look at paying users from all time.
  • Replace all text enclosed with ## (example ##"YYYY-MM-D1"## could be 2014-03-01) using the following instructions:
  • Replace ##APP1_mtu## and ##APP1_apa## with the name of your applications you are interested in reviewing, for example: demogame_mtu.
  • The HAVING clause will look at only users who have installed the application within a certain date.
  • IMPORTANT: Make sure to only have one of the HAVING clauses.

Query

SELECT first_install.s,
FROM_UNIXTIME(first_install.firstinstall, 'yyyy-MM-dd') as firstinstall,
FROM_UNIXTIME(first_monetization.first_monetize_time, 'yyyy-MM-dd') as first_monetization_time,
ROUND((first_monetization.first_monetize_time-first_install.firstinstall)/86400,1) AS time_to_1st_monetize_days
FROM
    (
    SELECT s, MIN(utc_timestamp) as firstinstall
    FROM ##APP1_apa apa##
    GROUP BY s  
    -- Users who have installed on a specific date
    HAVING FROM_UNIXTIME(firstinstall, 'yyyy-MM-dd') =  ##"YYYY-MM-D1"##
    -- Users who have installed between the dates YYYY-MM-D1 and YYYY-MM-D2
    --HAVING FROM_UNIXTIME(firstinstall, 'yyyy-MM-dd') BETWEEN ##"YYYY-MM-D1"## AND ##"YYYY-MM-D2"##
    -- Users who have installed 30 to 40 days ago, can be changed to any arbitrary integer
    -- This will be dynamic, useful for when you sechdule this query.
    --HAVING DATEDIFF(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), FROM_UNIXTIME(firstinstall, 'yyyy-MM-dd')) BETWEEN 30 AND 40
    ) first_install
JOIN
    (
    SELECT s, MIN (utc_timestamp) AS first_monetize_time
    FROM ##APP1_mtu mtu##
    GROUP BY s
    )first_monetization
ON first_install.s = first_monetization.s
WHERE ROUND((first_monetization.first_monetize_time-first_install.firstinstall)/86400,1) >=0
ORDER BY time_to_1st_monetize_days ASC

Sample Return

sinstall_timefirst_monetization_timetime_to_1st_monetize_days
66503397695622700002013-05-112013-05-121.0
90251338189702013-05-152013-05-194.0
53017148279512013-05-052013-05-115.7

What does this mean?

  • The first nested SELECT statement returns the User IDs and earliest timestamps of user that that installed your app. This is named first_install.
  • The second nested SELECT statement returns the User IDs and earliest timestamps of users that that monetized on your app. This is named first_monetization.
  • The JOIN only returns all the results of the first_monetization query only if the users are also in the first_install results.
  • The outermost arguments rounds the time to first monetization to a length of days, and sorts it in order from quickest to monetize to slowest to monetize.

Standard Queries

Timestamp Transformation


Description

This query will transform the unix timestamp into days of the week, such as 'Monday', 'Tuesday', etc.

Query

SELECT from_unixtime(utc_timestamp , 'EEEE') FROM ##APP1_apa## WHERE month = ##month_in_YYYYMM_format##

Sample Return

_c0
Monday
Monday
Friday
Friday

What does this mean?

  • The function from_unixtime will return the day of the week if the first parameter is a Unix Epoch Timestamp, and the second value is the string EEEE.
  • Although this query is not practical when used by itself, when used in conjunction with other queries, it makes more easily readable results.

Standard Queries

Top 1% Monetizing Users


Description

This is a small query to discover what the threshold of spending is (on a per week basis) for a user to enter the top 1% of monetizing users. The query will return a table with 3 columns:

  1. Row Number
  2. Calendar Week
  3. Spending Threshold (in cents)
    SELECT result.CalendarWeek, PERCENTILE(result.sumvalues, 0.99) AS Top1Percent 
    FROM ( 
        SELECT s , 
        SUM(V) AS sumvalues , 
        weekofyear(from_unixtime(utc_timestamp)) AS CalendarWeek 
        FROM demo_mtu 
        WHERE month = 201107 
        GROUP BY s, weekofyear(from_unixtime(utc_timestamp)) 
        ) result 
    GROUP BY result.CalendarWeek
    

Sample Return

#calendar weekspending threshold
122124675.0
223346545.99
324869767.99

Best Practices Observed In This Query

  1. First, there is a date restriction on the month column. This will speed up the query by making sure that it is not looking at the entire dataset.
  2. Good naming conventions are used throughout the query. For example, the column Top1Percent clearly describes the content of the column. If the column that results from an aggregation step such as a PERCENTILE function is not named, the column would be given an automatically generated name.

Standard Queries

Total Time Spent and Number of Users


Description

This query will bucket the number of users based on how much time they have spent using the application.

The default time buckets are:

  • 200 + minutes
  • 100 to 200 minutes
  • 50 to 99 minutes
  • 20 to 49 minutes
  • 10 to 19 minutes
  • 5 to 9 minutes
  • less than 5 minutes

The buckets can be changed in the query.

Query

Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:

  • Replace ##APP1_pgr##, ##APP1_evt## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_evt
  • Replace ##month_in_YYYYMM_format## with the month you are interested in querying, for example 201303
  • Replace ##timeout_in_minutes_integer## with the desired timeout, for example a value of 2 would mean that when a user is inactive for 2 minutes or more, the session would end
  • For mobile applications we recommend setting the timeout to 2 minutes and for web applications we recommend using 30 minutes.
    SELECT session_aggregate.app_timespent AS time_spent_bucket, COUNT(*) AS users
    FROM
        (
        SELECT session_info.s, 
        case when SUM(session_info.session_length)/60 > 200 then "200+" 
        when SUM(session_info.session_length)/60 >= 100 then "100 <-> 200" 
        when SUM(session_info.session_length)/60 >= 50 then "50 <-> 99"
        when SUM(session_info.session_length)/60 >= 20 then "20 <-> 49"
        when SUM(session_info.session_length)/60 >= 10 then "10 <-> 19"
        when SUM(session_info.session_length)/60 >= 5 then "5 <-> 9"
        else "<5" end AS app_timespent
        FROM
            (
            SELECT kt_session(calc_session.s, calc_session.evt_lst,##timeout_in_minutes_integer##) AS (s, session_number, session_length)
            FROM 
                (
                SELECT session_set.s, collect_set(session_set.timestamps) evt_lst
                FROM 
                    (
                    SELECT total_list.s, total_list.timestamps
                    FROM 
                        (
                        SELECT s, utc_timestamp AS timestamps
                        FROM ##APP1_pgr##
                        WHERE month = ##month_in_YYYYMM_format##
                        and s is not null
                        UNION ALL
                        SELECT s, utc_timestamp AS timestamps
                        FROM ##APP1_evt##
                        WHERE month = ##month_in_YYYYMM_format##
                        and s is not null
                        ) total_list 
                    )session_set
                GROUP BY session_set.s
                ) calc_session 
            ORDER BY s,session_number DESC
            )session_info
        GROUP BY session_info.s
        )session_aggregate
    GROUP BY session_aggregate.app_timespent
    ORDER BY time_spent_bucket
    

Sample Return

time_spent_bucketusers
200+17063
100 <-> 20014033
50 <-> 9926540
20 <-> 4950505
10 <-> 1948366
5 <-> 950941
< 5236763

What does this mean?

What we are seeing here is how much time does the average user spend using the application and their engagement level.

In this sample result, we see that majority of our users spend less than 5 minutes in the application, perhaps we can think of ways to make the application more sticky for the first 5 minutes. At the same time, there seems to be a small group of users who are really engaged with the application, perhaps we can find out why these players are more engaged or reward these players.

Total time (in minutes) spent in app distribution pie chart

Standard Queries

Working with Regular Expressions


Description

In this example, we are going to use regular expressions and Data Mine to remove messages that are wrongly formatted.

The Upsight Analytics API requires that a User ID be an "unsigned 63-bit int", and for mobile the User ID must be an "unsigned 64-bit int". What if during integration, wrongly formatted User IDs were sent? We can use Data Mine to exclude these messages when calculating our metrics.

Let's say you are perusing your APA messages with the following query:

SELECT s
FROM demo_apa
WHERE month = 201303
and you find that some of the User IDs are bad.

Sample Return

#s
103654643423
2ae235430934

It is important to note that the sample result above is 'bad' because:

  • The first result (03654643423) started with a '0'
  • The second result (ae235430934) contains alphabetical characters

In Data Mine, we can filter out these User ID's with a regular expression:

SELECT s
FROM demo_apa
WHERE month = 201303
AND s RLIKE '[^a-zA-Z]' 
AND s RLIKE '^[1-9][0-9]*$'

Here is an explanation for the query above:

  • Bad User IDs are removed. Only numeric User IDs that do not begin with '0' are accepted
  • [^a-zA-Z] removes all User IDs that contains letters
  • ^[1-9][0-9]*$ removes all User IDs that start with a 0

For more information on regular expressions, visit this page: http://en.wikipedia.org/wiki/Regular_expression

Standard Queries

Working with the JSON Data Column


Description

Each API call has an optional data parameter. The data parameter is a JSON object string representing a dictionary or map of key-value pairs. It must be base64-encoded.

Here is an example of a JSON data object:

{"username": "testuser@companyx.com", "whale": "true", "ingame_currency": 5000, "weapon": "hatchet", "completed_tutorial": "true"}

Here is an example query and result showing how the JSON data gets displayed in Datamine:

SELECT * 
FROM demo_evt
WHERE month = 201210
>
#utc_timestamptypesnvlst1st2st3json_datamonth
11350936966evt13139action00Homeattackattacktype1201210
21350936972evt14130action00reportingdefenddefendtype1201210
31350936980evt13144action00Homeexploreexploretype3201210

Simple Query

What if we wanted to look at all the users who are whales? The Hive function get_json_object() can be used to extract the “whale” key/value pair from the JSON data object. Here is an example query:

SELECT s, get_json_object(evt.json_data, '$.whale') AS whale, json_data 
FROM dashboard_evt evt 
WHERE month = 201210

Simple Query Return

#swhalejson_data
113139true
214130true
313144false

Preferred Query

With the data parameter, we can add arbitrary tags to messages to allow for custom cohorting. Let’s say you have a mobile farming game where a few of your users account for a large portion of you revenue. If you tag these users accordingly, you can view the current in-game currency of those “whales” with the following query:

SELECT s, get_json_object(evt.json_data, '$.ingame_currency') AS ingame_currency, get_json_object(evt.json_data, '$.whale') AS whale
FROM dashboard_evt evt 
WHERE month = 201210 AND get_json_object(evt.json_data, '$.whale') = "true"
>

Preferred Query Return

#singame_currencywhale
1131395000true
2141308000true

Nested JSONs

If your application's data parameter has a nested json object, and you would like to 'explode' on that data, this can be done using the function get_json_object() as well. Let's go through two examples. The first involves a json object with a nested array as a value and the second involves nested json objects.

Nested Array

For this example, imagine we have a RPG styled game, where the player can have up to 3 items equipped at any given time. We can track these items in the json_data parameter. The parameter can look something like the following: {"first":"Jane","last":"Doe","age":20,"sex":"F","items":["ScepterOfAbyss","StormyShield","Empty"]}

In DataMine, if we would like to have the 3 "items" appear in 3 separate columns, we could use this following query:

SELECT json_data, 
(get_json_object(evt.json_data, '$.items[0]')) as first_key1value,
(get_json_object(evt.json_data, '$.items[1]')) as second_key1value,
(get_json_object(evt.json_data, '$.items[2]')) as third_key1value
FROM testapp_evt evt
WHERE month=201302

The above query leads to the following result set:

#json_datafirst_key1valuesecond_key1valuethird_key1value
1ScepterOfAbyssStormyShield/td>Empty

The function '$.items[0]' allows us to get the first element from the 'items' array, '$.items[1]' for the second element and so on.

Nested object

Now let's consider the situation where we have a nested JSON object like this one: {"first":"Jane","last":"Doe","age":20,"sex":"F","favorites":{"website":"Kontagent","month":"May","food":"seafood"}}

We can modify the get_json_object() function call to grab each inner key/value pair:

SELECT json_data,
get_json_object(evt.json_data, '$.favorites.website'),
get_json_object(evt.json_data, '$.favorites.month'),
get_json_object(evt.json_data, '$.favorites.food')
FROM testapp_evt evt
WHERE month=201302

The above query leads to the following result set:

#json_data_c1_c2_c3
1KontagentMayseafood

The function '$.favorites.website' allows us to get the 'website' element from the 'favorites' array, '$.favorites.food' for the 'food' element and so on.

Standard Queries

Working with JSON Data - Data Column


Description

The data column is a new introduction to DataMine. The information is a replica of the JSON data column. The advantage of using the data column is that the reading stage is significantly faster compared to the json_data column.

Example

Instead of using:

get_json_object(json_data, '$.balance')

We can translate it into:

data['balance']

old json_data column:

{"b":"RANDOM-INTERNAL-ID","balance":"1836","ispaying":"true"}   

new data column:

balance:1836;b:RANDOM-INTERNAL-ID;ispaying:true

Note

  • The data column does not work for nested JSONs nor for invalid JSONs. An invalid JSON is a JSON that fails to parse, for example due to incorrect encoding. In such situations, the data column will remain blank.

  • For a nested JSON, the nested object will be turned into a string. Example: {"first":"Jane","last":"Doe","age":20,"sex":"F","favorites":{"website":"Kontagent","month":"May","food":"seafood"}}, the array for "favorites" will be turned into a string and you will not be able to extract specific items from this array.

  • The data column is a string:string map. You will need to do type conversion if you require types.

Standard Queries

Revenue by Country


Description

This query will approximate the total revenue generated by each country for a given month.

Sample Return

countrytotalusers
AR2
AT12
AU3
BE2
BR19

Instructions

  • Replace all text enclosed with ## ## (example ##month_in_YYYYMM_format## could be 201303) using the following instructions:
  • Replace ##APP1_pgr##, ##APP1_mtu## with the name of your applications you are interested in reviewing, for example: demogame_pgr, demogame_mtu
  • Replace ##month_in_YYYYMM_format## with the range of time you are interested in. For example: 201408. If you are interested in a time range of multiple month, you can use WHERE month IN (201407,201408) or the $(months_ago(INT)) in our UDF and macro page.
  • We are using the client_ip parameter from the PGR message to determine the location of the user.

Query

SELECT location.country, COUNT(rev.s) AS totalusers,SUM(rev.totalrev)/100 AS totalrevenue
FROM
    (
    SELECT s,SUM(v) AS totalrev 
    FROM ##APP1_mtu## 
    WHERE month=##month_in_YYYYMM_format##
    GROUP BY s
    )rev
JOIN
(
SELECT maxtime.s, kt_ipcountry(MAX(ips.client_ip)) AS country
FROM
        (
        SELECT s,MAX(utc_timestamp) AS maxts 
        FROM ##APP1_pgr## 
        WHERE month=##month_in_YYYYMM_format##
        GROUP BY s
        )maxtime
JOIN
        (
        SELECT s,utc_timestamp,client_ip
        FROM ##APP1_pgr## 
        WHERE month=##month_in_YYYYMM_format##
        )ips
ON maxtime.s=ips.s AND maxtime.maxts=ips.utc_timestamp
GROUP BY maxtime.s
)location
ON rev.s=location.s
GROUP BY location.country
ORDER BY country

Standard Queries

4 Line Custom Retention


Description

This query will allow us to visualize 4 different types of retention on 4 different lines. We can define our retention lines to show Day 1 Retention, Day 13 Retention, Day 55 Retention, or any other day's retention.

Note If we include more months, the query will perform more slowly.

Sample Visualization

4 line retention visualization

Definitions

  • We are assuming this app has a standard implementation where the app sends:
    • a heartbeat message that sends as a PGR when the user is using the app
    • an APA message on the install date of the user
  • In the final result, retention is displayed as a percentage as an integer.
  • The default settings give us:
    • results for users who have installed 2 and 32 days ago, grouped by install date
    • retention for Day 1, Day 7, Day 14, and Day 30

Instructions

  1. The idea of this query is to calculate retention using the formula: Number of users that installed on day x and came back y days after / Number of users that installed on day x. The first part of the query "returned" gives us the first part of our formula while the second part of the query "installs" gives us the second part of the formula. We then perform the division at the very final step at the top.

    Inside the query "returned", we have the query 'apa' and the query 'pgr'; 'apa' gives us the users that have installed within a date range and 'pgr' gives us the users that have used the application within a date range. Their intersection gives us the users that installed on day x and came back y days after.

  2. Change the application name to the app you want to analyze.

    Note There are three places to change, two places are from the PGR table and one is from the APA table

  3. Change the date range of users.

    Note By default, the query looks at users who have installed 2 and 32 days from today.

    Important Be aware when changing the date that if the dates are too recent, there may not be enough data to calculate the retention that you want.

  4. Change the retention.

    Note By default the retentions are day 1, 7, 14, and 30. Referring to the previous step, this is for our re-engagement data. We needed data from 1 and 31 days ago. Data from today are not included because it's likely not complete yet.

    For our earliest user set, which is the users that installed 32 days ago, we need PGR from 31 days ago to know if they came back.

    4a. Before modifying the retentions you want, check that you have the right dates of engagement data. You can do so by changing the 2 numbers from the WHERE clause after the datediff function. In the default, we have engagement data between 1 and 31 days ago. At the same time, update the WHERE month = clause to include the right dates. For 1 and 31 days ago, we need to look at this month, last month, and the month before to safely include all of our data.

    4b. After the date range has been selected, change these to get the retention that you want. Change the variables name such as day_1_retention, day_7_retention to the number you want. Also you have to change the integers to match the correct retention.

    4c. Modify the integers here to include only the retentions you want. For example, if you want day 2, 5, 10, and 19, it should look like WHERE datediff(date2, installdate) in (2,5,10,19).

    Note The retention numbers are rounded to the nearest integer because the visualization tool can currently only display integers.

Query

SELECT returned.installdate as ktlabel_date, 
-- #4b, Change the variables such as day_1_retention 
ROUND((SUM(returned.day_1_retention)/installs.installed_users)*100,0) AS day1retention,
ROUND((SUM(returned.day_7_retention)/installs.installed_users)*100,0 )AS day7retention,
ROUND((SUM(returned.day_14_retention)/installs.installed_users)*100,0 )AS day14retention,
ROUND((SUM(returned.day_30_retention)/installs.installed_users)*100,0 )AS day30retention
FROM
    (
    SELECT apa.s,date2,installdate
    -- #4b, Change the variables such as day_1_retention as well as the integers to match.
    -- For example, day_5_retention would be CASE WHEN datediff(date2, installdate) = 5 THEN 1 ELSE 0 END AS day_5_retention
    ,CASE WHEN datediff(date2, installdate) = 1 THEN 1 ELSE 0 END AS day_1_retention
    ,CASE WHEN datediff(date2, installdate) = 7 THEN 1 ELSE 0 END AS day_7_retention
    ,CASE WHEN datediff(date2, installdate) = 14 THEN 1 ELSE 0 END AS day_14_retention
    ,CASE WHEN datediff(date2, installdate) = 30 THEN 1 ELSE 0 END AS day_30_retention
    FROM 
        (              
        SELECT s, MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd')) AS installdate
        -- #2 application name
        FROM myapp_apa
        GROUP BY s
        -- #3 date range of users to look at.
        HAVING datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd'))) BETWEEN 2 AND 32
        )apa
    JOIN
        (
        SELECT s, FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd') AS date2
        -- #2 application name
        FROM myapp_pgr
        -- #4a Change the reengagement date for retention.
        WHERE month in ($(months_ago(0)),$(months_ago(1)),$(months_ago(2))) AND datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), (FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd'))) BETWEEN 1 AND 31
        GROUP BY s, FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd')
        )pgr
    ON apa.s = pgr.s
    -- #4c, modify the integers here to include only the retentions you want.
    WHERE datediff(date2, installdate) in (1,7,14,30)   
    )returned
JOIN
    (
    SELECT installdates.installdate1, COUNT(new_users) as installed_users
    FROM
        (
        SELECT s AS new_users, MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd')) AS installdate1
        -- #2 application name
        FROM myapp_apa
        GROUP BY s
        -- #3 date range of users to look at.
        HAVING datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd'))) BETWEEN 2 AND 32
        )installdates
    GROUP BY installdates.installdate1
    )installs
ON returned.installdate=installs.installdate1
GROUP BY returned.installdate,installs.installed_users

Sample Return

ktlabel_dateday1retentionday7retentionday14retentionday30retention
2014-09-1218000
2014-09-1315720
2014-09-14164100
2014-09-1523800

Standard Queries

Monetization Breakdown by Weekly Install Cohort


Description

This query will approximate the total revenue generated by each cohort. Use the following instructions to replace all text enclosed with ## and ##. For example ##APP1_apa## could be demoapp_apa. Replace ##APP1_apa##, ##APP1_mtu## with the name of your applications you are interested in reviewing. For example: demogame_apa, demogame_mtu.

Sample Visualization

Total Revenue per Cohort

ARPU Week Cohort

Optional Changes

  • We have arbitrarily created 8 cohorts of users.

  • The cohort 1this_week_cohort are users that installed from 7 days ago up to yesterday.

  • The cohort 2weeks_ago_cohort are users who installed 14 days ago up to 8 days ago. These date ranges can be modified by you. Note that there are 4 places in the query where the cohorts are defined. The definitions need to be identical, so changes need to be made in all 4 places and the month clause needs to collect the right data as well.

  • Definitions for d1_revenue and so forth are defined with the 4 SUM clauses. They, too, can be updated for different intervals. There is only one place where their definition needs to be updated. However if you give them a new alias, then that needs to be reflected in the rest of the query. Update each of the HAVING and WHERE clauses when necessary.

More explanation will follow on top of each.

Query

SELECT revenuesums.week_cohort as ktlabel_string,
-- The following 4 lines will get us the total revenue for each of the time periods. They can be commented out if you want to visualize ARPU.
revenuesums.total_d1_rev, 
revenuesums.total_d2_7_rev, 
revenuesums.total_d8_30_rev, 
revenuesums.total_d30plus_rev,
-- This will give us the total number of users in the install cohort. Can be used for extra information but should be removed for visualization as it will throw off the scale compared to revenue.
installcohort_daucount.cohortsize,
-- The following 4 lines will get us the ARPU for each of the time periods. They can be commented out if you want to visualize the total revenue.
round(revenuesums.total_d1_rev/installcohort_daucount.cohortsize, 2) as d1_rev_arpu,
round(revenuesums.total_d2_7_rev/installcohort_daucount.cohortsize, 2) as d2_7_rev_arpu,
round(revenuesums.total_d8_30_rev/installcohort_daucount.cohortsize, 2) as d8_30_rev_arpu,
round(revenuesums.total_d30plus_rev/installcohort_daucount.cohortsize, 2) as d30plus_rev_arpu
FROM
    (
-- The following case clauses will define our 8 install cohorts. This definition is repeated 3 more times in this query and should be identical.
-- Our users are broken down into 8 cohorts, based on the date of their install.

    SELECT CASE WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 1 AND 7 THEN '1this_week_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 8 AND 14 THEN '2weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 15 AND 21 THEN '3weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 22 AND 28 THEN '4weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 29 AND 35 THEN '5weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 36 AND 42 THEN '6weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 43 AND 49 THEN '7weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 50 AND 56 THEN '8weeks_ago_cohort' END AS week_cohort
    , SUM(adduprevenue.d1_revenue) AS total_d1_rev
    , SUM(adduprevenue.d2_7_revenue) AS total_d2_7_rev
    , SUM(adduprevenue.d8_30_revenue) AS total_d8_30_rev
    , SUM(adduprevenue.d30plus_revenue) AS total_d30plus_rev
    FROM
        (
        SELECT apa.s,apa.installdate
-- The following 4 `SUM` commands will sort the revenue generated based on how long it was after the user's install date. 
-- If the alias are changed, then that needs to be reflected in the rest of the query.

        ,SUM(CASE WHEN datediff(date2, installdate) = 1 THEN revenue_per_day ELSE 0 END) AS d1_revenue
        ,SUM(CASE WHEN datediff(date2, installdate) BETWEEN 2 AND 7 THEN revenue_per_day ELSE 0 END) AS d2_7_revenue
        ,SUM(CASE WHEN datediff(date2, installdate) BETWEEN 8 AND 30 THEN revenue_per_day ELSE 0 END) AS d8_30_revenue
        ,SUM(CASE WHEN datediff(date2, installdate) > 30 THEN revenue_per_day ELSE 0 END) AS d30plus_revenue
        FROM 
            (
            SELECT s, MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd')) AS installdate
            FROM ##APP1_apa## 
            GROUP BY s
-- Building on top of the install cohort case clause from earlier. This will only give us users who have installed 1 to 56 days ago. 
-- Update this portion if the definition of install cohort has been changed.
-- The same needs to be updated for the other APA section in the inner query `apa1`

            HAVING datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd'))) BETWEEN 1 AND 56
            )apa
        JOIN
            (
            SELECT s, FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') AS date2,sum(v) as revenue_per_day
            FROM ##APP1_mtu##
-- building on top of the revenue aliases like `d1_revenue`. This will only give us users who have payed this month until 4 month ago. 
-- Update this portion if the definition of revenue time period has been changed.

            WHERE month in ($(months_ago(0)),$(months_ago(1)),$(months_ago(2)),$(months_ago(3)),$(months_ago(4)))
            GROUP BY s, FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd')
            )pgr
        ON apa.s = pgr.s
        GROUP BY apa.s,apa.installdate
        )adduprevenue
    GROUP BY 
    CASE WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 1 AND 7 THEN '1this_week_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 8 AND 14 THEN '2weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 15 AND 21 THEN '3weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 22 AND 28 THEN '4weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 29 AND 35 THEN '5weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 36 AND 42 THEN '6weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 43 AND 49 THEN '7weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), adduprevenue.installdate) BETWEEN 50 AND 56 THEN '8weeks_ago_cohort' END
    )revenuesums
-- Second part of this query is used to calcualte ARPU which is sum of revenue / number of users. This part will give us the number of users.
-- You can consider removing this if you are only interested in the total revenue generated.
JOIN
    (
    SELECT
    CASE WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 1 AND 7 THEN '1this_week_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 8 AND 14 THEN '2weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 15 AND 21 THEN '3weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 22 AND 28 THEN '4weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 29 AND 35 THEN '5weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 36 AND 42 THEN '6weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 43 AND 49 THEN '7weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 50 AND 56 THEN '8weeks_ago_cohort' END AS week_cohort,count(*) as cohortsize
    FROM
        (
        SELECT s, MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd')) AS installdate
        FROM ##APP1_apa##
        GROUP BY s
        HAVING datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), MIN(FROM_UNIXTIME(CAST(ts AS BIGINT), 'yyyy-MM-dd'))) BETWEEN 1 AND 56
        )apa1
    GROUP BY 
    CASE WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 1 AND 7 THEN '1this_week_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 8 AND 14 THEN '2weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 15 AND 21 THEN '3weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 22 AND 28 THEN '4weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 29 AND 35 THEN '5weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 36 AND 42 THEN '6weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 43 AND 49 THEN '7weeks_ago_cohort'
    WHEN datediff(FROM_UNIXTIME(unix_timestamp(), 'yyyy-MM-dd'), apa1.installdate) BETWEEN 50 AND 56 THEN '8weeks_ago_cohort' END
    )installcohort_daucount
ON revenuesums.week_cohort = installcohort_daucount.week_cohort

Example Output

ktlabel_stringtotal_d1_revtotal_d2_7_revtotal_d8_30_revtotal_d30plus_revcohortsized1_rev_arpud2_7_rev_arpud8_30_rev_arpud30plus_rev_arpu
1this_week_cohort25096422982900989062.542.3200
2weeks_ago_cohort212024260993780030946702.242.760.820
3weeks_ago_cohort2326102408281424800864002.692.791.650
4weeks_ago_cohort1503022602181993070680912.213.822.930
5weeks_ago_cohort1993413058632491921696911032.193.362.740.02
6weeks_ago_cohort1501061813121567177984755301.992.42.070.11
7weeks_ago_cohort592249346813560041137624520.951.52.170.66
8weeks_ago_cohort6670512270114388259717483331.382.542.981.24

Portfolio Analysis Queries

Monetizing Users


The Question

Of users who are installed and have generated activities in more than one property - which users of which applications generate the highest average revenue per user?

How To Instrument

For every user make sure to send a unique and consistent UID (User Identifier) across your portfolio of applications. In Upsight Analytics this is the “S” parameter. This could be a User ID, a rewards number or any unique user identification number that is consistent and specific to the user throughout your portfolio of applications.

Parameter Type Description
s unsigned 63-bit init The UID of the user adding the application. Note Upsight Analytics only accepts integers up to 63-bits. For information on converting UIDs greater than 63-bits, click here.

How To Query Your Data

Average Revenue Per User With Only One Application

This will return a single number that counts how the average revenue per user (ARPU) if they have installed one of the applications you choose to review but not both.

  1. Application Selection
    • Replace ##APP1_pgr## with the name of the pgr table of the application whose users you would like to review. For example: demogame1_pgr.
    • Replace ##APP2_pgr## with the name of the pgr table of the application whose users you would like to exclude. For example: demogame2_pgr.
  2. Time Period
    • Replace ##date_in_'YYYY-MM-DD'_format## as the initial date of interest using the format 'YYYY-MM-DD', for example '2012-06-01' would mean June 6 2012
    • Replace ##month_in_YYYYMM_format## as the month filter, for example 201206 would mean June 2012.

The above examples would return the ARPU of the users who used demogame1 but not demogame2 in June starting with June 1 2012.

This query is very customizable.

  • If you would like to remove the filter by month you can remove the two lines that start with WHERE to search for results from all time. This is not recommended as it will slow down your query considerably.
  • By default this query will return the users who have installed after the day chosen. By changing the >= to a <= you can return the results for users who have installed before the chosen date.
    SELECT sum(user_revenu.totalrev)/count(user_revenu.install_s) AS ARPU
    FROM (
       SELECT installdata.install_s,revenue.totalrev
       FROM (
          -- select users that have ONLY installed both applications
          SELECT users.s as install_s, collect_set(app) as apps
          FROM (
             SELECT s, "app1" as app
             FROM ##APP1_apa##
             --WHERE month >= ##month_in_YYYYMM_format##
             AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') >= ##date_in_'YYYY-MM-DD'_format##
             UNION ALL
             SELECT s, "app2" as app
             FROM ##APP2_apa##
             --WHERE month >= ##month_in_YYYYMM_format##
             AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') >= ##date_in_'YYYY-MM-DD'_format##
          )users
          GROUP BY users.s
          HAVING size(apps)=2
       )installdata
       LEFT OUTER JOIN (
          -- Total revenue generated from both applications
          SELECT inner_rev.s, sum(inner_rev.v) AS totalrev
          FROM (
             SELECT s,v
             FROM ##APP1_mtu##
             UNION ALL
             SELECT s, v
             FROM ##APP2_mtu##
          ) inner_rev
          GROUP BY inner_rev.s
       )revenue
       ON installdata.install_s=revenue.s 
    ) user_revenu
    
Average Revenue Per User With Two Applications

This will return a single number that counts how the average revenue per user (ARPU) if they have installed both of the applications you choose to review.

  • Application Selection
    • Replace ##APP1_apa##, ##APP1_mtu## with the name of tables of the first application you are interested in reviewing, for example: demogame1_apa and demogame1_mtu.
  • Replace ##APP2_apa##, ##APP2_mtu## with the name of tables of the first application you are interested in reviewing, for example: demogame2_apa and demogame2_mtu.
  • Time Period
    • Replace ##date_in_'YYYY-MM-DD'_format## as the initial date of interest using the format 'YYYY-MM-DD', for example '2012-06-01' would mean June 1 2012.
    • Replace ##month_in_YYYYMM_format## as the initial date of interest using the format 'YYYY-MM-DD', for example 201206 would mean June 2012.

      SELECT sum(user_revenu.totalrev)/count(user_revenu.install_s) AS ARPU FROM ( SELECT installdata.install_s,revenue.totalrev FROM ( -- select users that have ONLY installed both applications SELECT users.s as install_s, collect_set(app) as apps FROM ( SELECT s, "app1" as app FROM ##APP1_apa## --WHERE month >= ##month_in_YYYYMM_format## AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') >= ##date_in_'YYYY-MM-DD'format## UNION ALL SELECT s, "app2" as app FROM ##APP2_apa## --WHERE month >= ##month_in_YYYYMM_format## AND FROM_UNIXTIME(utc_timestamp, 'yyyy-MM-dd') >= ##date_in'YYYY-MM-DD'_format## )users GROUP BY users.s HAVING size(apps)=2 )installdata LEFT OUTER JOIN ( -- Total revenue generated from both applications SELECT inner_rev.s, sum(inner_rev.v) AS totalrev FROM ( SELECT s,v FROM ##APP1_mtu## UNION ALL SELECT s, v FROM ##APP2_mtu## ) inner_rev GROUP BY inner_rev.s )revenue ON installdata.install_s=revenue.s ) user_revenu

How To Visualize The Results

A simple bar chart can give simple at-a-glance insight into where your users are in your ecosystem.

The first query will yield the value in the first bar. If you swap the table you use in app1 with the table you use in app2, the query will return the value in the second bar. The second query will return the value in the third bar.

Application(s)ARPU
Application 1$15
Application 2$10
Application 1 + Application 2$28

Portfolio Analysis Queries

User Installs


The Question

How many unique users have installed two different applications in my portfolio last month? We are looking to understand how many users have installed each application within a particular timeframe, and how many users have installed both applications. We are looking to create a simple Venn Diagram that shows overlap in users between the two applications.

Understanding which properties (applications) users are engaged with in your portfolio of different applications is a more powerful way to understand the health of your portfolio.

How To Instrument

For every user make sure to send a unique and consistent UID (User Identifier) across your portfolio of applications. In Upsight Analytics this is the S parameter. This could be a User ID, a rewards number or any unique user identification number that is consistent and specific to the user throughout your portfolio of applications.

Parameter Type Description
s unisigned 63-bit int The UID of the user adding the application. Note Upsight Analytics only accepts integers up to 63-bits. For information on converting UIDs greater than 63-bits, click here.

How To Query Your Data

Number of Users With Only One Application Installed

This will return a single number that counts the number of users that have downloaded one of the applications you choose to review but not both.

  1. Application Selection

    • Replace ##APA1_apa## with the name of the apa table of the application whose users you would like to review. For example demogame1_apa.
    • Replace ##APA2_apa## with the name of the apa table of the application whose users you would like to exclude. For example demogame2_apa.
  2. Time Period

    • Replace ##lower_date_in"YYY-MM-DD"_format## with the lower bound of interest using the format "YYYY-MM-DD", for example "2013-04-01".
    • Replace ##upper_date_in"YYY-MM-DD"_format## with the upper bound of interest using the format "YYYY-MM-DD", for example "2013-04-30".

The above examples would return the number of users who used demogame1 but not demogame2 in April 2013.

SELECT COUNT(*) 
FROM
    (SELECT s AS app1s, MIN(utc_timestamp) AS app1utc
    FROM ##APP1_apa##
    GROUP BY s) app1

LEFT OUTER JOIN

    (SELECT s AS app2s, MIN(utc_timestamp) AS app2utc
    FROM ##APP2_apa##
    GROUP BY s) app2

ON app1.app1s = app2.app2s
WHERE app2.app2s IS NULL
AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##
Number of Users With Two Applications Installed

This will return a single number that counts the number of users that have downloaded both of the applications you choose to review.

  1. Application Selection
    • Replace ##APA1_apa## with the name of the apa table of the first application you are interested in reviewing. For example demogame1_apa.
    • Replace ##APA2_apa## with the name of the apa table of the second application you are interested in reviewing. For example demogame2_apa.
  2. Time Period
    • Replace ##lower_date_in"YYY-MM-DD"_format## with the lower bound of interest using the format "YYYY-MM-DD", for example "2013-04-01".
    • Replace ##upper_date_in"YYY-MM-DD"_format## with the upper bound of interest using the format "YYYY-MM-DD", for example "2013-04-30".

The above examples would return the number of users who used both demogame1 and demogame2 in April 2013.

SELECT COUNT(*) 
FROM (
   SELECT s AS app1s, min(utc_timestamp) AS app1utc
    FROM ##APP1_apa##
    GROUP BY s) app1
LEFT OUTER JOIN (
   SELECT s AS app2s, min(utc_timestamp) AS app2utc
    FROM ##APP2_apa##
    GROUP BY s) app2
ON app1.app1s = app2.app2s
WHERE app2.app2s IS NOT NULL
AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##
AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##

How To Visualize The Results

A simple Venn Diagram can give simple at-a-glance insight into where your users are in your ecosystem. Additionally, sizing the circles by the number of users in each application gives an easy visual queue to relative user base and overlap in user base.

The first query will yield the number of users in the red circle but not in the intersecting purple area. If you swap the table you use in app1 with the table you use in app2, the query will return the number of users in the blue circle but not the intersecting purple area. The second query will return the number of users in the intersecting purple area.

  # of users who installed % of users installed
Just Application 1 1000 50%
Just Application 2 800 40%
Application 1 + Application 2 200 10%
Unique User Installs2000100%

Portfolio Analysis Queries

User Flow


The Question

Of users who are active or have installed more than one application, in what order did they flow through the portfolio?

Understanding which properties (applications) users are engaged with in your portfolio of different applications is a more powerful way to understand the health of your portfolio.

How To Instrument

For every user make sure to send a unique and consistent UID (User Identifier) across your portfolio of applications. In Upsight Analytics this is the s parameter. This could be a User ID, a rewards number or any unique user identification number that is consistent and specific to the user throughout your portfolio of applications.

Parameter Type Description
s unsigned 63-bit int The UID of the user adding the application. Note Upsight Analytics only accepts integers up to 63-bits. For information on converting UIDs greater than 63-bits, click here.

How To Query Your Data

Number of Users With Two Applications Installed
  • Application Selection
    • Replace ##APP1_apa## and ##APP2_apa## with the name of both applications you are interested in reviewing, for example: demogame1_apa, demogame2_apa.
  • Time Period
    • Replace ##lower_date_in_"YYYY-MM-DD"_format## as the lower bound of interest using the format "YYYY-MM-DD", for example "2013-04-01" would mean April 1 2013.
    • Replace ##upper_date_in_"YYYY-MM-DD"_format## as the upper bound of interest using the format "YYYY-MM-DD", for example "2013-04-30" would mean April 30 2013.
        SELECT COUNT(*) 
        FROM (
           SELECT s AS app1s, MIN(utc_timestamp) AS app1utc
           FROM ##APP1_apa##
           GROUP BY s
        ) app1
        LEFT OUTER JOIN (
           SELECT s AS app2s, MIN(utc_timestamp) AS app2utc
           FROM ##APP2_apa##
           GROUP BY s
        ) app2
        ON app1.app1s = app2.app2s
        WHERE app2.app2s IS NOT NULL
        AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##
Number of Users Who Installed app1 followed by app2
  • Application Selection
    • Replace ##APP1_apa## with the name of the table of the application you are interested in reviewing first. For example: demogame1_apa.
    • Replace ##APP2_apa## with the name of the table of the application you are interested in reviewing second. For example: demogame2_apa.
  • Time Period
    • Replace ##lower_date_in_"YYYY-MM-DD"_format## as the lower bound of interest using the format "YYYY-MM-DD", for example "2013-04-01" would mean April 1 2013.
    • Replace ##upper_date_in_"YYYY-MM-DD"_format## as the upper bound of interest using the format "YYYY-MM-DD", for example "2013-04-30" would mean April 30 2013.
        SELECT COUNT(*)
        FROM (
           SELECT s AS app1s, MIN(utc_timestamp) AS app1utc
           FROM ##APP1_apa##
           GROUP BY s
        )app1

        JOIN (
           SELECT s AS app2s, MIN(utc_timestamp) AS app2utc
           FROM ##APP2_apa##
           GROUP BY s
        )app2
        ON app1.app1s = app2.app2s
        WHERE app1.app1utc <= app2.app2utc
        AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app1.app1utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')>= ##lower_date_in_"YYYY-MM-DD"_format##
        AND FROM_UNIXTIME(app2.app2utc, 'yyyy-MM-dd')<= ##upper_date_in_"YYYY-MM-DD"_format##

How To Visualize The Results

Visually representing the area of the circle relative to the % of users who moved from one application and in which direction gives a clear picture of user flow between applications.

The first query will return the value in the third row. The second query will yield the value in the first row. If you swap the table you use in app1 with the table you use in app2, the query will return the value in the second row.

  Number of Users Flowed Percentage of Users Installed
Application 1 first then Application 2 460 46%
Application 2 first then Application 1 540 54%
Application 1 + Application 21000