In this section, we go over some more complicated queries that can answer questions you may have about your applications. These queries may contain multiple sub queries or make use of some DataMine or HIVE-specific functions.
One thing to note is that the queries here may not match metrics on your dashboard as your dashboard metrics may have specific definitions for your applications. In most circumstances, your metrics should be the most correct source of information, though these examples should produce results that are close unless your specific metric differs significantly.
Some of these examples include app_id
in the WHERE
clause and some do not. In general, you will likely want to limit your queries to a specific application or add app_id
to the GROUP BY
to get per-app information.
This query calculates the average revenue per paying user (ARPPU). It sums the total revenue generated from validated purchases made in the month of January 2017 divided by the number of users who made those purchases.
SELECT SUM(revenue) / COUNT(1) as ARPPU
FROM
(
SELECT SUM(upsight_data_value_usd) as revenue, sid
FROM monetization
WHERE month = 201701 and type = ‘upsight.monetization.validated’
GROUP BY sid
)t1
This query calculates the average revenue per daily active user (ARPDAU). For ARPDAU, we need to look at two tables, one with spend data and one with session data. There are many ways to merge these two datasets such as a JOIN
on sid
. Using UNION ALL
is another method that is often more efficient as done in the example below. We GROUP BY
sid
after unioning the two tables to calculate the total revenue each user has spent. Then, we can count the users and sum this revenue to calculate ARPDAU.
SELECT SUM(user_revenue) / COUNT(1) as ARPDAU
FROM
(
SELECT sid, SUM(revenue) as user_revenue
FROM
(
SELECT upsight_data_value_usd as revenue, sid
FROM monetization
WHERE month = 201701 and type = ‘upsight.monetization.validated’
UNION ALL
SELECT 0 as revenue, sid
FROM session
WHERE month = 201701
)t1
GROUP BY sid
)t1
This query counts the number of users who have purchased each individual product over a single day. If a user buys two different items, we want to count that user twice (once for each item). Doing the query with two steps using SELECT DISTINCT
and count(1)
instead of using COUNT DISTINCT
spreads the load more evenly which results in a faster running query.
SELECT upsight_data_product, COUNT(1) as users
FROM
(
SELECT DISTINCT sid, upsight_data_product
FROM monetization
WHERE day = 20170125
) t1
GROUP BY upsight_data_product
This query returns a row for each country and the number of users that have made a purchase in them. Much like how we can not simply sum the number of users each day to get MAU (because a user can play multiple days), here we have to consider the case where a user can spend in two different countries this month.
If we do the simple query, just replacing day by country, we will count that user in BOTH countries. This means you can not simply sum over the results in excel and get a total spender number. In the query below we count each user once by choosing place the user into whatever country they spent most recently. This is just one of many choices we could make, however the benefit is that the total spenders calculated by the query below will also match the number from the query in the previous section.
SELECT location_country_ip as Country, COUNT(1)
FROM
(
SELECT
sid,
Location_country_ip,
row_number() OVER( PARTITION BY sid ORDER BY seq_id ) as rank
FROM monetization
WHERE month = 201701 and type = ‘upsight.monetization.validated’
) t1
WHERE rank = 1
GROUP BY location_country_ip