Intermediate Queries

Overview


In this section, we go over some more complicated queries that can answer questions you may have about your applications. These queries may contain multiple sub queries or make use of some DataMine or HIVE-specific functions.

One thing to note is that the queries here may not match metrics on your dashboard as your dashboard metrics may have specific definitions for your applications. In most circumstances, your metrics should be the most correct source of information, though these examples should produce results that are close unless your specific metric differs significantly.

Some of these examples include app_id in the WHERE clause and some do not. In general, you will likely want to limit your queries to a specific application or add app_id to the GROUP BY to get per-app information.

ARPPU

Intermediate Query


Overview

This query calculates the average revenue per paying user (ARPPU). It sums the total revenue generated from validated purchases made in the month of January 2017 divided by the number of users who made those purchases.

Query

SELECT SUM(revenue) / COUNT(1) as ARPPU 
FROM 
( 
    SELECT SUM(upsight_data_value_usd) as revenue, sid 
    FROM monetization 
    WHERE month = 201701 and type = ‘upsight.monetization.validated’ 
    GROUP BY sid 
)t1

ARPDAU

Intermediate Query


Overview

This query calculates the average revenue per daily active user (ARPDAU). For ARPDAU, we need to look at two tables, one with spend data and one with session data. There are many ways to merge these two datasets such as a JOIN on sid. Using UNION ALL is another method that is often more efficient as done in the example below. We GROUP BYsid after unioning the two tables to calculate the total revenue each user has spent. Then, we can count the users and sum this revenue to calculate ARPDAU.

Query

SELECT SUM(user_revenue) / COUNT(1) as ARPDAU 
FROM 
( 
  SELECT sid, SUM(revenue) as user_revenue 
  FROM 
  ( 
    SELECT upsight_data_value_usd as revenue, sid 
    FROM monetization 
    WHERE month = 201701 and type = ‘upsight.monetization.validated’ 
    UNION ALL 
    SELECT 0 as revenue, sid 
    FROM session 
    WHERE month = 201701 
  )t1 
  GROUP BY sid 
)t1

Number of users who bought each item in the store

Intermediate Query


Overview

This query counts the number of users who have purchased each individual product over a single day. If a user buys two different items, we want to count that user twice (once for each item). Doing the query with two steps using SELECT DISTINCT and count(1) instead of using COUNT DISTINCT spreads the load more evenly which results in a faster running query.

Query

SELECT upsight_data_product, COUNT(1) as users 
FROM 
( 
    SELECT DISTINCT sid, upsight_data_product 
    FROM monetization 
    WHERE day = 20170125 
) t1 
GROUP BY upsight_data_product

Number of spenders in each country this month

Intermediate Query


Overview

This query returns a row for each country and the number of users that have made a purchase in them. Much like how we can not simply sum the number of users each day to get MAU (because a user can play multiple days), here we have to consider the case where a user can spend in two different countries this month.

If we do the simple query, just replacing day by country, we will count that user in BOTH countries. This means you can not simply sum over the results in excel and get a total spender number. In the query below we count each user once by choosing place the user into whatever country they spent most recently. This is just one of many choices we could make, however the benefit is that the total spenders calculated by the query below will also match the number from the query in the previous section.

Query

SELECT location_country_ip as Country, COUNT(1) 
FROM 
( 
    SELECT 
      sid, 
      Location_country_ip, 
      row_number() OVER( PARTITION BY sid ORDER BY seq_id ) as rank 
    FROM monetization 
    WHERE month = 201701 and type = ‘upsight.monetization.validated’ 
) t1 
WHERE rank = 1 
GROUP BY location_country_ip