Using DataMine

Interface

Overview


DataMine allows you to run custom queries which allows you to easily access and manipulate your raw data. As your write each query, everything is saved dynamically to our servers which allows you to pick up where you left off from any device. Results will be available for up to 7 days after running a query.

Note Queries are unique to each user and not shared across organizations.

Overview

Interface

Add a New Query


To add a query:

  1. Log into your dashboard and navigate to EXPLORATION > DATAMINE > ADD A NEW QUERY.
  2. Give a name to your query which is easily identifiable.
  3. Start typing your HQL query. Your progress is saved automatically.
  4. Hit RUN once you are finished with your query.

Add Query

Interface

Renaming a Query


To edit a query:

  1. Log into your dashboard and navigate to EXPLORATION > DATAMINE.
  2. Select the query you want to edit.
  3. Hover over the query and select the pencil icon to edit the name of your query.
  4. You can also make modifications to your query if you have any changes.
  5. Hit RUN AGAIN after you have finished any modifications to your query.

Edit Query

Interface

Delete a Query


To delete a query:

  1. Log into your dashboard and navigate to EXPLORATION > DATAMINE.
  2. Select the query you want to delete.
  3. Hover over the query you want to delete and select the trash icon.
  4. Select YES to confirm deletion of the query.

Delete Query

Interface

Results Panel


To download your results:

  1. Log into your dashboard and navigate to EXPLORATION > DATAMINE and select your query.
  2. Hit RUN if your query has not run.
  3. Select DOWNLOAD to download your query results as a CSV.

Note You can select the dropdown next to DOWNLOAD to select to download as either CSV or Excel. Excel exports are limited to one million rows.

Download Query

Interface

CSV Format


Data Mine uses the following CSV format for exporting data from the system.

CSV Properties

  1. Columns delimited by ,
  2. Mapkeys are separated by :
  3. Array elements are separated by ;
  4. ,, :, and ; are all escaped by \

Validation Rules

  1. Only string type columns are allowed to have data of empty string.
  2. NULL data value is represented by \N (backslash capital n). Any column type can be NULL.
  3. Extra columns in a row are ignored.
  4. Unspecified columns are treated as NULL.
  5. Column types are enforced.
  6. Arrays elements are separated by ;
  7. Mapkeys are separated by :.
  8. Delimiters are escaped by \.
  9. Quoting of fields is not supported.
  10. Leading space between field and delimiter is not allowed unless column is of type string.

Example Validations

string | int | int valid rule comment
abc,1,2 Yes. 5 Column data types are correct.
,1,2 Yes. 1 The first column is a string and is allowed to be empty.
abc,,2 No. 1 The second column is an int and is not allowed to be empty.
abc,\N,2 Yes. 2 The second column is an int and is set to NULL.
abc,1,2,3,4 Yes. 3 More columns in row than in table is ok.
abc Yes. 4 Less columns in row than in table is ok. The tailing columns will have the value of NULL.
abc,foo,3 No 5 The second column value foo is not an integer.
one\, two\, and three,4,5 Yes 8 The comma in the string column is escaped by \.
"abc","2","3" No 9 Fields can not be quoted.
abc, 2,3 No 5,10 The leading space in column two is not allowed.
array | int valid rule comment
abc;def;hij,1 Yes. 6
abc;;hij,2 Yes. 6,1 The array elements are checked for validity. Since the array is a string an empty array element is valid.
one\;two;three\;four,3 Yes. 6,8 The semicolon is escaped by \.
array | string valid rule comment
1;2;3,abc Yes. 6
1;hi;3,abc No. 6,5 Array elements are type enforced as well.
1;;3,abc No. 6,1 The second element in the array can not be empty because the array is of type int.
1;\N;3,abc Yes. 6,2 The second element in the array is represented as NULL.

Data Schema

Overview


The DataMine UI allows for the easy creation of queries. By clicking on the table icon on the right hand side of the interface a list of available data tables will be shown. By clicking on triangle icon next to a table, the columns for each table will be revealed.

Table and Apps

There are four main types of column types which are:

  1. String - String

  2. Int - Int

  3. Boolean - Boolean

  4. Map - Map

Within DataMine, each table contains all the data for all the apps. To filter your data by app you can use the column app_id and the = function. For example:

SELECT * FROM campaign WHERE app_id = 200660 AND day = 20150730

Additionally you can use the IN function to select data from multiple apps at once.

All tables have standard attributes that the SDK collects automatically which includes:

  • User Unique SID
  • Opt Out
  • Location
  • App Version
  • App Bundle ID
  • Device OS
  • Device OS Version
  • Device Type (Phone, Tablet)
  • Device Hardware
  • Limited Ad Tracking (iOS only)
  • Device Manufacture
  • Device Carrier
  • Device Connection
  • Device Jailbroken
  • Screen Width
  • Screen Height
  • Screen Scale
  • Screen DPI
  • IDFV
  • IDFA
  • Android ID
  • Google Advertising ID
  • User’s Time Zone
  • Long and Lat (if Location Tracking is enabled)

Once you have finished your query you can easily download your data either as a CSV or Excel compatible version.

Note Excel compatible exports are limited to one million rows.

Data Schema

Tables


Depending on the message type, data gets stored to one of the following tables.

Standard TablesDescriptionAvailability

comm

Logs push related messages

Table created when push is implemented

campaign

Logs marketing campaign messages

Table created when campaigns are sent

device

Logs user device attributes and is automatically updated once every ~4 hours

Table created when a user first launches the app with the Upsight SDK

event

Logs event messages

Table created when application starts sending custom events

install

Logs user installs

Table created when a user first launches the app with the Upsight SDK

milestone

Logs user milestones

Table created when milestones are reached. Set up milestones under Application - Milestones

monetization

Logs IAP revenue

Table created when a purchase is made. Requires the Upsight monetization call to be implemented

session

Logs when a user began and ended a session

Table created when a user first launches the app with the Upsight SDK

If your application is defined as a Web application on the Dashboard, data will appear in separate tables prepended with web_, such as web_event.

Data Schema

Message Attributes


Each message comes with default attributes tracked by the Upsight SDK.

NameDescription

type

The name of the event. SDK generated events are prepended with “upsight” (e.g., upsight.session.start). Custom events are prepended with “pub” (e.g., pub.economy.spend.coin)

app_id

The 5-digit application ID assigned by Upsight. It is recommended to filter by app_id when querying.

sid

The identifier of the device which sent the message (assigned by Upsight upon installs, including reinstalls)

ts

The timestamp the message was sent in Unixtime, UTC

hour / day / month

The time the message was sent in YYYYMMddHH date format. These are the partitions which restrict how much data is read. It is recommended to apply a time filter in your innermost query. For example, WHERE day = 20170811

device_os / app_version / location_country_ip / etc.

Standard device data

user_attributes

Custom field containing user information when the message was sent

pub_data

Custom field containing information about the event

For a full list of attributes, see Table Column Reference.

Data Schema

Custom Events and Custom Attributes


An important feature of Datamine is pub_data and user_attributes. All tables have columns for pub_data and user_attributes. These columns have data saved in key value pairs. For example if you had a music app and you sent the song name in a key value pair with the key called song in a custom event called voted, you would use the following query to get the distinct song names that was voted for on a particular day.

SELECT DISTINCT pub_data[‘Song’] AS Song FROM event WHERE type = "pub.voted" AND day = 20150730

Note Values for pub_data and user_attributes are always saved as strings to they will need to be converted to ints or doubles before any mathematical operators can be applied to them.

Using Time Partitions

Using DataMine


Hive tables are organized into partitions that make querying a portion of the data more efficient. All Upsight tables are set up with month, day, and hour partitions. This means that when writing a query, specifying a date range in the WHERE clause of the query will greatly improve query speed. The amount of data that you will process without using a WHERE month = YYYYMM or day = YYYYMMDD will almost certainly be so large as to slow down your query.

Time Formats

  • To compare months: yyyyMM
  • To compare days: yyyyMMdd
  • To compare hours: yyyyMMddhh

Example Queries using Partitions

--selects all events on Nov 1st 
SELECT * FROM event WHERE day = 20171101 

--selects all events after and including Nov 1st 
SELECT * FROM event WHERE day >= 20171101 

--selects all events occurring June through Aug
SELECT * FROM event WHERE month BETWEEN 201706 and 201708 

--selects all events occurring between Nov 1st 6AM - 5PM UTC
SELECT * FROM event WHERE hour BETWEEN 2017110106 AND 2017110117 

--selects all events occurring on Nov 1st in hours 6,8, and 10 
SELECT * FROM event WHERE hour IN (2017110106, 20171108, 2017110110) 

It is recommended to always filter to the smallest time range needed. Only if you have a use case to look at data for all time (e.g. finding a user’s lifetime spend) should you remove the time restriction.

Table Column Reference

Using DataMine


NameData TypeDescriptionExample

app_bundleid

string

The app store bundle id as sent by your application

com.orgname.gamename.develop

app_id

int

A 5-digit unique ID assigned to each app registered

83732

app_token

string

The unique token that is used to send data to Upsight

420593be54335588b96c8f3d09ee59f0

app_version

string

The app version a user is on

3.0.0

day

int

Date in YYYYMMDD format

20171020

device_carrier

string

Mobile carrier, if applicable

Verizon

device_connection

string

Whether the user is connected via a mobile plan or wifi

Wi-Fi

device_hardware

string

Device hardware

iPhone7,2

device_jailbroken

boolean

Has the device been jailbroken

FALSE

device_limit_ad_tracking

boolean

Whether user has opted out of ad tracking / ad personalization in device settings

FALSE

device_manufacturer

string

Device manufacturer

Apple

device_os

string

Device operating system - Android, iOS or Web

ios

device_os_version

string

Device operating system version

10.1.1

device_type

string

Phone, tablet or web browser

phone

hour

int

Hour in YYYMMDDHH format

2017102006

identifiers

string

Message identifier

pub

ids_aid

string

Google advertising ID

635cf4bd-6622-5e54-9677-2cf675a36ac4

ids_android_id

string

Android ID

766bc7f93846c9b4

ids_idfa

string

iOS advertiser ID

6E3545CDE-2E15-506C-930D-41EBD8E6C412

ids_idfv

string

iOS vendor ID

766d2f2c-1dbe-56f5-4ef8-ed86014c088

install_ts

int

User install timestamp in UTC time, Unixtime format

1498808782

locale

string

Language

 en_US

location_country_geo

string

User location based on GPS

USA

location_country_ip

string

User location based on IP address

USA

location_lat

string

Location latitude

+30.40000

location_lon

string

Location longitude

-97.75280

location_tz

string

Location timezone

-600

month

int

Month in YYYYMM format

201710

msg_campaign_id

int

ID of campaign the event belongs to

10804253

msg_id

int

ID of push message sent

10804273

opt_out

boolean

Whether the user is opt out of certain data being tracked, usually used to comply with privacy laws for young users

FALSE

partner_data

map<string,string>

Data related to connected partner SDKs, such as data about the Mediation Module

{'some_key': 'some value'}

partners

string

Data related to connected partner SDKs, such as data about the Mediation Module

[{"reflected_properties":["sdk.plugin"],"partner_name":"fuse","initialized":false,"version":"2.6.6"}]

past_session_time

int

Cumulative session time for completed sessions

202

pub_data

map<string,string>

Customer Defined event attributes stored in a mapped key-value pair format

{"amount": 150, "source": "item_shop"}

received_ts

int

Timestamp of when the event message was received by Upsight

1508179647

request_ts

int

Timestamp when the request was sent by the application, this can differ from event ts since some Upsight messages are sent in batches

1508179647

screen_dpi

int

Screen resolution dots per inch

480

screen_height

int

Screen height in pixels

667

screen_scale

double

Screen scale

2

screen_width

int

Screen width in pixels

375

sdk_build

string

SDK build

+release.0671fd0

sdk_plugin

string

SDK plugin and version

unity:4.3.2

sdk_version

string

SDK version

4.3.2

seq_id

int

Nth message sent by a user, seq_id = 12 would be the 12th msg a user has sent

12

session_num

int

The user's Nth session in which the message was sent. Events share the same session_num if a user triggered them within the same session

5

session_start

int

Session start time in Unixtime, UTC

1498815596

sid

string

Unique ID assigned every time the app is installed on a device. If users re-install, they are assigned a new sid.

004799gd888642c28adc1d99f0b04874

ts

int

Timestamp of the message received in Unixtime, UTC

1498815660

type

string

Name of the event

pub.economy.spend.coin

upsight_data

map<string,string>

Field containing additional information relevant to the event

{"bundle.schema_hash":"5409204c14d00d5b0b404a0a54488793", "bundle.hash":"b02ecc3642591d1594307f7a10895ee2107a0cb9"}