If you are a data analyst and your business uses adobe analytics as the web analytics tool. You will often come across a use case where you need to send raw data or hit-level data to either S3, GCP, or Azure Aws. You need to run a lot of queries to validate raw hit-level data or visualize the data or to combine this data with other data sources
Here are sample SQL queries you can bookmark to run queries for different use cases.
Note: Hits normally excluded from Adobe Analytics are included in data feeds. Use exclude_hit = 0 to remove excluded hits from queries on raw data. Data sourced data are also included in data feeds. If you want to exclude data sources, exclude all rows with hit_source = 5,7,8,9
1. To Calculate the Bounce rate
SELECT (COUNT (( CONCAT (post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) WHERE visit_page_num = '1')/ (COUNT(( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) * 100) as bounce_rate
FROM your_data_feed_name
WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59')
2. To count the number of visits
SELECT COUNT ( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) FROM <your_data_feed_name> WHERE <conditions>
3. To get all the Unique Visitors during a specific time period
SELECT COUNT(DISTINCT post_visid_high, post_visid_low) AS count_uv FROM
FROM <your_data_feed_name>
WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59')
AND hit_source NOT IN (5,7,8,9);
4. To get all the page views
SELECT COUNT(WHERE post_pagename !='' or post_page_url !='') FROM <your_data_feed_name>
5. To get all custom links, download or exit links
SELECT COUNT(WHERE ( post_page_event = 100 or post_page_event = 101 or post_page_event = 102 )) FROM <your_date_feed_name>
6. To get all the purchase hits
SELECT COUNT(*)
FROM <your_data_feed_name>
WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')
AND duplicate_purchases != '1'
7. To calculate time spent
SELECT CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt) as visit_id,
visit_page_num,
post_cust_hit_time,
CASE
WHEN LEAD(visit_id) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) = visit_id AND LEAD(visit_page_num) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) > visit_page_num
THEN LEAD(post_cust_hit_time) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) - post_cust_hit_time
ELSE NULL
END as time_spent
FROM your_data_feed_name
WHERE date >= 'your_start_date'
AND date <= 'your_end_date'
The query uses the CONCAT function to concatenate the post_visid_high, post_visid_low, visit_num, and visit_start_time_gmt fields into a single visit_id column. It then sorts the data first by the visit_id column, and then by the visit_page_num column.
The CASE statement is used to calculate the time spent for each hit. The LEAD function is used to get the value of the subsequent hit's post_cust_hit_time field, and the time spent is calculated as the difference between the subsequent hits post_cust_hit_time and the current hits post_cust_hit_time, only if the subsequent hit belongs to the same visit and is not the last hit in the visit. The result is stored in the time_spent column.
Note: Note that this query does not apply any filters to focus on specific dimension items or events, but you can add a WHERE clause to include any necessary filters.
8. To Calculate Revenue from data feeds
SELECT
SUM(CAST(REGEXP_REPLACE(post_product_list, '^.*;price=([^;]+).*$','\\1') AS FLOAT)) AS revenue
FROM
<your_data_feed_name>
WHERE
duplicate_purchase != '1' AND
(post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')
This query will sum up the revenue from all rows in the data feed that meet the following conditions:
- Rows, where duplicate_purchase is not equal to 1, are included.
- Rows, where the post_event_list column contains the “purchase” event, are included.
- The post_product_list column is parsed to extract all price data from the s.products string which is then summed up to get the total revenue.
9. To calculate quantity from product_list
SELECT SUM(CAST(REGEXP_SUBSTR(post_product_list, ';[^;]*;[^;]*;([^;]*)') AS FLOAT)) AS Quantity
FROM <your_data_feed_name>
WHERE duplicate_purchase != '1' AND (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')
This query uses the same conditions as the previous Revenue query but extracts the quantity from the post_product_list column using a regular expression. The regular expression ';[^;]*;[^;]*;([^;]*)' extracts the third value between semicolons from the s.products string, which is the quantity. The CAST function is used to convert the extracted value to a float, and the SUM function calculates the total quantity for all matching rows
10. To Get Orders
SELECT COUNT(*)
FROM <your_data_feed_name>
WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')
AND duplicate_purchases != '1'
These are some of the common use cases where we need to SQL to extract data from data feeds
Hope this helps and makes your day far easier to achieve your use cases
Let me know in the comments if you think of any other use cases where you need help with SQL queries