Expand my Community achievements bar.

Join us for the next Community Q&A Coffee Break on Tuesday April 23, 2024 with Eric Matisoff, Principal Evangelist, Analytics & Data Science, who will join us to discuss all the big news and announcements from Summit 2024!

Sample SQL queries to query Adobe analytics Data feeds

Avatar

Employee Advisor

4/24/23

If you are a data analyst and your business uses adobe analytics as the web analytics tool. You will often come across a use case where you need to send raw data or hit-level data to either S3, GCP, or Azure Aws. You need to run a lot of queries to validate raw hit-level data or visualize the data or to combine this data with other data sources

Here are sample SQL queries you can bookmark to run queries for different use cases.

Note: Hits normally excluded from Adobe Analytics are included in data feeds. Use exclude_hit = 0 to remove excluded hits from queries on raw data. Data sourced data are also included in data feeds. If you want to exclude data sources, exclude all rows with hit_source = 5,7,8,9

1.  To Calculate the Bounce rate

 

 

SELECT (COUNT (( CONCAT (post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) WHERE visit_page_num = '1')/ (COUNT(( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) * 100) as bounce_rate
FROM your_data_feed_name
WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59')

 

 

2.  To count the number of visits

 

 

SELECT COUNT ( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) FROM <your_data_feed_name> WHERE <conditions>

 

 

3.   To get all the Unique Visitors during a specific time period

 

 

SELECT COUNT(DISTINCT post_visid_high, post_visid_low) AS count_uv FROM
FROM <your_data_feed_name>
WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59')
AND hit_source NOT IN (5,7,8,9);

 

 

4.  To get all the page views

 

 

SELECT COUNT(WHERE post_pagename !='' or post_page_url !='') FROM <your_data_feed_name>

 

 

5.  To get all custom links, download or exit links

 

 

SELECT COUNT(WHERE ( post_page_event = 100 or post_page_event = 101 or post_page_event = 102 )) FROM <your_date_feed_name>

 

 

6.  To get all the purchase hits

 

 

SELECT COUNT(*) 
FROM <your_data_feed_name> 
WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1') 
AND duplicate_purchases != '1'

 

 

7.  To calculate time spent

 

 

SELECT CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt) as visit_id,
       visit_page_num,
       post_cust_hit_time,
       CASE 
          WHEN LEAD(visit_id) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) = visit_id AND LEAD(visit_page_num) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) > visit_page_num 
          THEN LEAD(post_cust_hit_time) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) - post_cust_hit_time 
          ELSE NULL 
       END as time_spent
FROM your_data_feed_name
WHERE date >= 'your_start_date'
  AND date <= 'your_end_date'

 

 

The query uses the CONCAT function to concatenate the post_visid_highpost_visid_lowvisit_num, and visit_start_time_gmt fields into a single visit_id column. It then sorts the data first by the visit_id column, and then by the visit_page_num column.

The CASE statement is used to calculate the time spent for each hit. The LEAD function is used to get the value of the subsequent hit's post_cust_hit_time field, and the time spent is calculated as the difference between the subsequent hits post_cust_hit_time and the current hits post_cust_hit_time, only if the subsequent hit belongs to the same visit and is not the last hit in the visit. The result is stored in the time_spent column.

Note: Note that this query does not apply any filters to focus on specific dimension items or events, but you can add a WHERE clause to include any necessary filters.

8.  To Calculate Revenue from data feeds

 

 

SELECT 
  SUM(CAST(REGEXP_REPLACE(post_product_list, '^.*;price=([^;]+).*$','\\1') AS FLOAT)) AS revenue 
FROM 
  <your_data_feed_name>
WHERE 
  duplicate_purchase != '1' AND 
  (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')

 

 

This query will sum up the revenue from all rows in the data feed that meet the following conditions:

  1. Rows, where duplicate_purchase is not equal to 1, are included.
  2. Rows, where the post_event_list column contains the “purchase” event, are included.
  3. The post_product_list column is parsed to extract all price data from the s.products string which is then summed up to get the total revenue.

9.  To calculate quantity from product_list

 

 

SELECT SUM(CAST(REGEXP_SUBSTR(post_product_list, ';[^;]*;[^;]*;([^;]*)') AS FLOAT)) AS Quantity
FROM <your_data_feed_name>
WHERE duplicate_purchase != '1' AND (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')

 

 

This query uses the same conditions as the previous Revenue query but extracts the quantity from the post_product_list column using a regular expression. The regular expression ';[^;]*;[^;]*;([^;]*)' extracts the third value between semicolons from the s.products string, which is the quantity. The CAST function is used to convert the extracted value to a float, and the SUM function calculates the total quantity for all matching rows

10.  To Get Orders

 

 

SELECT COUNT(*) 
FROM <your_data_feed_name> 
WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1') 
AND duplicate_purchases != '1'

 

 

These are some of the common use cases where we need to SQL to extract data from data feeds

Hope this helps and makes your day far easier to achieve your use cases

Let me know in the comments if you think of any other use cases where you need help with SQL queries

 

8 Comments