Sample SQL queries to query Adobe analytics Data feeds | Community
Skip to main content
VaniBhemarasetty
Adobe Employee
Adobe Employee
April 24, 2023

Sample SQL queries to query Adobe analytics Data feeds

  • April 24, 2023
  • 11 replies
  • 13239 views

If you are a data analyst and your business uses adobe analytics as the web analytics tool. You will often come across a use case where you need to send raw data or hit-level data to either S3, GCP, or Azure Aws. You need to run a lot of queries to validate raw hit-level data or visualize the data or to combine this data with other data sources

Here are sample SQL queries you can bookmark to run queries for different use cases.

Note: Hits normally excluded from Adobe Analytics are included in data feeds. Use exclude_hit = 0 to remove excluded hits from queries on raw data. Data sourced data are also included in data feeds. If you want to exclude data sources, exclude all rows with hit_source = 5,7,8,9

1.  To Calculate the Bounce rate

 

 

SELECT (COUNT (( CONCAT (post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) WHERE visit_page_num = '1')/ (COUNT(( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) * 100) as bounce_rate FROM your_data_feed_name WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59')

 

 

2.  To count the number of visits

 

 

SELECT COUNT ( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt)) FROM <your_data_feed_name> WHERE <conditions>

 

 

3.   To get all the Unique Visitors during a specific time period

 

 

SELECT COUNT(DISTINCT post_visid_high, post_visid_low) AS count_uv FROM FROM <your_data_feed_name> WHERE date_time BETWEEN ('2017-11-08 00:00:00') AND ('2017-11-08 11:59:59') AND hit_source NOT IN (5,7,8,9);

 

 

4.  To get all the page views

 

 

SELECT COUNT(WHERE post_pagename !='' or post_page_url !='') FROM <your_data_feed_name>

 

 

5.  To get all custom links, download or exit links

 

 

SELECT COUNT(WHERE ( post_page_event = 100 or post_page_event = 101 or post_page_event = 102 )) FROM <your_date_feed_name>

 

 

6.  To get all the purchase hits

 

 

SELECT COUNT(*) FROM <your_data_feed_name> WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1') AND duplicate_purchases != '1'

 

 

7.  To calculate time spent

 

 

SELECT CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt) as visit_id, visit_page_num, post_cust_hit_time, CASE WHEN LEAD(visit_id) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) = visit_id AND LEAD(visit_page_num) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) > visit_page_num THEN LEAD(post_cust_hit_time) OVER (ORDER BY CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt), visit_page_num) - post_cust_hit_time ELSE NULL END as time_spent FROM your_data_feed_name WHERE date >= 'your_start_date' AND date <= 'your_end_date'

 

 

The query uses the CONCAT function to concatenate the post_visid_highpost_visid_lowvisit_num, and visit_start_time_gmt fields into a single visit_id column. It then sorts the data first by the visit_id column, and then by the visit_page_num column.

The CASE statement is used to calculate the time spent for each hit. The LEAD function is used to get the value of the subsequent hit's post_cust_hit_time field, and the time spent is calculated as the difference between the subsequent hits post_cust_hit_time and the current hits post_cust_hit_time, only if the subsequent hit belongs to the same visit and is not the last hit in the visit. The result is stored in the time_spent column.

Note: Note that this query does not apply any filters to focus on specific dimension items or events, but you can add a WHERE clause to include any necessary filters.

8.  To Calculate Revenue from data feeds

 

 

SELECT SUM(CAST(REGEXP_REPLACE(post_product_list, '^.*;price=([^;]+).*$','\\1') AS FLOAT)) AS revenue FROM <your_data_feed_name> WHERE duplicate_purchase != '1' AND (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')

 

 

This query will sum up the revenue from all rows in the data feed that meet the following conditions:

  1. Rows, where duplicate_purchase is not equal to 1, are included.
  2. Rows, where the post_event_list column contains the “purchase” event, are included.
  3. The post_product_list column is parsed to extract all price data from the s.products string which is then summed up to get the total revenue.

9.  To calculate quantity from product_list

 

 

SELECT SUM(CAST(REGEXP_SUBSTR(post_product_list, ';[^;]*;[^;]*;([^;]*)') AS FLOAT)) AS Quantity FROM <your_data_feed_name> WHERE duplicate_purchase != '1' AND (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1')

 

 

This query uses the same conditions as the previous Revenue query but extracts the quantity from the post_product_list column using a regular expression. The regular expression ';[^;]*;[^;]*;([^;]*)' extracts the third value between semicolons from the s.products string, which is the quantity. The CAST function is used to convert the extracted value to a float, and the SUM function calculates the total quantity for all matching rows

10.  To Get Orders

 

 

SELECT COUNT(*) FROM <your_data_feed_name> WHERE (post_event_list LIKE '%,1,%' OR post_event_list LIKE '1,%' OR post_event_list LIKE '%,1') AND duplicate_purchases != '1'

 

 

These are some of the common use cases where we need to SQL to extract data from data feeds

Hope this helps and makes your day far easier to achieve your use cases

Let me know in the comments if you think of any other use cases where you need help with SQL queries

 

11 replies

Krishna_Musku
Community Advisor
Community Advisor
April 24, 2023

Thanks! These are very helpful for a good start on data feed queries.

July 13, 2023

Would you please provide an SQL to calculate "Average Time Spent on Site"

Level 3
July 13, 2023

@sandeep11 7th query is to calculate time spent

DineshRkumar
Level 3
July 21, 2023

We don't have post_visid_high, post_visid_low, visit_num  to calculate number of visits.  Is there any alternative options?

September 15, 2023

These sample queries are amazing. We need more of this in Experience League. Massively speeds up time to value!

RobertBlakeley
Community Advisor
Community Advisor
October 6, 2023

Very useful for our new guy.

Level 2
October 27, 2023

Shouldn't the Visit Calculation be:

SELECT COUNT (DISTINCT( CONCAT(post_visid_high, post_visid_low, visit_num, visit_start_time_gmt))) FROM <your_data_feed_name> WHERE <conditions>

 

Otherwise aren't you counting every record not every Visit?

Community Advisor
December 22, 2023

Thank So much @vanibhemarasetty . This is really helpful.

skatofiabah
Level 5
September 20, 2024

@vanibhemarasetty - It looks like these need to be updated. For example, @johnma9 is right on the visits count.

DineshRkumar
Level 3
October 1, 2024

Team, 
I am trying to match data feed raw data, prop1 occurrences with Adobe workspace occurrences, however no success. Can you please help me here.