Expand my Community achievements bar.

Join us for the next Community Q&A Coffee Break on Tuesday April 23, 2024 with Eric Matisoff, Principal Evangelist, Analytics & Data Science, who will join us to discuss all the big news and announcements from Summit 2024!

Single Page Visit and Bounces Calculation

Avatar

Level 1

I tried a few different methods to calculate Single Page Visit using DataFeed and none of them matched the number in Adobe Workspace, counted visits with exactly one pagename, one page_url, one post_pagename, one post_page_url, one hit, one visit_page_num, similar first and last hit time.  

I am wondering how to use datafeed to calculate the Single Page Visits. Thanks a lot.

5 Replies

Avatar

Employee Advisor

Please try to add the exclude_hit column as well to your query. Also, this is the definition 

The ‘Single page visits’ dimension reports the number of visits that consisted of a single unique Page dimension item. It is the dimension form of the Single page visits metric.

That means you should calculate all the visits which have unique post_pagename

Avatar

Level 1

,Hi Vani, Thanks for your reply. 

based on the query, hit sources (5,7,8,9) are removed from the data, and the exclude_hit is set to 0. 

Then visits with just 1 post_pagename are selected. The "Single Page Visits" are not matched with the Workspace. I am wondering if I miss more filters

 

Query:

df_t.filter((col('exclude_hit') == 0)).filter(udf_remove_hitsrc(col('hit_source'))).select('post_visid_high' , 'post_visid_low' , 'visit_num' , 'post_pagename').groupby('post_visid_high' , 'post_visid_low' , 'visit_num' ).agg(F.countDistinct('post_pagename').alias('n_')).filter(col('n_') == 1).select('post_visid_high' , 'post_visid_low' , 'visit_num' ).distinct().count()

Avatar

Community Advisor

Doing this with Spark and Python

First, get the number of visits for each month by counting unique visit id.

month_visits = (
hit_data
.select(
F.col('date_time').substr(1,7).alias('month'),
F.concat_ws("_", "post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt").alias('vid')
)
.groupBy('month')
.agg(F.countDistinct('vid').alias('visits'))
.orderBy('month')
)
Then get the number of bounces per month by counting how many visits have only one hit.
month_bounce = (
hit_data
.select(
F.col('date_time').substr(1,7).alias('month'),
F.concat_ws("_", "post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt").alias('vid')
)
.groupBy('month','vid').count()
.filter(F.col('count')==1)
.groupBy('month').agg(F.count('count').alias('bounce'))
.orderBy('month')
)
Finally, simply joining the two data frames and calculating the bounce rate by bounces/visits
(
month_visits
.join(month_bounce, on='month')
.withColumn('bounce_rate', F.col('bounce') / F.col('visits'))
.orderBy('month')
.show()
)

Avatar

Level 1

Hi Leo, Thank you so much for your time and reply.

I found out we need to add filters on exclude_hit, and hit_source also to make accurate counts. 

I am wondering have you tried creating the number of "Single Page Visits" using DataFeed? 

Avatar

Community Advisor

Hi @MiladSh, the following block is indeed getting the number of single page visits, it concatenates the for columns (post_visid_high, post_visid_low, visit_num, visit_start_time_gmt) to create a single unique visit id then count how many lines of record (hit) for each visit, then filter those with only one single hit in the visit. This gives the number of single page visits. I do not using the exclude_hit as we don't have many cases requiring this filtering for a matched number against AA workspace.

 

month_bounce = (
hit_data
.select(
F.col('date_time').substr(1,7).alias('month'),
F.concat_ws("_", "post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt").alias('vid')
)
.groupBy('month','vid').count()
.filter(F.col('count')==1)
.groupBy('month').agg(F.count('count').alias('bounce'))
.orderBy('month')
)