I tried a few different methods to calculate Single Page Visits using the Data Feed, and none of them matched the number in Adobe Workspace. I tried counting visits with exactly one pagename, one page_url, one post_pagename, one post_page_url, one hit, or one visit_page_num, as well as visits with similar first and last hit times.
I am wondering how to use the Data Feed to calculate Single Page Visits. Thanks a lot.
Please try adding the exclude_hit column to your query as well. Also, this is the definition
That means you should count all the visits that have exactly one unique post_pagename.
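To make that definition concrete, here is a minimal sketch in plain Python over in-memory rows, so it runs without a Spark cluster. It assumes each hit is a dict keyed by the Data Feed column names used in this thread, that a visit is identified by (post_visid_high, post_visid_low, visit_num), and that empty post_pagename values are ignored. The function name and sample rows are hypothetical, and this is not guaranteed to reproduce the Workspace number.

```python
from collections import defaultdict


def count_single_page_visits(hits):
    """Count visits that contain exactly one distinct non-empty post_pagename."""
    pages_per_visit = defaultdict(set)
    for hit in hits:
        # Assumption: these three columns together identify a visit.
        visit_key = (hit["post_visid_high"], hit["post_visid_low"], hit["visit_num"])
        pages = pages_per_visit[visit_key]  # registers the visit even without a page name
        page = hit.get("post_pagename")
        if page:  # assumption: empty page names do not count as a page
            pages.add(page)
    return sum(1 for pages in pages_per_visit.values() if len(pages) == 1)


# Hypothetical sample rows, for illustration only.
sample_hits = [
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "post_pagename": "home"},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "post_pagename": "home"},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 2, "post_pagename": "home"},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 2, "post_pagename": "cart"},
    {"post_visid_high": "2", "post_visid_low": "20", "visit_num": 1, "post_pagename": ""},
]
single_page_visits = count_single_page_visits(sample_hits)
```

In this sample only the first visit qualifies: it has two hits but a single distinct page name. The same grouping maps onto a Spark groupBy with a distinct count per visit.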
Hi Vani, thanks for your reply.
In the query below, hit sources (5, 7, 8, 9) are removed from the data, and only hits with exclude_hit equal to 0 are kept.
Then visits with just one post_pagename are selected. The resulting "Single Page Visits" count still does not match Workspace. I am wondering if I am missing more filters.
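from pyspark.sql import functions as F
from pyspark.sql.functions import col

(
    df_t
    .filter(col('exclude_hit') == 0)               # keep only hits that are not excluded
    .filter(udf_remove_hitsrc(col('hit_source')))  # drop hit sources 5, 7, 8, 9
    .select('post_visid_high', 'post_visid_low', 'visit_num', 'post_pagename')
    .groupby('post_visid_high', 'post_visid_low', 'visit_num')
    .agg(F.countDistinct('post_pagename').alias('n_'))
    .filter(col('n_') == 1)                        # visits with exactly one distinct page name
    .select('post_visid_high', 'post_visid_low', 'visit_num')
    .distinct()
    .count()
)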
I am doing this with Spark and Python.
First, get the number of visits for each month by counting unique visit IDs.
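As a rough illustration of that step, the sketch below counts distinct visit IDs per month in plain Python. It assumes a visit ID built from post_visid_high, post_visid_low, visit_num and visit_start_time_gmt, and that visit_start_time_gmt holds epoch seconds in UTC. The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumption: these four columns together identify a visit.
VISIT_ID_COLUMNS = ("post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt")


def visits_per_month(hits):
    """Return {"YYYY-MM": number of distinct visit IDs that started in that month}."""
    visit_ids_by_month = defaultdict(set)
    for hit in hits:
        visit_id = "_".join(str(hit[column]) for column in VISIT_ID_COLUMNS)
        # Assumption: visit_start_time_gmt is epoch seconds in UTC.
        start = datetime.fromtimestamp(int(hit["visit_start_time_gmt"]), tz=timezone.utc)
        visit_ids_by_month[start.strftime("%Y-%m")].add(visit_id)
    return {month: len(ids) for month, ids in visit_ids_by_month.items()}


# Hypothetical sample rows: two visits starting in January 2024, one in February 2024.
sample_hits = [
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "visit_start_time_gmt": 1704067200},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "visit_start_time_gmt": 1704067200},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 2, "visit_start_time_gmt": 1704153600},
    {"post_visid_high": "2", "post_visid_low": "20", "visit_num": 1, "visit_start_time_gmt": 1706745600},
]
monthly_visits = visits_per_month(sample_hits)
```

Using a set per month means repeated hits from the same visit are counted once, which is the point of counting unique visit IDs rather than rows.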
Hi Leo, Thank you so much for your time and reply.
I found that we also need to filter on exclude_hit and hit_source to get accurate counts.
I am wondering whether you have tried calculating the number of "Single Page Visits" using the Data Feed?
Hi @MiladSh, the following block does indeed get the number of single page visits. It concatenates the four columns (post_visid_high, post_visid_low, visit_num, visit_start_time_gmt) to create a single unique visit ID, counts how many records (hits) each visit has, and then keeps only the visits with exactly one hit. This gives the number of single page visits. I am not using exclude_hit, as we don't have many cases that require this filtering to match the number in the AA Workspace.
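The approach described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the original block: hits are dicts keyed by Data Feed column names, and the helper names and sample rows are hypothetical.

```python
from collections import Counter

# The four columns named above, concatenated into one visit ID.
VISIT_ID_COLUMNS = ("post_visid_high", "post_visid_low", "visit_num", "visit_start_time_gmt")


def build_visit_id(hit):
    """Concatenate the visit-identifying columns into a single string key."""
    return "_".join(str(hit[column]) for column in VISIT_ID_COLUMNS)


def count_single_hit_visits(hits):
    """Count visits that have exactly one hit record."""
    hits_per_visit = Counter(build_visit_id(hit) for hit in hits)
    return sum(1 for n_hits in hits_per_visit.values() if n_hits == 1)


# Hypothetical sample rows: the first visit has two hits, the other two have one each.
sample_hits = [
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "visit_start_time_gmt": 1704067200},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 1, "visit_start_time_gmt": 1704067200},
    {"post_visid_high": "1", "post_visid_low": "10", "visit_num": 2, "visit_start_time_gmt": 1704153600},
    {"post_visid_high": "2", "post_visid_low": "20", "visit_num": 1, "visit_start_time_gmt": 1706745600},
]
single_hit_visits = count_single_hit_visits(sample_hits)
```

Note that this counts visits with a single hit. A visit with several hits on the same page would be excluded here but included when counting visits with one distinct post_pagename, so the two approaches can give different totals.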