I am trying to recreate the Previous Page report using raw data. My calculations are coming back less than what is shown in Reports. For example, I am defining "Entered Site" as where the post_pagename is the target page, and visit_page_num = 1. This leaves me 8 fewer than expected over my 5 day period. I am capturing all of the Page Views, but less are being assigned "Entered Site" than what I see in Reports. My best guess is I am missing site entries that happen in the middle of a visit, such as someone exiting the website, then directly navigating back to the target page before the session expires.
I am looking for how site entries and previous pages are defined by the Analytics tool so I can recreate it with HiveQL. Are there other columns used to calculated Entries and previous pages? Thank you.
Example query:
with renamed_pagenames as (
select
case
when (post_pagename like "oao: your info: welcome" and visit_page_num = 1) then "Entered Site"
else post_pagename
end as my_pagename
from mktpi2.adobe_stg_un_hit_data
where load_date between "2021-01-01" and "2021-01-05"
and exclude_hit like "0"
and hit_source not in (5,8,9)
and post_pagename = "oao: your info: welcome")
select my_pagename, count(my_pagename)
from renamed_pagenames
group by my_pagename
limit 100;
Solved! Go to Solution.
Views
Replies
Total Likes
@DillanGump- I think this might be something really simple.
and hit_source not in (5,8,9)
should be
and hit_source not in (5,7,8,9)
Once I did that (and renamed the page and database) your query ran fine for me.
@DillanGump- I think this might be something really simple.
and hit_source not in (5,8,9)
should be
and hit_source not in (5,7,8,9)
Once I did that (and renamed the page and database) your query ran fine for me.
Thanks for the reply, @David_Jerome! That's a good catch, since it is usually best practice to exclude the full list. Hit Source 7 is "transactionid/customerid data source file". Changing that made no difference in my results. I am capturing all of the page views for the target page according to Reports, but my logic labeling "Entered Site" seems to be incomplete.
Views
Replies
Total Likes
Views
Replies
Total Likes
Views
Replies
Total Likes