I am trying to recreate the Previous Page report using raw data. My calculations are coming back less than what is shown in Reports. For example, I am defining "Entered Site" as where the post_pagename is the target page, and visit_page_num = 1. This leaves me 8 fewer than expected over my 5 day period. I am capturing all of the Page Views, but less are being assigned "Entered Site" than what I see in Reports. My best guess is I am missing site entries that happen in the middle of a visit, such as someone exiting the website, then directly navigating back to the target page before the session expires.
I am looking for how site entries and previous pages are defined by the Analytics tool so I can recreate it with HiveQL. Are there other columns used to calculated Entries and previous pages? Thank you.
with renamed_pagenames as (
when (post_pagename like "oao: your info: welcome" and visit_page_num = 1) then "Entered Site"
end as my_pagename
where load_date between "2021-01-01" and "2021-01-05"
and exclude_hit like "0"
and hit_source not in (5,8,9)
and post_pagename = "oao: your info: welcome")
select my_pagename, count(my_pagename)
group by my_pagename