Your achievements

Level 1

0% to

Level 2

Tip /
Sign in

Sign in to Community

to gain points, level up, and earn exciting badges like the new
BedrockMission!

Learn more

View all

Sign in to view all badges

SOLVED

Counting Entries from Raw Data

DillanGump
Level 1
Level 1

I am trying to recreate the Previous Page report using raw data. My calculations are coming back less than what is shown in Reports. For example, I am defining "Entered Site" as where the post_pagename is the target page, and visit_page_num = 1. This leaves me 8 fewer than expected over my 5 day period. I am capturing all of the Page Views, but less are being assigned "Entered Site" than what I see in Reports. My best guess is I am missing site entries that happen in the middle of a visit, such as someone exiting the website, then directly navigating back to the target page before the session expires.

 

I am looking for how site entries and previous pages are defined by the Analytics tool so I can recreate it with HiveQL. Are there other columns used to calculated Entries and previous pages? Thank you.

 

Example query:

 

with renamed_pagenames as (
select 
  case
    when (post_pagename like "oao: your info: welcome" and visit_page_num = 1) then "Entered Site"
    else post_pagename
  end as my_pagename
from mktpi2.adobe_stg_un_hit_data
where load_date between "2021-01-01" and "2021-01-05"
    and exclude_hit like "0"
    and hit_source not in  (5,8,9)
    and post_pagename = "oao: your info: welcome")

select my_pagename, count(my_pagename)
from renamed_pagenames
group by my_pagename
limit 100;

 

 

1 Accepted Solution
David-123
Correct answer by
Level 7
Level 7

@DillanGump- I think this might be something really simple.

and hit_source not in  (5,8,9)

should be

and hit_source not in  (5,7,8,9)

Once I did that (and renamed the page and database) your query ran fine for me.

View solution in original post

4 Replies
David-123
Correct answer by
Level 7
Level 7

@DillanGump- I think this might be something really simple.

and hit_source not in  (5,8,9)

should be

and hit_source not in  (5,7,8,9)

Once I did that (and renamed the page and database) your query ran fine for me.

View solution in original post

DillanGump
Level 1
Level 1

Thanks for the reply, @David-123! That's a good catch, since it is usually best practice to exclude the full list. Hit Source 7 is "transactionid/customerid data source file". Changing that made no difference in my results. I am capturing all of the page views for the target page according to Reports, but my logic labeling "Entered Site" seems to be incomplete.

David-123
Level 7
Level 7
@DillanGump - It should not make a difference but have you tried it against workspace as well as reports? That's the only different I could think of. Other than that your query returned perfect numbers for me.
David-123
Level 7
Level 7
Curious - what sort of difference are you seeing as a percentage across 5 days?