Expand my Community achievements bar.

Join us at Adobe Summit 2024 for the Coffee Break Q&A Live series, a unique opportunity to network with and learn from expert users, the Adobe product team, and Adobe partners in a small group, 30 minute AMA conversations.
SOLVED

Counting Entries from Raw Data

Avatar

Level 2

I am trying to recreate the Previous Page report using raw data. My calculations are coming back less than what is shown in Reports. For example, I am defining "Entered Site" as where the post_pagename is the target page, and visit_page_num = 1. This leaves me 8 fewer than expected over my 5 day period. I am capturing all of the Page Views, but less are being assigned "Entered Site" than what I see in Reports. My best guess is I am missing site entries that happen in the middle of a visit, such as someone exiting the website, then directly navigating back to the target page before the session expires.

 

I am looking for how site entries and previous pages are defined by the Analytics tool so I can recreate it with HiveQL. Are there other columns used to calculated Entries and previous pages? Thank you.

 

Example query:

 

with renamed_pagenames as (
select 
  case
    when (post_pagename like "oao: your info: welcome" and visit_page_num = 1) then "Entered Site"
    else post_pagename
  end as my_pagename
from mktpi2.adobe_stg_un_hit_data
where load_date between "2021-01-01" and "2021-01-05"
    and exclude_hit like "0"
    and hit_source not in  (5,8,9)
    and post_pagename = "oao: your info: welcome")

select my_pagename, count(my_pagename)
from renamed_pagenames
group by my_pagename
limit 100;

 

 

1 Accepted Solution

Avatar

Correct answer by
Level 8

@DillanGump- I think this might be something really simple.

and hit_source not in  (5,8,9)

should be

and hit_source not in  (5,7,8,9)

Once I did that (and renamed the page and database) your query ran fine for me.

View solution in original post

4 Replies

Avatar

Correct answer by
Level 8

@DillanGump- I think this might be something really simple.

and hit_source not in  (5,8,9)

should be

and hit_source not in  (5,7,8,9)

Once I did that (and renamed the page and database) your query ran fine for me.

Avatar

Level 2

Thanks for the reply, @David_Jerome! That's a good catch, since it is usually best practice to exclude the full list. Hit Source 7 is "transactionid/customerid data source file". Changing that made no difference in my results. I am capturing all of the page views for the target page according to Reports, but my logic labeling "Entered Site" seems to be incomplete.

Avatar

Level 8
@DillanGump - It should not make a difference but have you tried it against workspace as well as reports? That's the only different I could think of. Other than that your query returned perfect numbers for me.

Avatar

Level 8
Curious - what sort of difference are you seeing as a percentage across 5 days?