Expand my Community achievements bar.

Join us January 15th for an AMA with Champion Achaia Walton, who will be talking about her article on Event-Based Reporting and Measuring Content Groups!

Trying to tie out Adobe Workspace to Downstream Databricks

Avatar

Level 5

Hi,

 

I'm trying to tie out UVs, Visits, and Page Views from Adobe Workspace to Downstream Databricks from the adobe daily web traffic table. Here is my code. I don't know why it's not lining up and I cannot seem to find a good answer. I used the documentation to have certain filters. My page views by month start to fly off, along with visits. UVs seem to be fine though. Can anybody help with this? If not, can someone point me to a Downstream analytics consultant?

 

skatofiabah_1-1735571661586.png

 

 

Thanks!

15 Replies

Avatar

Community Advisor and Adobe Champion

The first thing that comes to mind for me is whether or not you're using a VRS in workspace? If so, any segments that are applied to the VRS would need to be recreated in code to make the data match. Same for any bot filtering or IP exclusions that are being done to the data. 

For the page views being too high, do you have an event that fires on page views. If so, have you tried counting instances of that event? 

Avatar

Level 5

Hi @MandyGeorge,

 

Yes, so I think my SQL commented out code if I included that would be the mimicking the VRS I think. I commented it out to just compare to the regular report suite and I'm still getting these discrepancies. Also, what bots or IP filters would I need to exclude? My numbers downstream fall short in the more recent months, so filtering would only reduce it. Also, shouldn't Adobe's Out-of-the-Box Page Views metric upstream and downstream still align anyway? If I do do a page event instance, how do I get regex or custom code to only filter that out of event_list? I did find a report suite filter in my code that I added here (username), but I still cannot get a match.

 

skatofiabah_0-1735573496352.png

I'm close in UVs and visits overall for almost all 2024. I'm only 12 UVs less downstream and about 202 Visits short downstream. However, I am 50,000 Page Views Short in Downstream vs. Workspace. I might be missing more filters potentially? Any help or thoughts?

 

Thanks!

 

Avatar

Community Advisor and Adobe Champion

There's an option for "exclude by IP address" in the report suite settings, I'm not sure if those automatically apply to the data feeds or not. 

Have you tried breaking it out by week or by month to see if there is a particular time period that is off more than the others? It's possible that some of the hits could be missing or didn't come into your data lake properly. 

How big of a variance is 50k page views for you? Depending on the size of your report suite it could be a big or small amount. For our data feeds, we have a 0.6% variance between them and workspace. You're never going to get 100% accuracy, but if you're within 1%-2%, that's generally pretty good. More than that I would look at what is happening. 

 

As for getting the data out of the event list. You will need to know what each value in the event list means, there should be a lookup file called "event" that has a mapping. It will have data like this:

MandyGeorge_1-1735574499368.png

 

This is a sample of the code that we use for pulling information from the event_list, and from the product_list. We have a CTE that unnests the two of them, and then the events are easy to pull from that. Also, the reason I use "min(date)" is because a visit can span more than one day, so I always take the day that the visit starts on.

MandyGeorge_0-1735574413525.png

 

Avatar

Level 5

Hi @MandyGeorge,

 

Thanks for this! We saw our UVs and Visits Variance go from average 0.5% from Jan to August, to higher than 2-3% variance Sept and onward. I added this in for our VRS. The VRS just has Prop24 not equal to and eVar75 equal to filters. Any thoughts? Why it would have such a drastic change when our VRS segments and the filter itself didn't change? Our variance is essentially good for non-VRS. But with my example, if you or anybody else could help, that would be great!

skatofiabah_0-1735586225161.png

 

Thanks!

 

Avatar

Community Advisor and Adobe Champion

What level is the segment on your VRS at? Is it hit, visit, or visitor? If it's visit or visitor then you need to make your conditions in the code apply to the entire visit, visitor. So you would have to make a CTE or another table with the visit/visitor IDs and whether they should be included or excluded.

Are you able to share a screenshot of the segment on your VRS? That might give me some idea of what the difference is between that and your code.

Avatar

Level 5

Hi @MandyGeorge,

Of course. Here is the VRS along with the segments. Both segments are hit level.

skatofiabah_0-1735588665747.png

 

skatofiabah_1-1735588729225.png

 

skatofiabah_2-1735588795969.png

 

 

 

Avatar

Community Advisor and Adobe Champion

Try making a CTE or subquery with post_visid_high, post_visid_low, visit_num, visit_start_time_gmt, etc. Bring in all of those columns, don’t aggregate anything yet. Put the post_prop24 != “value” and post_evar75 = “value” (replace with your actual values) in that CTE/subquery. Then try the aggregations.

 

Since you said that your page views in the data feeds are less than in workspace. I'm wondering if one hit in a visit gets excluded, if it's excluding the entire visit because of your aggregations. 

Avatar

Level 5

Hi @MandyGeorge,

 

I do have those filters for the Post prop and evar in my query above. I also include that concatenated visits and UV string that this recommends downstream: https://experienceleague.adobe.com/en/docs/analytics/export/analytics-data-feed/data-feed-contents/d....

Avatar

Community Advisor and Adobe Champion

@skatofiabah wrote:

Hi @MandyGeorge,

 

I do have those filters for the Post prop and evar in my query above. I also include that concatenated visits and UV string that this recommends downstream: https://experienceleague.adobe.com/en/docs/analytics/export/analytics-data-feed/data-feed-contents/d....


Right, you do have those filters in there. But the way the code is written, I'm wondering if the aggregation is doing something weird. I would do the inclusion/exclusion of the prop/evar in a CTE or subquery without any aggregation. Just straight up remove the rows that don't meet the criteria. Then in the final table do the aggregation, and see if that makes a difference.

Avatar

Level 5

Hi @MandyGeorge,

 

I see. So essentially you are saying to filter on those props and evars without aggregating (dates, sums, etc.)? I'm assuming that the query would take a long time or break on the non-aggregated piece since bringing in every hit value/row of data. Then you are saying to aggregate after all the filters have gone through?

Avatar

Community Advisor and Adobe Champion

Yes, because you're not grouping by the prop/evar value, just by month, if one hit in a visit meets the criteria, it might be excluding all of the hits. I'm trying to think of reasons why you're seeing such a difference and the aggregation is the only thing that I can think of since we ruled everything else out.

 

I suppose a different approach would be to add the prop/evar to the group by and see if that makes a difference. Or do case statements within the calculations to only count visits/pages when it meets the criteria. 

 

Doing it as a subquery would look something like this 

MandyGeorge_0-1735596251865.png

 

Avatar

Community Advisor and Adobe Champion

Just to add a little context... Page Views in Adobe are a little unique... any hit with either a Page Name or a Page URL (the standard dimensions) are considered a "page view"... there is no actual "page view event" as such in the Page Events list...

 

So in order to get your Page Views, you actually have to look for a value to exist in one or both of those fields.

Avatar

Community Advisor and Adobe Champion

@MandyGeorge might be right then, it might have something to do with the "replication" of your VRS? Maybe all the rules aren't quite right and it causing issues.

 

What happens if you compare data from the full suite to your Databricks data? Does it match better in that scenario?

Avatar

Level 5

Hi @Jennifer_Dungan,

 

My variance is around 0.5% and I mentioned that earlier in the thread when I do the report suite (without VRS). It's just odd that just adding 2 post_ values throws it all off and that they surge in the fall and onward.