Data Discrepancy: Data Warehouse lower than Workspace

Level 2

I know Data Warehouse can sometimes appear higher, since Workspace totals are de-duplicated. However, when looking at both individual rows and the site total, Workspace is showing more visits. Why would this happen? An example with made-up data is included below. I checked that the configuration of both tables is the same.

 

Page     Visits (Workspace)  Visits (Data Warehouse)
Total    1505                1480
URL #1   300                 270
URL #2   4500                4402

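
To compare the rows on a common scale, the gap can be expressed as a percentage of the Workspace value. This is only a rough sketch: it reads the made-up figures above as 1505 vs 1480 (Total), 300 vs 270 (URL #1), and 4500 vs 4402 (URL #2), and the variable names are illustrative.

```python
# Made-up figures from the example above: (Workspace visits, Data Warehouse visits).
rows = {
    "Total": (1505, 1480),
    "URL #1": (300, 270),
    "URL #2": (4500, 4402),
}

# Shortfall of Data Warehouse relative to Workspace, in percent.
gap_pct = {
    page: (workspace - dw) / workspace * 100
    for page, (workspace, dw) in rows.items()
}

for page, pct in gap_pct.items():
    print(f"{page}: Data Warehouse is {pct:.2f}% lower than Workspace")
```

A gap that differs a lot from row to row would suggest the cause is tied to specific URLs rather than applied uniformly.
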
6 Replies

Community Advisor and Adobe Champion

Do you have other columns in the Data Warehouse that might be separating out the same URL to multiple rows?

 

Are there any segments that could be affecting how the data is returned?

Level 2

The URL is the only dimension and visits is the only metric. The only segment is a hit-based segment phrased like this: "url contains XYZ or url contains XYZ". The same segment was used in DW and Workspace.

Community Advisor and Adobe Champion

In Workspace, are you using a VRS (virtual report suite), and is your Data Warehouse pull using the same VRS? At my org, we use a VRS for all our reporting in Workspace, but there have been times when I've gone to do a Data Warehouse pull and it defaulted to our production report suite, not the VRS.

Other than that, @Jennifer_Dungan's comment about other columns breaking out the rows is the most likely culprit. If you have a table in Workspace, the grand total row typically deduplicates the individual rows. With Data Warehouse, if you sum the individual rows yourself, the result can be greater than the total. Actually, even without other columns, summing rows in Data Warehouse can still come out higher than your Workspace table.
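
As a rough illustration of why summed rows can exceed a deduplicated total, here is a minimal sketch. The hit data, URLs, and function names are all made up for the example and say nothing about how either tool is implemented internally.

```python
# Hypothetical hit-level rows: (visit_id, url). Not real Adobe data.
hits = [
    ("v1", "/home"),
    ("v1", "/pricing"),   # visit v1 touches two URLs
    ("v2", "/home"),
    ("v3", "/pricing"),
    ("v3", "/pricing"),   # repeat hit within the same visit
]

def visits_per_url(hits):
    """Count distinct visits for each URL (one row per URL)."""
    seen = {}
    for visit_id, url in hits:
        seen.setdefault(url, set()).add(visit_id)
    return {url: len(ids) for url, ids in seen.items()}

def total_visits(hits):
    """Count distinct visits overall (a deduplicated grand total)."""
    return len({visit_id for visit_id, _ in hits})

per_url = visits_per_url(hits)
row_sum = sum(per_url.values())   # v1 is counted once under each URL it touched
total = total_visits(hits)        # v1 is counted once

print(per_url, row_sum, total)
```

Here the rows sum to 4 while the deduplicated total is 3, because visit v1 appears under both URLs. This mechanism can only push a row sum above the total, never below it.
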

Level 2

The report suite I used was correct & the same one was used in Workspace and DW. Additionally, there are no additional columns or rows. It's just URL & visits. The issue of de-duplication is not applicable here because DW is lower than workspace and it's occurring for the total as well as individual rows.

Community Advisor

In Workspace, a reportlet can sometimes have attribution models applied that are not present in DW. Check whether that's the case. Also verify that any segment you use in one tool (DW or Workspace) is allowed in the other without restrictions or warnings.

Level 2

Firstly, I can see that you have URL data, which most likely means web browsing rather than an app. Secondly, I presume you are comparing live Adobe data with data sent to your data warehouse. Given the first point, my hint might not be relevant in your case, but it is perhaps noteworthy.

 

When investigating our data feed, we also saw discrepancies between Adobe Workspace data and the same data in our DW. For us, the cause was "offline data" in our native app. Our data load to the warehouse occurs after the day is over. Customers, however, can use the app in offline mode; in that case, their requests are forwarded to Adobe only once they are back online, which can be after the data load. Those hits are therefore missing from the data originally sent to the DW, but if you check Workspace at a later point in time, you will see data for those customers as well.

 

TL;DR: your warehouse data is limited to what was known at load time. Offline data arrives later and shows up in Workspace's live data.
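
A minimal sketch of that effect, assuming each hit carries both the time it occurred and the time it was received. The field layout, dates, and function name are hypothetical and simplified; they do not reflect Adobe's actual timestamp handling.

```python
from datetime import date, datetime

# Hypothetical hits: (visit_id, time the hit occurred, time it was received).
hits = [
    ("v1", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 0)),
    ("v2", datetime(2024, 1, 1, 23, 30), datetime(2024, 1, 1, 23, 30)),
    # Recorded offline in the app, forwarded the next morning.
    ("v3", datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 2, 9, 0)),
]

def visits_for_day(hits, day, known_as_of):
    """Distinct visits that occurred on `day`, using only hits received by `known_as_of`."""
    return len({
        visit_id
        for visit_id, occurred, received in hits
        if occurred.date() == day and received <= known_as_of
    })

day = date(2024, 1, 1)
load_time = datetime(2024, 1, 2, 1, 0)     # nightly load, before v3 arrives
checked_later = datetime(2024, 1, 3, 0, 0)  # someone opens the live report later

warehouse_snapshot = visits_for_day(hits, day, load_time)
live_report = visits_for_day(hits, day, checked_later)

print(warehouse_snapshot, live_report)
```

The snapshot taken at load time sees two visits for the day, while the later live view sees three, so the warehouse figure ends up lower.
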

 

Edit: typo.