Data Extraction and flow to EDP

Level 3

Hey, I got an ask to extract data at the GUID level (similar to a login ID) from Adobe Analytics. What are the steps/ways we can achieve this? What questions do I have to ask? Should I use Data Warehouse or Data Feeds? Or any built-in APIs?

 

2) They are trying to integrate with an EDP. What questions should I ask stakeholders about this? I have frequency, which CDP, and volumes in mind for now.

I'm completely new to this. Any response is appreciated. Thanks!!!

3 Replies

Community Advisor and Adobe Champion

Without knowing more about how you want to get the data, this could be a hard thing to answer... First, I assume you are using Adobe Analytics and not Customer Journey Analytics (CJA).

 

There are certainly things to consider:

 

1. Depending on how much data you need to extract, and whether this is going to be an ongoing or one-time process, there may be more or fewer options

 

  • Workspace, despite showing up to 400 rows of data, can export 50,000 rows (via the right-click menu). This is easy to build, and you can confirm the data that is included before getting the file, but it's not really schedulable
  • Workspace projects can be scheduled to export as CSV, but only up to the visible 400 rows
  • The API can be easy if you only need one dimension, but if you need multiple breakdowns it becomes complex (you have to run an API request for each breakdown at each row)
  • Data Warehouse allows you to build out a flat table, but keep in mind that metrics like Visits or Unique Visitors that would get deduplicated in Workspace don't have that luxury here
  • Report Builder can be scheduled to bring data into Excel. The interface is a little old, and I am not sure what the data limits are, but it might be a viable option for you
  • (The above all work with "clean" processed data; the excluded hits are already removed)
  • Raw Data Feeds will export everything as raw data, but that means a lot of processing to replicate the logic and get it to match what Adobe is showing (you have to identify the UVs and Visits yourself, exclude the hits that need to be excluded, run a lot of SQL logic, etc.)
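
To give a sense of the processing the Raw Data Feeds bullet refers to, here is a minimal Python sketch that rebuilds Page Views, Visits, and Unique Visitors per GUID from hit-level rows. The column names (`exclude_hit`, `hit_source`, `post_visid_high`, `post_visid_low`, `visit_num`, `visit_start_time_gmt`, `post_pagename`) and the exclusion rules follow Adobe's data feed documentation as I understand it, so verify them against your own feed. Using `post_evar1` as the GUID column is purely an assumption (use whichever variable holds your GUID), the sample rows are made up, and the page view rule is simplified to look at `post_pagename` only.

```python
from collections import defaultdict

# Assumption: hit sources to drop, per my reading of Adobe's data feed docs.
EXCLUDED_HIT_SOURCES = {"5", "7", "8", "9"}
# Hypothetical: the eVar that stores the GUID in this report suite.
GUID_COLUMN = "post_evar1"


def summarize_by_guid(hits):
    """Rebuild basic metrics per GUID from hit-level rows (dicts of strings)."""
    visitors = defaultdict(set)
    visits = defaultdict(set)
    page_views = defaultdict(int)
    for hit in hits:
        # Drop hits that processed reports would not count.
        if int(hit["exclude_hit"]) > 0 or hit["hit_source"] in EXCLUDED_HIT_SOURCES:
            continue
        guid = hit.get(GUID_COLUMN, "")
        if not guid:
            continue  # hit carries no GUID, so it cannot be attributed
        visitor_id = (hit["post_visid_high"], hit["post_visid_low"])
        visitors[guid].add(visitor_id)
        # A visit is identified by the visitor plus visit number and start time.
        visits[guid].add(visitor_id + (hit["visit_num"], hit["visit_start_time_gmt"]))
        if hit["post_pagename"]:  # simplified page view rule
            page_views[guid] += 1
    return {
        guid: {
            "page_views": page_views[guid],
            "visits": len(visits[guid]),
            "unique_visitors": len(visitors[guid]),
        }
        for guid in visitors
    }


def make_hit(guid, visid, visit_num, start, page, exclude_hit="0", hit_source="1"):
    """Build one made-up hit row for the example."""
    return {
        GUID_COLUMN: guid, "post_visid_high": visid, "post_visid_low": visid,
        "visit_num": visit_num, "visit_start_time_gmt": start,
        "post_pagename": page, "exclude_hit": exclude_hit, "hit_source": hit_source,
    }


sample_hits = [
    make_hit("u1", "1", "1", "100", "Page A"),
    make_hit("u1", "1", "1", "100", "Page B"),
    make_hit("u1", "1", "2", "900", "Page A"),
    make_hit("u1", "1", "2", "900", "Page C", exclude_hit="1"),  # excluded hit
    make_hit("u2", "2", "1", "200", "Page A"),
    make_hit("u2", "2", "1", "200", ""),  # no page name, so not a page view
]
summary = summarize_by_guid(sample_hits)
print(summary)
```

In practice this logic would more likely live in SQL on whatever platform receives the feed; the point is only that exclusions and deduplication become your job once you leave the processed reports.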

 

2. Do you have the resources to massage the data?

  • Raw data is good, but it's also a lot of work. This isn't something you just export and use directly, and it can take a long time to get it working just right
  • What tools are you trying to integrate with, and is there a preferred connection type there? That could drive a lot of decisions. Is there a dedicated team there you can work with?

 

I'm sure there are more questions that will come to mind... but this is a good starting point.

Level 3

Ya, it's pretty much like a discovery session, so I don't have more context, and this helps.

And your assumption is right, it's Adobe Analytics.

 

I am good with everything, but I didn't get this: 'metrics like Visits or Unique Visitors that would get deduplicated'. Also, what will the data look like when the extracted data is sent to an EDP?

I'm thinking of questions like how frequently they want the data, whether they need real-time or historical data, and, as you mentioned, what the destination is, etc.

Community Advisor and Adobe Champion

For the 'metrics like Visits or Unique Visitors that would get deduplicated' part:

 

Ok, so in Workspace, you might see a report like:

              Page Views   Unique Visitors
              10           3
Pages         10           3
  Page A      4            2
  Page B      3            1
  Page C      3            2

 

So the "total" UVs is only 3, not 2+1+2 (5), because the same user hit multiple pages. In the Data Warehouse, you don't get the de-duplicated total (or any totals at all). So if you were to get this info from the Data Warehouse and total the rows yourself, you would think that you have 5 UVs, not 3. The more data you have, the worse the overcounting becomes.
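
The same arithmetic can be reproduced in a few lines of Python. This is only an illustration: the visitor IDs `v1` to `v3` and the hit list are made up so that they produce the Page Views and Unique Visitors shown in the example report.

```python
# Made-up hits as (page, visitor) pairs: 10 page views from 3 visitors.
hits = [
    ("Page A", "v1"), ("Page A", "v1"), ("Page A", "v2"), ("Page A", "v2"),
    ("Page B", "v1"), ("Page B", "v1"), ("Page B", "v1"),
    ("Page C", "v2"), ("Page C", "v3"), ("Page C", "v3"),
]

# Unique visitors per page, as each row of a flat export would report them.
visitors_per_page = {}
for page, visitor in hits:
    visitors_per_page.setdefault(page, set()).add(visitor)

# Adding up the per-page rows counts a visitor once for every page they saw.
summed_uvs = sum(len(v) for v in visitors_per_page.values())

# The de-duplicated total counts each visitor once overall.
deduplicated_uvs = len({visitor for _, visitor in hits})

print(summed_uvs, deduplicated_uvs)  # 5 3
```

Page Views do not have this problem in the example, since each hit belongs to exactly one page and 4 + 3 + 3 really is 10.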

 

Page Views could even be impacted if you are using List dimensions, as I believe each value in the list is split into its own row. So if you have 20 items passed in a list on a single hit, it might look like 20 PVs in the Data Warehouse.
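
A rough sketch of that effect, under the same "I believe" caveat: the row layout below is hypothetical, not the actual Data Warehouse output format, and `hit_id` is an invented column that a real export may not provide. It only shows why counting rows after a list has been split inflates the number.

```python
def explode_list_hit(hit_id, list_values):
    """Split one hit carrying a list into one row per list value (hypothetical layout)."""
    return [{"hit_id": hit_id, "list_item": value} for value in list_values]


# One single hit that passed 20 items in a list variable.
rows = explode_list_hit("hit-001", [f"item-{i}" for i in range(1, 21)])

# Counting rows treats every list value as its own page view.
row_count = len(rows)

# Counting distinct hits gives the real number of page views.
distinct_hits = len({row["hit_id"] for row in rows})

print(row_count, distinct_hits)  # 20 1
```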

Understanding the format and frequency is something for you to consider, and when you have a better sense of that, I'd be happy to discuss more specifics of the options (and lots of other people here would too).