SOLVED

Exporting Historical Adobe Analytics Raw Data to GCP BigQuery/GCP Storage Service


Level 1

I am reaching out to inquire about the possibility of exporting all of our historical Adobe Analytics raw data to either GCP BigQuery or GCP Storage Service. We have a significant amount of data stored in Adobe Analytics, and we are exploring options to migrate it to Google Cloud Platform for further analysis and storage.

 

Our requirements are as follows:

Export: We would like to export ALL historical Adobe Analytics raw data, including both dimensions and metrics, to either GCP BigQuery or GCP Storage Service.

Data Format: We would prefer the data to be exported in a structured format such as CSV or JSON to facilitate its ingestion and analysis in GCP.

Data Integrity: It is essential for us to ensure the integrity and accuracy of the exported data during the migration process.

Transfer Frequency: We would appreciate guidance on how frequently the data can be exported from Adobe Analytics to GCP to maintain the most up-to-date dataset.

Export Method: Please advise on the recommended method or tool to use for exporting the data from Adobe Analytics to GCP. If there are any prerequisites or considerations we should be aware of, please let us know.

 

Thank you for your attention to this matter. We look forward to your prompt response and guidance on how to proceed with exporting our historical Adobe Analytics raw data to GCP.

1 Accepted Solution


Correct answer by
Community Advisor

As far as I am aware, Raw Data feeds only come in CSV (comma-separated) or TSV (tab-separated) formats. They are compressed as zip or gzip, and include lookup files that you will need in order to process the data properly.

 

You likely only have two or three years' worth of data in Adobe, depending on your contract level, and I am not sure how large a single export can be... I would consider making multiple back-fill pulls: start with maybe a month, see how big it is, then scale based on those results.
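To drive those month-by-month back-fill pulls, a small helper that yields one date range per calendar month can be handy. This is a minimal sketch (the function name and approach are my own, not part of any Adobe tooling); you would feed each range into a separate Data Feed back-fill request.

```python
from datetime import date, timedelta

def month_ranges(start: date, end: date):
    """Yield (first_day, last_day) date pairs for each calendar month
    between start and end, one pair per back-fill request.
    Note: the first range is snapped to the first day of start's month."""
    current = date(start.year, start.month, 1)
    while current <= end:
        # Compute the first day of the following month
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        yield current, min(nxt - timedelta(days=1), end)
        current = nxt
```

Checking the size of the first month's export before generating the rest tells you whether to request larger (quarterly) or smaller (weekly) chunks.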

 

Now, for current feeds, you have two options: hourly or daily.

 

If you have offline data being collected from your mobile apps, there is always a chance that something will be missed, so a daily feed would likely be better. For "live" data, I always suggest using the maximum delay of 120 minutes (this gives your data more time to process and catch up after potential processing latency, and captures as much offline data as possible).

 

Keep in mind that the Raw Data feed will need to be properly processed to match what you see in Workspace... you need to take the exclude_hit column into account, and use visid_high and visid_low together to uniquely identify visitors.
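As a rough illustration of those two points, here is a sketch of row-level processing. The column order below is hypothetical — in a real feed you must read the column order from the column_headers lookup file delivered with each file — but the exclude_hit filter and the visid_high/visid_low concatenation reflect the processing described above.

```python
import csv
import io

# Hypothetical column order for illustration only; the real order comes
# from the column_headers lookup file shipped with each feed delivery.
COLUMNS = ["exclude_hit", "visid_high", "visid_low", "pagename"]

def parse_hits(tsv_text: str) -> list[dict]:
    """Parse raw Data Feed rows: drop excluded hits and build a unique
    visitor ID by combining visid_high and visid_low."""
    hits = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        rec = dict(zip(COLUMNS, row))
        # A non-zero exclude_hit means Adobe excluded this hit from reporting
        if rec["exclude_hit"] != "0":
            continue
        rec["visitor_id"] = f'{rec["visid_high"]}_{rec["visid_low"]}'
        hits.append(rec)
    return hits
```

Filtering excluded hits before loading is what keeps your BigQuery counts aligned with Workspace numbers.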

 

I assume you are aware of this, but just in case, here is a reference for the Raw Data columns to help you understand the raw data and how to process it.

https://experienceleague.adobe.com/docs/analytics/export/analytics-data-feed/data-feed-contents/data...

 

There are a lot of "little" things involved in processing the data that I don't have a full outline of offhand, but if you have specific questions, or your processed numbers aren't matching, please feel free to ask.

 

Good luck.


2 Replies


Community Advisor

Adobe Analytics has the Data Feed feature, which is exactly what you want for exporting the raw data. https://experienceleague.adobe.com/docs/analytics/export/analytics-data-feed/data-feed-overview.html...

However, that raw data is in Adobe Analytics' own format. To "translate" it to BigQuery's format, you'll need to write your own middleman script that takes in the Data Feed raw files, converts them to the format you require, then sends the converted data to BigQuery or another Google endpoint. You will also need a scheduler (e.g. a cron job) to run this periodically.
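The conversion step of such a middleman script can be quite small. Below is a minimal sketch, assuming a gzipped TSV delivery and a known column list: it turns the feed into newline-delimited JSON, which is a format BigQuery's load jobs accept (e.g. via `bq load --source_format=NEWLINE_DELIMITED_JSON`). The function name and column names are illustrative, not from any Adobe or Google library.

```python
import gzip
import io
import json

def feed_to_ndjson(tsv_gz_bytes: bytes, columns: list[str]) -> str:
    """Convert a gzipped Data Feed TSV into newline-delimited JSON,
    one JSON object per hit row, keyed by the given column names."""
    lines = []
    with gzip.open(io.BytesIO(tsv_gz_bytes), mode="rt", newline="") as fh:
        for raw in fh:
            values = raw.rstrip("\n").split("\t")
            lines.append(json.dumps(dict(zip(columns, values))))
    return "\n".join(lines)
```

A scheduler would then pick up each delivered feed file, run a conversion like this, and hand the NDJSON to a BigQuery load job or write it to a GCS bucket.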

Developing that script is outside of Adobe's scope, so you will need to build it on your own. There may be third parties that have already developed such scripts and processes, so it could be worth searching for one that suits your needs.
