Data Warehouse ETL
Has anyone successful built a data warehouse integration where they are able to replicate Marketo Activities to their warehouse? I am looking to replicate as often as I want but I am wondering the best way to do it. My solution I was thinking was something like:
1) Replicate the "Activity Types" (for us its like 40ish records, assume 50 total) and assign each to a bucket of 10
2) Then for each bucket call the getLeadActivities. If I did it every 5 minutes it would be:
- Every 5 Minutes
- 24 * 60 / 5 = 288 times per day
- 50 Activities in buckets of 10 = 50/10= 5
- min 2 calls per API call (1 to get token, minimum 1 to get data, depending on number of pages)
Total Api Calls = 5 * 288 * (50/10) * 2 = 14,400 per 24 hours period
To do an incremental refresh, I am thinking of the latest activityDate for each bucket as the starting point for each API bucket. This ensures that I don't miss any events in the replication.
I am hoping to understand how others are doing this? My goals are obviously as fast as possible, with as few api calls, and most efficient. I have included my python library I am using to call the API. Its not ready to be open sourced but its a start. It uses a generator to help with paging when multiple pages need to be for a single time.
Wrapper Class for Marketo API · GitHub
Thanks,
Brad