At the moment I am working on a case for extracting hourly data using the Analytics API 2.0. The idea is to extract 3 dimensions with several metrics. My first dimension has around 1300 rows, the second around 10 rows, and the third around 100 rows.
Following the documentation, the ETL performs a breakdown over the 3 dimensions, and on average I need to execute around 3000 requests. When running the requests consecutively, the whole process takes more than 20 minutes. I have the option of running the requests in parallel (2, 4, 10, 16), but then I receive the error "too many requests" in more than 50% of the calls.
My questions are the following:
- What are the max limits of parallel requests?
- Are there any limits for requests per hour or per minute and are they configurable?
For the Analytics 2.0 APIs, the throttle limit is set at 120 calls per minute, per user, regardless of report suite or company. When the throttle limit is exceeded, the server returns an HTTP 429 status with the message "too many requests". Note that the throttle is applied at the API gateway layer. The throttle limits can't currently be raised.
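To stay under that 120-calls-per-minute throttle, a client can pace its own requests instead of firing them as fast as possible. A minimal sketch of a sliding-window limiter in Python (the limit and window come from the numbers above; `call_api` is a hypothetical stand-in for whatever request function you actually use):

```python
import time
from collections import deque

class RateLimiter:
    """Allow at most `max_calls` within a sliding `period` of seconds."""
    def __init__(self, max_calls=120, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window.
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=120, period=60.0)
# Before each API request:
#   limiter.wait()
#   response = call_api(...)  # hypothetical request function
```

Calling `limiter.wait()` before every request keeps the client at or below the gateway limit, so parallel workers can share one limiter instead of tripping the 429 errors.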
There is also a throttle applied by the underlying reporting engine per report suite, which is independent of any reporting client (e.g. Workspace, Report Builder, the API, etc.). When that throttle is hit, no errors are returned, but report requests take longer to process. The end result is that report requests across all clients take longer as the load on the report suite increases. Thus, it's possible for Workspace or Report Builder usage to cause API requests to run slower, and it's possible for API requests to cause Workspace or Report Builder to run slower on a given report suite. The Analytics reporting system is a shared, multi-tenant system, and this throttle is designed to prevent reporting activities on any single report suite from consuming too high a percentage of the capacity of a given data center.
With a throttle of 120 calls per user per minute, it will indeed take more than 20 minutes to complete the thousands of breakdown calls you're making. Previous versions of the API made the multiple breakdown calls on behalf of the caller; however, this had the effect of hiding the performance cost of large, complex breakdowns from the caller. API calls in the previous version were queued and processed because of the load they placed on the reporting system. Callers had no idea how long a breakdown call would take and couldn't display any sort of progress while a large breakdown request worked its way through the queue.
API 2.0 doesn't do any queuing at the API layer, but it requires callers to make the breakdown calls themselves. Callers therefore know exactly how many calls need to be made and can calculate and display progress when working through large multi-level breakdowns.
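For example, the caller can compute the total number of calls up front from the row counts at each level and report progress as requests complete. A rough sketch, using the row counts from the question as assumptions (the actual total depends on how many rows at each level are broken down further, which is why the question's ~3000 figure is lower than a full breakdown):

```python
def total_breakdown_calls(rows_per_level):
    """Total API calls for a full multi-level breakdown:
    one top-level call, then one breakdown call per row at each
    level that is broken down by the next dimension."""
    total = 1          # the top-level report
    per_parent = 1
    for rows in rows_per_level[:-1]:  # the last level needs no further breakdown
        per_parent *= rows
        total += per_parent
    return total

# Assumed row counts from the question: 1300, 10, and 100 rows.
total = total_breakdown_calls([1300, 10, 100])
print(total)  # 1 + 1300 + 1300*10 = 14301 for a complete breakdown

# Progress can then be reported as calls complete:
def progress(done, total):
    return f"{done}/{total} calls ({100 * done // total}%)"
```

With the total known in advance, an ETL job can show a meaningful progress indicator rather than leaving users guessing, which was impossible with the old queued API.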
Note that API 2.0 is a reporting API designed to support responsive, interactive reporting and exploration, but it is not well suited to bulk data export use cases. The 1.4 Data Warehouse API or Data Feeds are more suitable for use cases that require bulk data export. You might want to investigate an hourly data feed for your use case.
The solution is just what the error message says: reduce the rate of requests.
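In practice that means backing off and retrying when a 429 comes back instead of failing the call. A minimal retry sketch (`send_request` is a hypothetical stand-in for your HTTP client; real code should also honor a `Retry-After` response header if one is present):

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request with exponential backoff when throttled (HTTP 429)."""
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.get("status") != 429:
            return response
        # Back off: base_delay, 2x, 4x, ... before retrying.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still throttled after retries")
```

Combined with pacing requests below the per-minute limit, this keeps the failure rate down without hammering the gateway.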
In rare cases Analytics has increased the throttle limit for customers who submitted a request through their Account Manager or Customer Success Manager, described their use case, and explained why they need to make API calls at an increased rate. Most of the time these requests are denied, because explaining the use case uncovers a misunderstanding of how the API works, or reveals that what the customer is attempting is a misuse of the API for which alternative solutions are preferable.
Examples of misuse/misunderstanding of the API:
Requesting too small a granularity over too large a date range. For example, requesting a year's worth of data broken down in 1-minute increments.
Requesting fully processed data at intervals of less than 45 minutes. Analytics data typically takes up to 45 minutes to process from the time of collection, and at peak times can take up to 2 hours. Fully processed data is generally never available in less than 30 minutes from the point of collection. Requesting updated data every minute, or sometimes even every second, is terribly inefficient because the data doesn't change minute to minute. See the following whitepaper to understand the different latencies in data processing: https://marketing.adobe.com/resources/help/en_US/analytics/whitepapers/analytics-data-availability.p...
Doing large-scale bulk exports of data via the API. Scheduled Data Warehouse deliveries or other mechanisms such as Data Feeds are better suited for large-scale bulk export of data instead of the reporting API.
So, you are welcome to ask via your AM/CSM that the throttle limit be increased and provide a description of your use case, but the short-term solution is to reduce the rate at which you're submitting requests: spread them out over a longer period of time.