Documentation || Query Service Performance Metrics

Level 3

Hello Team,

 

As outlined in the guardrails documentation: https://experienceleague.adobe.com/en/docs/experience-platform/query/guardrails

 

Guardrail type: Performance guardrail (Soft limit)

Description: Performance guardrails are usage limits that relate to the scoping of your use cases. When exceeding performance guardrails, you may experience performance degradation and latency. Adobe is not responsible for such performance degradation. Customers who consistently exceed a performance guardrail may elect to license additional capacity to avoid performance degradation.

 

Batch Queries

Query concurrency | Supported | N/A | Scheduled batch queries are asynchronous jobs, therefore concurrent queries are supported.

 

 

Can somebody provide more insight into Query Service performance behavior and statistics? What thresholds has Adobe defined? Here is the context for my questions.

 

Let's say the business wants journey step event data inserted into existing derived datasets (a maximum of 10-15 attributes per record) for future engagement. Adobe's recommended approach is to write Data Distiller queries and schedule them if the data is needed regularly.

 

1. As mentioned in the document above, 'N' concurrent batch queries are supported. Let's assume each record has 10-15 attributes and this is the only custom batch query running on the platform. If the query has to handle 4 million records each day, how long does it normally take to execute? Does Adobe have documentation that describes the native performance of Query Service?

 

2. As mentioned in the document, is the maximum execution time 24 hours? Also, does Query Service restrict the number of records or the size of data a query can process? If yes, could you share the documentation link? If not, does Adobe have a recommendation on data size?

 

3. Performance guardrail (Soft limit): when exceeding performance guardrails, you may experience performance degradation and latency. Can Adobe define the standard limits for concurrent batch queries, that is, the number 'n' of concurrent batch queries at which one can expect performance degradation and latency?

We understand it's a cloud tool and that performance depends on query resource utilization. However, we are looking for documentation that describes Query Service performance behavior for a given standard resource allocation.

3 Replies

Level 4

1. As mentioned in the document above, 'N' concurrent batch queries are supported. Let's assume each record has 10-15 attributes and this is the only custom batch query running on the platform. If the query has to handle 4 million records each day, how long does it normally take to execute? Does Adobe have documentation that describes the native performance of Query Service?

 

In my experience, this takes minutes to maybe tens of minutes. Query Service can process tens of billions of records over a span of 24 hours, including multiple joins, aggregates, etc. We are beginning to see some massive volumes come through, and I intend to increase the threshold from 24 to maybe 72 hours.

 

We are introducing a new feature that will let you get the compute hour statistics for a single run. I still have to publish a blog on this, but for now you should size the job by executing the same query on a smaller sample and then extrapolating the compute hours (which will be a fraction of an hour) to the size of the entire dataset (number of records).
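To make the sizing idea concrete, here is a minimal sketch of that extrapolation in Python. It assumes compute hours scale linearly with record count, which is only a rough approximation. The function name and all of the numbers are hypothetical and for illustration only; they are not measurements from Query Service.

```python
def extrapolate_compute_hours(sample_hours: float,
                              sample_records: int,
                              total_records: int) -> float:
    """Scale compute hours measured on a sample run up to the full dataset.

    Assumes linear scaling with record count (an assumption, not a guarantee).
    """
    if sample_records <= 0:
        raise ValueError("sample_records must be positive")
    if sample_hours < 0 or total_records < 0:
        raise ValueError("sample_hours and total_records must be non-negative")
    return sample_hours * (total_records / sample_records)


# Hypothetical inputs: a 5% sample (200,000 records) of a 4-million-record
# daily load that consumed 0.02 compute hours on the sample run.
estimate = extrapolate_compute_hours(
    sample_hours=0.02,
    sample_records=200_000,
    total_records=4_000_000,
)
print(f"Estimated compute hours for the full run: {estimate:.2f}")
```

Treat the result as a starting point and re-check it against a few real runs, since a linear estimate will not capture how the service scales machines up and down between stages.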

 

So why is estimating compute hours so hard? When Query Service is doing batch compute, it uses multiple stages of batch computing where machines spin up and go down as needed to get the fastest response. The decisions to scale up and down are stochastic, because the nature of the data and the compute interplay to decide how we can be most efficient. This is a step up from traditional big data architectures, where you spin up a cluster of a fixed size and hope that none of the machines are sitting idle.

 

So when things get stochastic, the best estimates have to be statistical or AI/ML-based (built on your data only, not on that of others). I am working to build that, but I first need the raw data on compute hours, a feature I am launching in the next month or two.

 

Sampling doc:

https://experienceleague.adobe.com/en/docs/experience-platform/query/sql/syntax#analyze-table

 

Level 4

2. As mentioned in the document, is the maximum execution time 24 hours? Also, does Query Service restrict the number of records or the size of data a query can process? If yes, could you share the documentation link? If not, does Adobe have a recommendation on data size?

 

There are no restrictions on the size of the data or the kinds of queries you can execute. We have been very focused on processing deeply nested data efficiently as that is the frontier of batch data processing for personalization.

 

The SQL syntax is documented here: https://experienceleague.adobe.com/en/docs/experience-platform/query/sql/syntax

Level 4

3. Performance guardrail (Soft limit): when exceeding performance guardrails, you may experience performance degradation and latency. Can Adobe define the standard limits for concurrent batch queries, that is, the number 'n' of concurrent batch queries at which one can expect performance degradation and latency?

We understand it's a cloud tool and that performance depends on query resource utilization. However, we are looking for documentation that describes Query Service performance behavior for a given standard resource allocation.

 

This is not explained well, unfortunately. Query Service limits the number of sessions (users using a SQL client or the AEP UI) that you can run at any point in time. Typically, you buy more sessions or logins by getting additional packs.

 

For these users, there is throttling to make sure that no single user can take over the system. But once you schedule a query, it bypasses all of this throttling and executes independently of the others. There is no performance degradation.

 

What is the recommendation here? If you have a lot of users using the system, make sure you have enough active logins to do so. Once you schedule a query, you can sit back and relax.