Expand my Community achievements bar.

Adobe Experience Platform (AEP) & Apps User Groups are live to Network, learn, and share in your regional locations.

SQL: Grouping Experience Events into arrays for Profile dataset in AEP

Avatar

Level 1

Hi everyone,

 

I’m facing a challenge I haven’t been able to solve yet.

 

I have an Experience Event dataset that contains multiple rows per ID, and I need to transform it into a Profile dataset by aggregating all rows for a given ID into a single row, using arrays to store the values.

 

Here’s an example of the schema:

Experience Event Dataset

_entity.fg.ID  
_entity.fg.name  
_entity.fg.score  
_entity.fg.normalizedScore  
_entity.fg.version  
Timestamp  

Target Profile Dataset

_entity.ID  
_entity.fg[].name  
_entity.fg[].score  
_entity.fg[].normalizedScore  
_entity.fg[].version  
extSourceSystemAudit.lastUpdatedDate  

I’m trying to write a SQL query that would group all rows by ID and transform the values into arrays, but I haven’t found a working solution yet.

 

Has anyone faced a similar use case or could share an example of how to achieve this transformation using SQL in AEP?

 

Thanks in advance,

Regards,

Samy

Topics

Topics help categorize Community content and increase your ability to discover relevant content.

3 Replies

Avatar

Level 4

@SamyPe1 

you can achieve this at high level in Query service by first group by ID (use partition function or subquery with group by) + build a struct per row + collect into an array.

Refer to details query examples here https://experienceleague.adobe.com/en/docs/experience-platform/query/key-concepts/nested-data-struct... 

Avatar

Level 3

How will you prevent your arrays and the resulting profile records from growing too large? There are multiple limits applicable here:

 

https://experienceleague.adobe.com/en/docs/experience-platform/profile/guardrails#data-model-limits

 

Would you be better served with custom aggregates instead, e.g. first, latest, count, min, max, etc.?

Avatar

Level 1

Hi,

Thanks for your answers! I found the solution thanks to Adobe Support’s AI, which suggested using ARRAY_AGG instead of ARRAY, along with adding a GROUP BY on the ID.

Interestingly, ARRAY_AGG isn’t mentioned in Adobe’s documentation, but fortunately, it works!

Here’s the query structure needed to perform this kind of aggregation:

 

INSERT INTO  target_profile_dataset
SELECT DISTINCT
    STRUCT(
          eed._entity.fg.ID as ID,
       ARRAY_AGG(STRUCT(
                              eed._entity.fg.name as name,
                              eed._entity.fg.normalizedScore as normalizedScore,
                              eed._entity.fg.score as score,
                              eed._entity.fg.version as version
        )) As fgd
    ) As _entity,
STRUCT(MAX(timestamp) as lastUpdatedDate) as extSourceSystemAudit
FROM experience_event_dataset as eed
GROUP BY eed._entity.fg.ID