Hi everyone,
I’m facing a challenge I haven’t been able to solve yet.
I have an Experience Event dataset that contains multiple rows per ID, and I need to transform it into a Profile dataset by aggregating all rows for a given ID into a single row, using arrays to store the values.
Here’s an example of the schema:
Experience Event Dataset
_entity.fg.ID _entity.fg.name _entity.fg.score _entity.fg.normalizedScore _entity.fg.version Timestamp
Target Profile Dataset
_entity.ID _entity.fg[].name _entity.fg[].score _entity.fg[].normalizedScore _entity.fg[].version extSourceSystemAudit.lastUpdatedDate
I’m trying to write a SQL query that would group all rows by ID and transform the values into arrays, but I haven’t found a working solution yet.
Has anyone faced a similar use case or could share an example of how to achieve this transformation using SQL in AEP?
Thanks in advance,
Regards,
Samy
Topics help categorize Community content and increase your ability to discover relevant content.
Views
Replies
Total Likes
@SamyPe1
you can achieve this at high level in Query service by first group by ID (use partition function or subquery with group by) + build a struct per row + collect into an array.
Refer to details query examples here https://experienceleague.adobe.com/en/docs/experience-platform/query/key-concepts/nested-data-struct...
Views
Replies
Total Likes
How will you prevent your arrays and the resulting profile records from growing too large? There are multiple limits applicable here:
https://experienceleague.adobe.com/en/docs/experience-platform/profile/guardrails#data-model-limits
Would you be better served with custom aggregates instead, e.g. first, latest, count, min, max, etc.?
Views
Replies
Total Likes
Hi,
Thanks for your answers! I found the solution thanks to Adobe Support’s AI, which suggested using ARRAY_AGG instead of ARRAY, along with adding a GROUP BY on the ID.
Interestingly, ARRAY_AGG isn’t mentioned in Adobe’s documentation, but fortunately, it works!
Here’s the query structure needed to perform this kind of aggregation:
INSERT INTO target_profile_dataset
SELECT DISTINCT
STRUCT(
eed._entity.fg.ID as ID,
ARRAY_AGG(STRUCT(
eed._entity.fg.name as name,
eed._entity.fg.normalizedScore as normalizedScore,
eed._entity.fg.score as score,
eed._entity.fg.version as version
)) As fgd
) As _entity,
STRUCT(MAX(timestamp) as lastUpdatedDate) as extSourceSystemAudit
FROM experience_event_dataset as eed
GROUP BY eed._entity.fg.ID
Views
Replies
Total Likes
Views
Likes
Replies