Adobe Experience Platform

DavidSlaw1 · 5/13/24

Does anyone have an example of ingesting a Parquet file into AEP? The documentation and Adobe Support say the Parquet file must exactly match the XDM schema in AEP. I have a test file created to do just that. However, there has to be a way to ingest a Parquet file and use a mapping set. I prefer to ingest data from client systems without requiring the client to create a specific format just for Adobe.

Tof_Jossic · 5/14/24

@DavidSlaw1 not sure if I've got this right, but I assume you are ingesting the data using drag and drop on the dataset UI page, which does not offer any mapping options.

However most of the Cloud Storage options would let you select the parquet format in your dedicated repository and then go through the 'Mapping' step.

See Map data fields to an XDM schema

Let me know if that helps

DavidSlaw1 · 5/14/24

Does not help. Selecting a Parquet file from S3 is fine, but there are no options to map in the UI workflow.

brekrut · 5/14/24

Hello @DavidSlaw1

When you create the mapping for the file, there is an option to mark the data as XDM-compliant or not. If the data is not XDM-compliant, you should be able to create a mapping flow.

DavidSlaw1 · 5/14/24

Let me try that in the API.

DavidSlaw1 · 5/14/24

I did try this before. The job runs and reports success, but no records are loaded. The Adobe docs lack clarity on configuring the mapping. Do I still use the ATTRIBUTE and EXPRESSION sourceType? Is source then the attribute name in the file and destination the fully qualified XDM path, like this?

"mappings": [
  {
    "sourceType": "ATTRIBUTE",
    "source": "var1",
    "destination": "_mytenant.fieldGroupObject.var1",
    "identity": true,
    "primaryIdentity": false,
    "namespace": "var1Namespace"
  }
]

DavidSlaw1 · 5/15/24

There are certainly no options in the UI, and it is not clear where non-XDM is supposed to be specified in the API. Thoughts?

brekrut · 5/15/24

Hello @DavidSlaw1

Apologies, the details in my earlier response were incorrect.

Parquet files must be XDM-compliant when ingested. No mapping step is required if the data is XDM-compliant.

Apache Parquet: Parquet-formatted data files must be XDM-compliant.

https://experienceleague.adobe.com/en/docs/experience-platform/sources/ui-tutorials/dataflow/cloud-s...

If you are ingesting data in JSON or CSV, you can use the mapping step, since that data may not be XDM-compliant.

DavidSlaw1 · 5/15/24

There must be a way to ingest data from a parquet file that is not XDM compliant. It seems silly to ask a client to reformat / restructure data just for AEP to consume it.

brekrut · 5/16/24

At this time, Parquet ingestion supports only XDM-compliant files. You have more flexibility to create a mapping flow if the data is in JSON.

DavidSlaw1 · 6/13/24

Solved. There is no way to ingest non-XDM-compliant Parquet. I solved it by making the Parquet files XDM-compliant using a data pipeline, ensuring the datetime values are in a format AEP can ingest without errors, and sizing the data files to optimize load time. Parquet files loaded 60% faster than CSV.
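
For anyone landing here later, a rough sketch of what such a pipeline step might look like in Python, using only the standard library. It is not the pipeline described above, just an illustration under assumptions: the column-to-path map, the `created` column, and the batch size are hypothetical, and the tenant and field names are borrowed from the mapping snippet earlier in the thread. It nests flat source columns under their XDM paths, normalizes datetimes to UTC, and groups rows into fixed-size batches (one batch per output file). The thread does not say which datetime encoding AEP expects, so the sketch stops at timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical map from flat source columns to dotted XDM paths.
COLUMN_TO_XDM_PATH = {
    "var1": "_mytenant.fieldGroupObject.var1",
    "created": "_mytenant.fieldGroupObject.created",
}
# Hypothetical set of columns that hold datetime values.
DATETIME_COLUMNS = {"created"}


def to_utc(value: datetime) -> datetime:
    """Return the same instant as a timezone-aware UTC datetime."""
    if value.tzinfo is None:
        # Assumption: naive source timestamps are already in UTC.
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def nest_record(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Reshape one flat source row into a nested, XDM-shaped record."""
    nested: Dict[str, Any] = {}
    for column, path in COLUMN_TO_XDM_PATH.items():
        if column not in flat:
            continue  # unmapped or missing columns are simply skipped
        value = flat[column]
        if column in DATETIME_COLUMNS and isinstance(value, datetime):
            value = to_utc(value)
        *parents, leaf = path.split(".")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


def batches(
    records: Iterable[Dict[str, Any]], size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group records into batches of at most `size` rows each."""
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch would then be written out as one Parquet file with whatever Parquet library the pipeline uses (pyarrow is one option), with column types chosen to match the target XDM schema. The right batch size depends on the data and has to be found by testing load times.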

Ingest parquet file example
