@davidslaw1 how did you create the XDM-compliant Parquet structure?
I created the XDM schema structure to receive the data:
<profileSchemaName>
-> <tenant id> | Object
-> <profile data object> | Object
-> <list of attributes, their types, set identity parameters>
-> _repo | Object
-> createDate | DateTime
-> modifyDate | DateTime
-> _id | string
-> <and the remaining standard profile record attributes>
I gave that structure to the database developers and asked them to create a Parquet file that follows the XDM schema structure, as outlined below. Once that was done, the ingestion process was great! A Parquet file of 1 million records ran 60% faster than the CSV file.
Pseudocode
With the source parquet file object (pf)
Create a new parquet file object (pf_parquet_for_aep)
Containing a column “<tenant ID>”
Containing a structure “populationIdentityMap”
Containing all the attributes in the identity graph
Calculate an attribute “uuid” derived from a uuid() function (Note: this is to be used as the _id attribute value)
root
|-- _<tenant id>: struct (nullable = false)
| |-- populationIdentityMap: struct (nullable = false)
| | |-- ipv4_home: string (nullable = true)
| | |-- maid: string (nullable = true)
| | |-- email_sha1: string (nullable = true)
| | |-- email_md5: string (nullable = true)
| | |-- email_sha256: string (nullable = true)
| | |-- transactionId: string (nullable = true)
| | |-- timestampSampled: timestamp (nullable = true)
| | |-- timestampRetain: timestamp (nullable = true)
| | |-- batch_id: string (nullable = true)
| | |-- uuid: string (nullable = false)
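The reshaping step in the pseudocode can be sketched in plain Python. This is only an illustration of the nesting, not the code the developers actually used: the tenant key `_mytenant`, the helper `to_aep_record`, and the sample row are all made up for the example, and writing the result out as Parquet (for instance with Spark or pyarrow) is left out.

```python
import uuid

# Hypothetical tenant namespace; replace with your own "_<tenant id>".
TENANT_KEY = "_mytenant"

# Identity attributes taken from the schema printout above.
IDENTITY_FIELDS = [
    "ipv4_home", "maid", "email_sha1", "email_md5", "email_sha256",
    "transactionId", "timestampSampled", "timestampRetain", "batch_id",
]

def to_aep_record(row):
    """Nest one flat source row under <tenant> -> populationIdentityMap."""
    # Missing attributes become None, matching the nullable fields.
    identity_map = {name: row.get(name) for name in IDENTITY_FIELDS}
    # uuid is the only non-nullable leaf; it is meant to serve as the _id value.
    identity_map["uuid"] = str(uuid.uuid4())
    return {TENANT_KEY: {"populationIdentityMap": identity_map}}

# Made-up sample row standing in for the source parquet file (pf).
source_rows = [
    {"maid": "abc-123", "email_sha256": "example-hash", "batch_id": "b1"},
]
aep_rows = [to_aep_record(r) for r in source_rows]
```

Each element of `aep_rows` then has the same nesting as the printed schema, with a freshly generated `uuid` per record.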
Also, be sure to tell the developers that any DateTime values need to be ISO 8601 compliant and sent in the UTC timezone, which may require a conversion on their side. AEP assumes UTC as the default timezone.
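As a rough illustration of that conversion, here is a small Python sketch using only the standard library. The helper name `to_utc_iso8601` and the UTC-5 sample value are assumptions for the example; the developers' actual source timezone and tooling may differ.

```python
from datetime import datetime, timezone, timedelta

def to_utc_iso8601(dt, assumed_offset_hours=0):
    """Return dt as an ISO 8601 string in UTC (e.g. 2023-05-01T14:30:00Z)."""
    if dt.tzinfo is None:
        # Naive values carry no timezone; attach the offset the source uses.
        dt = dt.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

# Made-up sample: 09:30 local time at UTC-5.
local = datetime(2023, 5, 1, 9, 30, 0, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc_iso8601(local))  # 2023-05-01T14:30:00Z
```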