When a new field is added to an event schema after the schema has already been enabled for Profile and data has been ingested, how do we ingest historical data for the newly introduced field?
Some solutions I could think of are:
Option 1) Create a new dataset with the primary key and the new field, then use that dataset to ingest both historical and incremental data. However, this approach may not scale: if we introduce a new field every quarter, we end up with multiple datasets.
Option 2) Wipe the historical data from the dataset by deleting the batches that ingested it, then re-ingest the data with all attributes. This is time-consuming and can disturb the existing production setup.
Option 3) Use Data Distiller to backfill the historical data. Is that technically possible, given that event-based schema classes don't support upsert?
Is there a recommended best approach?
Dropping the dataset is not a recommended approach, because it impacts CJA reporting that uses the dataset in a CJA connection or data view.
So far, the option I have found with the least impact is deleting the batch.
Although the previous records remain in the Profile store, the historical data reloaded with all columns is appended as a more recent record of the same event.
This approach doesn't break CJA reporting and doesn't create multiple datasets.
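For reference, here is a minimal sketch of how the batch deletion request could be built in Python. It assumes the Batch Ingestion API deletes a batch through a POST to `/data/foundation/import/batches/{BATCH_ID}` with `action=REVERT`, and that the usual Platform headers apply. Treat the endpoint, the header names, and the helper name `build_batch_delete_request` as assumptions to check against the current API documentation. The function only builds the request and does not send it.

```python
import urllib.request

# Assumed Platform API host; verify against the current documentation.
PLATFORM_HOST = "https://platform.adobe.io"


def build_batch_delete_request(batch_id, access_token, api_key, org_id, sandbox):
    """Build (but do not send) a request that reverts/deletes one batch.

    The endpoint and header names are assumptions based on the
    Batch Ingestion API guide, not something confirmed in this thread.
    """
    url = f"{PLATFORM_HOST}/data/foundation/import/batches/{batch_id}?action=REVERT"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,
    }
    return urllib.request.Request(url, method="POST", headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would send it. Try it in a development sandbox first, since the reply above suggests that deleting a batch may not remove the records from the Profile store.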
The best option would be as follows:
1. Add the additional attribute to the schema.
2. Create a new dataset from the same schema and enable it for Profile.
3. Take a backup of the old dataset in a dataset that is not enabled for Profile.
4. Ingest the historical data, and schedule the incremental data, into the new dataset, including data for the attribute you just added. When ingesting the historical data, make sure you use a different _id.
5. Drop the old dataset. Data is removed from the Profile store only when the dataset is dropped.
Option 1 in your question will be challenging, as you mentioned: you end up with multiple datasets and have to process incremental data into each of them.
Option 2 won't work, because deleting the batches only deletes the data from the data lake, not from the Profile store.
Option 3 is also not possible.
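The record preparation in step 4 could be sketched roughly as below. This is only an illustration under some assumptions: the historical events are available as Python dictionaries, the new attribute sits at the top level of each record (in a real XDM schema it would normally be nested under the tenant namespace), and the names `prepare_backfill`, `new_field`, and `lookup` are hypothetical. Each record gets a new, deterministic `_id` derived from the old one, so rerunning the backfill yields the same IDs.

```python
import copy
import uuid


def prepare_backfill(events, new_field, lookup, id_field="_id"):
    """Return copies of historical events with a fresh _id and the new field.

    events:    list of dicts, each holding the original event record
    new_field: name of the newly added attribute (hypothetical, top-level)
    lookup:    dict mapping the ORIGINAL _id to the value of the new field
    """
    prepared = []
    for event in events:
        record = copy.deepcopy(event)  # leave the source records untouched
        old_id = record[id_field]
        # Derive a new id from the old one; uuid5 is deterministic,
        # so the same input always maps to the same new _id.
        record[id_field] = str(uuid.uuid5(uuid.NAMESPACE_URL, f"backfill:{old_id}"))
        # Missing values become None rather than raising an error.
        record[new_field] = lookup.get(old_id)
        prepared.append(record)
    return prepared
```

The output can then be written out as a file and ingested into the new dataset through whichever ingestion path is already in use.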