How can one re-ingest historic streaming event data?

Question

We ran into a faulty schema-design issue that we are trying to resolve in a client's PROD sandbox.

We have a profile-enabled experience-event dataset that contains about a year's worth of streaming event-data. This dataset is based on a profile-enabled schema that has 3 identities - ECID and 2 other identities, each mapped to different cross-device ID namespaces.

It has been determined that the 2 identities of the schema/dataset were mapped incorrectly to existing 2-cross-device ID namespaces. We are attempting to do the following:

1. Demote the 2 attributes that are currently assigned as identities and turn them into 'regular' attributes. The dataset is the only dataset that is derived from the parent schema. There are no other datasets based on the schema.

2. Assign 2 new/different attributes as identities on the schema. These new identities will be using 2 new namespaces - different from what the 2 original identities were using. ECID will remain the same - no changes will be made to it - it will be kept as the primary identity on the schema.

3. Ensure that the existing historical events within the dataset are re-purposed to use the new identities. The client wants to keep/maintain the existing data in the dataset (with the original timestamps). Existing streaming audiences use profiles that are based on profile fragments from this dataset.

4. Ensure that any data ingested into this dataset in the future is mapped to the new identities

The client does NOT have data distiller nor CJA. To make the schema changes, we have no choice but to delete the dataset first before updating the schema or to create a new schema with the changes and migrate the event data. My question is - what approach(es) can we use to accomplish #3? This dataset is the only place within Platform where this data exists. If it gets deleted, the data is gone forever. The client does not have the data and even if they did, they do not have any way to re-stream the years' worth of events back into the dataset.

Any suggestions on how best to accomplish this?

stephentmerkle · Accepted Answer

Thanks for the great suggestions. Ideally, exporting the dataset and re-ingesting would have worked best. We decided not to pursue this any further due to the limited timeline we had to get things going. One of the larger datasets had 635 million events and the number of files generated by the export would have been overwhelming and time-consuming to re-ingest even if we limited it to only export within a specific timeframe.

We deleted the datasets, made the changes, and resumed ingesting into them. Audiences that were based on events within the datasets had a max 30-day lookback period, so the call was made to proceed without doing this work and have the profiles build back up over time.

Harveer_SinghGi1 · Answer

Hi @stephentmerkle ,I'm not sure if there is an option to work this by processing the data within AEP without data distiller but you may explore the option to export dataset to export the entire dataset to a cloud location and then ingesting it back to an AEP dataset based on a schema with corrected identities. For re-ingestion you can either do file or batch upload or use HTTP API to stream it in.Cheers!

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded