How can one re-ingest historic streaming event data?
We ran into a schema-design issue that we are trying to resolve in a client's PROD sandbox.
We have a profile-enabled experience-event dataset that contains about a year's worth of streaming event data. The dataset is based on a profile-enabled schema that has 3 identities: ECID plus 2 other identities, each mapped to a different cross-device ID namespace.
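For reference, the way we have been confirming the current mapping is by listing the identity descriptors that point at this schema through the Schema Registry API. A minimal, read-only sketch is below; the access token, org/sandbox values, and schema $id are placeholders for the client's environment, and it assumes the standard paginated "results" response shape:

```python
import requests

# Placeholder credentials/IDs for the client's PROD sandbox (assumptions).
HEADERS = {
    "Authorization": "Bearer <access_token>",
    "x-api-key": "<api_key>",
    "x-gw-ims-org-id": "<ims_org_id>",
    "x-sandbox-name": "prod",
    "Accept": "application/vnd.adobe.xdm+json",
}
SCHEMA_ID = "https://ns.adobe.com/<tenant>/schemas/<schema_id>"

# List descriptors in the tenant and keep the identity descriptors that
# point at our schema, to see which fields map to which namespaces today.
resp = requests.get(
    "https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors",
    headers=HEADERS,
)
resp.raise_for_status()

for desc in resp.json().get("results", []):
    if (desc.get("@type") == "xdm:descriptorIdentity"
            and desc.get("xdm:sourceSchema") == SCHEMA_ID):
        print(desc["@id"], desc["xdm:sourceProperty"],
              "->", desc["xdm:namespace"],
              "(primary)" if desc.get("xdm:isPrimary") else "")
```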
We have determined that the 2 non-ECID identities were mapped incorrectly to 2 existing cross-device ID namespaces. We are attempting to do the following:
1. Demote the 2 attributes that are currently assigned as identities and turn them back into 'regular' attributes. This dataset is the only one derived from the parent schema; there are no other datasets based on it.
2. Assign 2 new/different attributes as identities on the schema. These new identities will use 2 new namespaces, different from the ones the 2 original identities were using. ECID will remain unchanged and will be kept as the primary identity on the schema. (A sketch of what this descriptor swap might look like follows this list.)
3. Ensure that the existing historical events in the dataset are re-mapped to use the new identities. The client wants to keep the existing data in the dataset (with the original timestamps). Existing streaming audiences rely on profiles built from profile fragments sourced from this dataset.
4. Ensure that any data ingested into this dataset in the future is mapped to the new identities.
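For items 1 and 2, our understanding is that the change ultimately amounts to deleting the two incorrect identity descriptors and creating two new ones against the correct namespaces via the Schema Registry API (once the schema can actually be edited, which is the crux of the problem below). A rough sketch, where the descriptor IDs, tenant field paths, and namespace codes are placeholders:

```python
import requests

# Same placeholder auth values and schema $id as in the listing sketch above.
HEADERS = {
    "Authorization": "Bearer <access_token>",
    "x-api-key": "<api_key>",
    "x-gw-ims-org-id": "<ims_org_id>",
    "x-sandbox-name": "prod",
    "Content-Type": "application/json",
}
SCHEMA_ID = "https://ns.adobe.com/<tenant>/schemas/<schema_id>"
BASE = "https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors"

# 1. Demote the two wrongly-mapped fields by deleting their identity
#    descriptors (IDs taken from the listing above; placeholders here).
for descriptor_id in ["<descriptor_id_1>", "<descriptor_id_2>"]:
    requests.delete(f"{BASE}/{descriptor_id}", headers=HEADERS).raise_for_status()

# 2. Promote the two new fields by creating identity descriptors that point
#    at the new (correct) namespaces. ECID's descriptor stays untouched and
#    remains the primary identity.
for field_path, namespace_code in [
    ("/_<tenant>/newIdA", "<NewNamespaceA>"),
    ("/_<tenant>/newIdB", "<NewNamespaceB>"),
]:
    body = {
        "@type": "xdm:descriptorIdentity",
        "xdm:sourceSchema": SCHEMA_ID,
        "xdm:sourceVersion": 1,
        "xdm:sourceProperty": field_path,
        "xdm:namespace": namespace_code,
        "xdm:property": "xdm:code",
        "xdm:isPrimary": False,
    }
    requests.post(BASE, headers=HEADERS, json=body).raise_for_status()
```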
The client does NOT have Data Distiller or CJA. To make the schema changes, we have no choice but to either delete the dataset first and then update the schema, or create a new schema with the changes and migrate the event data to a new dataset. My question is: what approach(es) can we use to accomplish #3? This dataset is the only place within Platform where this data exists; if it gets deleted, the data is gone forever. The client does not have the data outside Platform, and even if they did, they have no way to re-stream the year's worth of events back into the dataset.
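For context, the only self-service way we can think of to get a raw copy of the events out of Platform before anything is deleted is the Data Access API: list the dataset's batches through Catalog, then download each batch's files. A rough sketch is below; credentials, the dataset ID, and output naming are placeholders, and pagination and error handling are omitted.

```python
import requests

# Placeholder credentials / IDs (assumptions to fill in per environment).
HEADERS = {
    "Authorization": "Bearer <access_token>",
    "x-api-key": "<api_key>",
    "x-gw-ims-org-id": "<ims_org_id>",
    "x-sandbox-name": "prod",
}
DATASET_ID = "<dataset_id>"
CATALOG = "https://platform.adobe.io/data/foundation/catalog"
EXPORT = "https://platform.adobe.io/data/foundation/export"

# 1. List successfully ingested batches for the dataset via Catalog
#    (response is a map keyed by batch ID; pagination not handled here).
batches = requests.get(
    f"{CATALOG}/batches",
    headers=HEADERS,
    params={"dataset": DATASET_ID, "status": "success", "limit": 100},
).json()

# 2. For each batch, list its files via the Data Access API and download
#    each underlying file to local disk as a backup copy.
for batch_id in batches:
    files = requests.get(f"{EXPORT}/batches/{batch_id}/files", headers=HEADERS).json()
    for f in files.get("data", []):
        file_id = f["dataSetFileId"]
        # Assumes the file entry expands to one or more named parts.
        listing = requests.get(f"{EXPORT}/files/{file_id}", headers=HEADERS).json()
        for part in listing.get("data", []):
            name = part["name"]
            content = requests.get(
                f"{EXPORT}/files/{file_id}", headers=HEADERS, params={"path": name}
            )
            with open(f"{batch_id}_{name}", "wb") as out:
                out.write(content.content)
```

Even with a copy like this on disk, we would still need a supported way to batch-ingest it back into a corrected dataset with the original timestamps and have Profile pick it up against the new identities, which is exactly the part we are unsure about.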
Any suggestions on how best to accomplish this?