We ran into a schema-design issue that we are trying to resolve in a client's PROD sandbox.
We have a profile-enabled experience event dataset that contains about a year's worth of streamed event data. The dataset is based on a profile-enabled schema with 3 identities: ECID and 2 other identities, each mapped to a different cross-device ID namespace.
It has been determined that those 2 identities were incorrectly mapped to the 2 existing cross-device ID namespaces. We are attempting to do the following:
1. Demote the 2 attributes that are currently assigned as identities and turn them into 'regular' attributes. This is the only dataset based on the parent schema.
2. Assign 2 new/different attributes as identities on the schema, using 2 new namespaces that differ from the ones the original identities used. ECID stays untouched and remains the primary identity on the schema (see the descriptor sketch after this list).
3. Ensure that the existing historical events within the dataset are remapped to the new identities. The client wants to keep the existing data (with the original timestamps), and existing streaming audiences rely on profile fragments from this dataset.
4. Ensure that any data ingested into this dataset in the future is mapped to the new identities.
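For context on what steps 1 and 2 look like mechanically: the identity assignments live as identity descriptors in the Schema Registry. A minimal sketch of dropping an old descriptor and adding a new one via the API, assuming the standard Platform API headers; the descriptor ID, schema ID, field path, and namespace code are all placeholders:

```python
import requests

BASE = "https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors"

# Standard Platform API headers; token, API key, org and sandbox name are placeholders.
HEADERS = {
    "Authorization": "Bearer <ACCESS_TOKEN>",
    "x-api-key": "<API_KEY>",
    "x-gw-ims-org-id": "<IMS_ORG_ID>",
    "x-sandbox-name": "prod",
    "Content-Type": "application/json",
}

# Demote an old identity by deleting its identity descriptor.
# <OLD_DESCRIPTOR_ID> would come from listing GET /tenant/descriptors for the schema.
requests.delete(f"{BASE}/<OLD_DESCRIPTOR_ID>", headers=HEADERS).raise_for_status()

# Promote a new field to an identity in a new namespace (the field path and
# namespace code below are made-up placeholders).
new_identity = {
    "@type": "xdm:descriptorIdentity",
    "xdm:sourceSchema": "https://ns.adobe.com/<TENANT>/schemas/<SCHEMA_ID>",
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": "/_<TENANT>/newCrossDeviceId",
    "xdm:namespace": "NewCrossDeviceNS",
    "xdm:property": "xdm:id",
    "xdm:isPrimary": False,  # ECID stays the primary identity
}
requests.post(BASE, headers=HEADERS, json=new_identity).raise_for_status()
```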
The client does NOT have Data Distiller or CJA. To make the schema changes, we have no choice but to either delete the dataset before updating the schema, or create a new schema with the changes and migrate the event data. My question is: what approach(es) can we use to accomplish #3? This dataset is the only place within Platform where this data exists; if it gets deleted, the data is gone forever. The client does not have the data elsewhere, and even if they did, they have no way to re-stream the year's worth of events back into the dataset.
Any suggestions on how best to accomplish this?
Solved! Go to Solution.
Hi @stephentmerkle,
I'm not sure there is a way to process the data in place within AEP without Data Distiller, but you could explore the Dataset Export option to export the entire dataset to a cloud storage location and then ingest it back into an AEP dataset based on a schema with the corrected identities. For re-ingestion you can either do a file/batch upload or stream the events in via the HTTP API.
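To make the HTTP-streaming half of that concrete, here is a rough sketch (not a drop-in solution) of replaying exported events through a streaming HTTP connection. The connection ID, dataset ID, and schema ID are placeholders, and it assumes the exported events were converted to JSON lines and already remapped to the new identity fields:

```python
import json
import requests

# Placeholder inlet from an HTTP API streaming connection created in AEP.
INLET_URL = "https://dcs.adobedc.net/collection/<CONNECTION_ID>?syncValidation=true"

SCHEMA_REF = {
    "id": "https://ns.adobe.com/<TENANT>/schemas/<NEW_SCHEMA_ID>",
    "contentType": "application/vnd.adobe.xed-full+json;version=1",
}

def replay_event(xdm_event: dict) -> None:
    """Re-stream one exported XDM event into the corrected dataset,
    keeping its original timestamp and _id."""
    payload = {
        "header": {
            "schemaRef": SCHEMA_REF,
            "imsOrgId": "<IMS_ORG_ID>",
            "datasetId": "<NEW_DATASET_ID>",
            "source": {"name": "historical-replay"},
        },
        "body": {
            "xdmMeta": {"schemaRef": SCHEMA_REF},
            "xdmEntity": xdm_event,  # already remapped to the new identity fields
        },
    }
    resp = requests.post(INLET_URL, json=payload)
    resp.raise_for_status()

# Example: iterate over an exported JSON-lines file and replay each event.
with open("exported_events.jsonl") as fh:
    for line in fh:
        replay_event(json.loads(line))
```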
Cheers!
If you are attempting to re-ingest the data into a new dataset, your best options could be the following:
1) Use the Data Access API to export the data out of Platform and then re-ingest it (see the sketch after this list).
2) Use a dataset export to a cloud-based destination and then re-import the data back into AEP.
3) Ideally, if the client did have Data Distiller, the data could remain within the AEP data lake and you could write it back onto a new dataset without having to export it.
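To flesh out option 1, here is a rough sketch of pulling the raw batch files out with the Catalog and Data Access APIs before the dataset is deleted. Credentials and the dataset ID are placeholders, pagination and error handling are omitted, and the response shapes follow the documented Catalog/Data Access behavior:

```python
import requests

PLATFORM = "https://platform.adobe.io/data/foundation"

HEADERS = {
    "Authorization": "Bearer <ACCESS_TOKEN>",
    "x-api-key": "<API_KEY>",
    "x-gw-ims-org-id": "<IMS_ORG_ID>",
    "x-sandbox-name": "prod",
}

DATASET_ID = "<SOURCE_DATASET_ID>"  # the dataset to back up before deletion

# 1. List successful batches for the dataset via the Catalog API
#    (response is an object keyed by batch ID).
batches = requests.get(
    f"{PLATFORM}/catalog/batches",
    headers=HEADERS,
    params={"dataSet": DATASET_ID, "status": "success"},
).json()

# 2. For each batch, list its files and download them with the Data Access API.
for batch_id in batches:
    files = requests.get(
        f"{PLATFORM}/export/batches/{batch_id}/files", headers=HEADERS
    ).json()
    for f in files.get("data", []):
        file_id = f["dataSetFileId"]
        meta = requests.get(f"{PLATFORM}/export/files/{file_id}", headers=HEADERS).json()
        for part in meta.get("data", []):
            name = part["name"]
            blob = requests.get(
                f"{PLATFORM}/export/files/{file_id}",
                headers=HEADERS,
                params={"path": name},
            )
            with open(name, "wb") as out:
                out.write(blob.content)
```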
Hi @stephentmerkle,
Note: This response is partially inspired by Generative AI.
Thanks for the great suggestions. Ideally, exporting the dataset and re-ingesting it would have worked best, but we decided not to pursue it further given the limited timeline we had to get things going. One of the larger datasets had 635 million events, and the number of files generated by the export would have been overwhelming and time-consuming to re-ingest, even if we limited the export to a specific timeframe.
We deleted the datasets, made the changes, and resumed ingesting into them. The audiences based on events in these datasets had a maximum 30-day lookback period, so the call was made to skip the historical migration and let the profiles build back up over time.