
Best approach to update a target field referencing a source field


Level 2

Hi Everyone,

 

We have csv.pgp encrypted source files stored in Azure Blob Storage. These files are ingested into the Data Landing Zone using an API-based ingestion process.

Our source and target schemas are different. We would like to reference a field from the source file (e.g., CRMID) and use it to update an existing attribute in the target schema (customer_status). However, the customer_status attribute does not exist in the source schema/source file.

For example: If CRMID = 12345 (from the source file), then update the customer_status=moved in the target.

Could you please advise on the best approach to implement this logic? Should we use Data Prep or Data Distiller after ingestion? Also, what steps or actions should we take to achieve this?

Additional questions:

  • CRMID is present in both the source and target schemas. Should this be the primary identifier in both schemas?
  • Do we need to add customer_status field in the source? Is it necessary?
  • If the customer_status field currently has the value "Active" in the target and it is updated to "Moved" based on the source input, the "Moved" value should be retained for only 90 days.

Thanks!


3 Replies


Level 6

Hi @AEPuser16,

In your target XDM schema:

  • Include crmId (type: string) as an identity field.
  • Add customer_status (type: string) as a profile attribute.
  • Add a date-typed field status_updated_date if you want to record when the status was set.

Use crmId as your primary identity namespace in both source and target schemas. This ensures incoming records automatically merge to the correct profile based on CRMID.

Answer to your question: Yes - map CRMID as the primary identifier in both schemas. It gives you one consistent key for merges.
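If you manage your schemas through the API rather than the UI, here is a minimal sketch of registering crmId as the primary identity via the Schema Registry descriptors endpoint. The tenant path, schema URL, namespace code, and credential values are placeholders on my side, so verify them against the Schema Registry docs before using:

import requests

# Placeholder credentials - replace with your own values.
HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
    "Content-Type": "application/json",
}

# Identity descriptor: marks crmId as the schema's primary identity so that
# ingested records merge into the correct profile.
descriptor = {
    "@type": "xdm:descriptorIdentity",
    "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/{SCHEMA_ID}",
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": "/_{TENANT_ID}/crmId",  # path to the crmId field
    "xdm:namespace": "CRMID",  # your custom identity namespace code
    "xdm:property": "xdm:id",
    "xdm:isPrimary": True,
}

resp = requests.post(
    "https://platform.adobe.io/data/foundation/schemaregistry/tenant/descriptors",
    headers=HEADERS,
    json=descriptor,
)
resp.raise_for_status()
print(resp.json())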

Instead of modifying the source CSVs themselves, leverage AEP's Data Prep within your batch ingestion dataflow in Flow Service:

  • Set up your batch ingestion dataflow.
  • Map your incoming CRMID column to the XDM crmId field.
  • Create a calculated field customer_status with a conditional expression, e.g. iif(is_set(CRMID), "Moved", null), so the status is set whenever a CRMID is present.
  • Set status_updated_date = now() so you know when the update occurred.

Reference: https://experienceleague.adobe.com/en/docs/experience-platform/data-prep/home

In your ingestion mapping, bind the Data Prep outputs (crmId, customer_status, status_updated_date) to the corresponding schema fields.

Publish the Flow. Every time your API ingests new batches, these transformations run automatically, updating the profile store with customer_status = "Moved" and a fresh status_updated_date.
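For reference, here is a rough sketch of how that mapping could be expressed as a mapping set through the Data Prep (conversion) API. I'm assuming calculated fields use the EXPRESSION source type and the iif / is_set / now mapping functions, and the _tenant field paths and schema URL are placeholders, so verify the exact syntax against the Data Prep docs:

import requests

HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
    "Content-Type": "application/json",
}

mapping_set = {
    "version": 0,
    "xdmSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/{SCHEMA_ID}",
    "xdmVersion": "1.0",
    "mappings": [
        # Plain attribute mapping: incoming CSV column -> XDM identity field.
        {"sourceType": "ATTRIBUTE", "source": "CRMID", "destination": "_tenant.crmId"},
        # Calculated field: set customer_status to "Moved" whenever CRMID is present.
        {
            "sourceType": "EXPRESSION",
            "source": 'iif(is_set(CRMID), "Moved", null)',
            "destination": "_tenant.customer_status",
        },
        # Calculated field: stamp when the status was set.
        {"sourceType": "EXPRESSION", "source": "now()", "destination": "_tenant.status_updated_date"},
    ],
}

resp = requests.post(
    "https://platform.adobe.io/data/foundation/conversion/mappingSets",
    headers=HEADERS,
    json=mapping_set,
)
resp.raise_for_status()
print(resp.json())  # returns the mapping set ID to attach to your dataflow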

AEP lets you apply row-level TTL only to event datasets, not to individual profile attributes. To achieve a 90-day roll-off of customer_status, you can do the following:

  • Capture status_updated_date at ingestion (as above).
  • Create a scheduled dataflow or query (via Query Service or the batch/streaming APIs) that runs daily and selects all profiles where customer_status = "Moved" AND status_updated_date <= (currentDate - 90 days).
  • Update customer_status back to "Active" (or null) for those records.
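One caveat: Query Service cannot update profile attributes in place, so a common pattern is a daily query that writes the reset rows into a separate profile-enabled dataset and lets your merge policy surface the new value. Here is a rough sketch; the dataset/table names (customer_status_reset, profile_snapshot_export) are placeholders, and the POST /queries call reflects my reading of the Query Service API docs, so adjust before use:

import requests

HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
    "Content-Type": "application/json",
}

# Write "Active" rows for profiles whose "Moved" status is older than 90 days
# into a profile-enabled dataset; the merge policy then surfaces the new value.
reset_sql = """
INSERT INTO customer_status_reset
SELECT crmId,
       'Active'     AS customer_status,
       current_date AS status_updated_date
FROM   profile_snapshot_export
WHERE  customer_status = 'Moved'
  AND  status_updated_date <= date_sub(current_date, 90)
"""

resp = requests.post(
    "https://platform.adobe.io/data/foundation/query/queries",
    headers=HEADERS,
    json={
        "dbName": "prod:all",  # default Query Service database
        "name": "reset-moved-status-after-90-days",
        "sql": reset_sql,
    },
)
resp.raise_for_status()
print(resp.json())

Query Service also supports scheduled queries if you want this to run daily without an external trigger.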

CRMID is present in both the source and target schemas. Should this be the primary identifier in both schemas?

Yes. Use CRMID as the primary identity namespace in your XDM schema and in the ingestion mapping so that records merge correctly in RT CDP.

Do we need to add customer_status field in the source? Is it necessary?

No. Use a Data Prep recipe to derive and inject that field at ingestion time - you can set it to "Moved" whenever a CRMID is present.

If the customer_status field currently has the value "Active" in the target and it is updated to "Moved" based on the source input, the "Moved" value should be retained for only 90 days.

Best practice is to capture a status_updated_date on ingestion and run a scheduled Data Flow (or Query + Prep recipe) that resets the field after 90 days. TTL at dataset level is less precise.

Note: This response is inspired by generative AI.

 

Thanks

Ankit

 


Level 2

Hi @AnkitJasani29,

 

Thanks for the information. I was able to create a static value in the dataset using the static source type in the mapping set:

{
  "sourceType": "STATIC",
  "source": "Moved",
  "destination": "abc.xyz.customer_status"
}

Now, we will have both 'active' and 'moved' values for customer_status in the profile store, and based on the merge policy, they will be updated accordingly for each crmId. I also have the status_updated_date field in the profile snapshot export dataset.

My question is: How can I delete the profiles where customer_status = 'moved' and the status_updated_date is more than 90 days old in the profile store?
This deletion should happen incrementally every 90 days. Which deletion process is appropriate in this case?

 

Thanks!


Level 6

Hi @AEPuser16,

I'm not completely sure, but you can export the matching records through Query Service based on customer_status and status_updated_date, and then delete those profiles using the Profile delete API endpoint.
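As a rough sketch of that flow, under the assumption that the Profile API's entities endpoint supports identity-based deletes (the table name, namespace code, and parameter names are my assumptions - please verify against the Profile API docs, and note this removes profiles from the profile store, not from the underlying data lake datasets):

import requests

HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
}

# Step 1: select the stale identities via Query Service (table name is a placeholder).
stale_sql = """
SELECT crmId
FROM   profile_snapshot_export
WHERE  customer_status = 'Moved'
  AND  status_updated_date <= date_sub(current_date, 90)
"""
query_resp = requests.post(
    "https://platform.adobe.io/data/foundation/query/queries",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"dbName": "prod:all", "name": "stale-moved-profiles", "sql": stale_sql},
)
query_resp.raise_for_status()
# ...poll the query until it completes and collect the crmId values...
stale_crm_ids = ["12345"]  # placeholder for the fetched results

# Step 2: delete each stale profile from the profile store by identity.
for crm_id in stale_crm_ids:
    del_resp = requests.delete(
        "https://platform.adobe.io/data/core/ups/access/entities",
        headers=HEADERS,
        params={
            "schema.name": "_xdm.context.profile",
            "entityId": crm_id,
            "entityIdNS": "CRMID",  # your identity namespace code
        },
    )
    del_resp.raise_for_status()

Running this on a 90-day cadence (or daily, as a rolling window) would give you the incremental deletion you described.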

Please refer to the reference links below, which may help with deleting records.

  1. https://experienceleaguecommunities.adobe.com/t5/adobe-experience-platform/delete-profiles-from-aep/...
  2. https://experienceleague.adobe.com/en/docs/platform-learn/tutorials/profiles/delete-profiles

Thanks

Ankit