SOLVED

Seeing negative values for Integer datatype in AEP CDP

Level 2

Hi,

 

I have a schema in prod with an ID field in it (a CRMID, for example). The source datatype given to us is INT64, and I have used the 'integer' datatype in AEP. After data ingestion, I see some negative values and some positive values, and all of them look incorrect compared to the source values for this ID. For example, 723456789123 is the value in the source file, and I'm seeing values like -1234567890 and 1234567890 in AEP after data ingestion. The mapping is done correctly and I don't see any errors in my dataflow. What could be the reason for this? Any suggestions on how to proceed would be helpful.

 

Also, for our use case, we have not enabled the schema and dataset for Profile. So, if I update the datatype, what next steps can I take to get the old incorrect data corrected as well? Or, just to test this process first, how can I copy/duplicate a schema?

 

Regards,

Ramyasri 

1 Accepted Solution

Correct answer by
Level 5

That sounds like an overflow issue when the value is converted into the 32-bit integer type. Would it be possible to adjust the schema to use a Long or a String instead? Both would be a better match for a CRMID. You will need to backfill this data into the new dataset, as the data you ingested is now incorrect.

If you've already ingested data into the dataset/schema, this would be considered a breaking change and you'll need to remake the Schema to update the data type and subsequently create a new dataset as well.
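
To illustrate the overflow described above, here is a minimal Python sketch of how a 64-bit value wraps when it is forced into a signed 32-bit integer. The helper names `wrap_to_int32` and `fits_int32` are hypothetical, and the code only demonstrates two's-complement wraparound in general. It is an assumption that AEP's ingestion behaves the same way, so the exact values stored may differ.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def wrap_to_int32(value: int) -> int:
    """Keep only the low 32 bits of value and reinterpret them as signed."""
    low_bits = value & 0xFFFFFFFF
    # If bit 31 is set, the signed interpretation is negative.
    return low_bits - 2**32 if low_bits >= 2**31 else low_bits


crm_id = 723456789123  # example source value from the question

print(fits_int32(crm_id))            # False: the value is outside the 32-bit range
print(wrap_to_int32(crm_id))         # a different number than crm_id
print(wrap_to_int32(3_000_000_000))  # an out-of-range value that wraps to a negative
```

Whether a wrapped value comes out positive or negative depends on bit 31 of the truncated result, which would be consistent with the mix of signs reported in the question.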

Let me know if you have any other questions!
Tyler Krause

9 Replies

Level 6

Hi @AEPuser16 ,

In your AEP schema, you used the 'integer' data type, which maps to a 32-bit signed integer in many systems, including Adobe's internal mapping logic. A 32-bit signed integer can only hold values from -2,147,483,648 to 2,147,483,647.

723456789123 far exceeds this limit, leading to data corruption or overflow, hence the negative/incorrect values you are seeing.

Reference:

https://experienceleague.adobe.com/en/docs/experience-platform/xdm/schema/field-constraints
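
Before choosing a new type, it can help to check the source values against the 32-bit range. The following is a minimal Python sketch, assuming the IDs have already been read from the source file into a list of Python integers. `out_of_range` and `suggest_type` are hypothetical helpers, not part of any Adobe SDK, and the sketch treats "long" as a full signed 64-bit range, which is an assumption: the exact limits of the XDM long type should be confirmed on the page linked above.

```python
from typing import Iterable, List, Tuple

INT32_RANGE = (-2**31, 2**31 - 1)
# Assumption: "long" is treated here as a full signed 64-bit range.
INT64_RANGE = (-2**63, 2**63 - 1)


def out_of_range(values: Iterable[int], bounds: Tuple[int, int]) -> List[int]:
    """Return the values that fall outside the inclusive bounds."""
    low, high = bounds
    return [v for v in values if not low <= v <= high]


def suggest_type(values: Iterable[int]) -> str:
    """Suggest the narrowest of integer, long, or string that holds all values."""
    values = list(values)
    if not out_of_range(values, INT32_RANGE):
        return "integer"
    if not out_of_range(values, INT64_RANGE):
        return "long"
    return "string"


sample_ids = [1001, 723456789123]  # second value is the example from the question
print(out_of_range(sample_ids, INT32_RANGE))
print(suggest_type(sample_ids))
```

For an identifier such as a CRMID that is never used in arithmetic, a string may be the safer choice even when every value fits in a long.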

 

Go to Schemas > Your Schema > Edit

Locate the field and change the type from integer to long

Save the schema

 

Go to Sources > Your Dataflow > Edit Mapping

Make sure the field is mapped as int64 (long) in AEP

Save and re-enable the dataflow

 

If you want to correct the old data, re-ingest the corrected values using the same keys, or create a new dataset.

Thanks

Ankit

Level 2

Thank you @AnkitJasani29  and @TylerKrause  for your answers!

From both responses, I understand that I need to change the datatype from integer to either long or string in the schema. If the schema is not that complex, I can change the datatype directly in the UI, correct?

I have a dataflow setup via API. We have source data files stored in Azure Storage Explorer. These are incremental data files, a new file is received every day, and each file is retained in Azure Blob Storage for 7 days before it gets deleted. 

So, in this case, what I understood is:

  • Disable and delete the existing dataset which has incorrect data and create a new dataset with updated schema.
  • Create a new incremental dataflow via API which points to a new dataset and updated schema.
  • Re-ingest the deleted files/older data again if needed, to have correct data.
  • Since this is via API, I don't need to change any datatype in the mapping set, correct? Please see the attached screenshot.

[Attached screenshot: AEPuser16_1-1747854288427.png]

 

Let me know if my understanding and the process I'm going to follow are correct or need any changes.

 

Thank you!

Level 5

If you've already loaded data into the dataset built off the schema, you won't be able to edit the existing schema via the UI. You can make a new one via the UI, or follow the steps I outlined to create the new one and append the new field within the schema.

All your other bullets are correct, but make sure your new mapping set is consistent with your new schema (updating the type from int to long/string).

Tyler Krause

Level 2

Hi  @TylerKrause ,

 

Yes, data is already loaded in the dataset, but as I said, it's incorrect data, and I'm going to delete it and create a new dataset. So in this case I can just update the schema in the UI, correct? And then create a new dataset pointing to the updated schema.

If I need the old data at all, I can re-ingest it later, right?

Let me know your thoughts!

 

Thanks!

Level 5

Hey @AEPuser16 - you're missing a core concept. Once data has been ingested through a dataset that is derived from a schema, you won't be able to update the data type in that schema. Even if you delete the dataset, you will not be able to update the schema data type and will need to build a new schema.

 

You can load the data back in at any point, but you will need to do so through the updated schema.

 

Let me know if you have any other questions!

 

Tyler Krause

Level 5

Hey @AEPuser16 - just checking in! 

Hope this was helpful. The steps for the Schema Registry APIs look complex but are really quite straightforward! Let me know if you need a hand walking through those steps or if there's anything else you need to help accomplish these tasks to get this all set in your environment!

Best,
Tyler Krause

Level 2

Hi @TylerKrause ,

 

Gotcha! Thanks for the information. Once the schema is created and enabled for Profile, we can only modify a few things. So, changing the datatype is one thing that we cannot do after the schema is created.

But I have not enabled the schema and dataset for Profile yet. What happens in this case?

Yes, I mostly work with Postman. Some steps to get and create a new schema using the APIs would be helpful.
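
As a rough starting point for the API route, the sketch below shows one way to take a resource definition retrieved from the Schema Registry, copy it, and change one field's type before posting the copy as a new resource. It is a sketch under assumptions, not the documented procedure: the helper names, the list of system keys to strip, the sample definition, and the field path are all illustrative, and real field group definitions are structured differently from this simplified sample. The base URL and header names should be checked against the Schema Registry API documentation. No request is sent; the code only prepares the headers and payload, which could then be pasted into Postman.

```python
import copy
import json

# Assumed base URL for tenant resources in the Schema Registry API.
BASE_URL = "https://platform.adobe.io/data/foundation/schemaregistry/tenant"

# Assumption: system-generated keys that should not be sent back on create.
SYSTEM_KEYS = {"$id", "meta:altId", "version", "meta:registryMetadata"}


def build_headers(token, api_key, org_id, sandbox):
    """Build request headers; all four values are placeholders you supply."""
    return {
        "Authorization": f"Bearer {token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,
        "Content-Type": "application/json",
    }


def copy_with_new_type(definition, field_path, new_type, new_title):
    """Return a retitled copy of definition with one field's type replaced."""
    clone = {
        key: copy.deepcopy(value)
        for key, value in definition.items()
        if key not in SYSTEM_KEYS
    }
    clone["title"] = new_title
    node = clone
    for name in field_path:
        node = node["properties"][name]
    # Drop the old numeric constraints before applying the new type.
    for key in ("type", "minimum", "maximum"):
        node.pop(key, None)
    node.update(new_type)
    return clone


# Simplified, made-up definition standing in for the GET response body.
existing = {
    "$id": "https://ns.adobe.com/<TENANT>/mixins/<ID>",
    "title": "CRM fields",
    "type": "object",
    "properties": {
        "crmId": {"title": "CRM ID", "type": "integer",
                  "minimum": -2147483648, "maximum": 2147483647},
    },
}

new_def = copy_with_new_type(existing, ["crmId"], {"type": "string"}, "CRM fields v2")
print(json.dumps(new_def, indent=2))  # body for a hypothetical POST to BASE_URL + "/fieldgroups"
```

The original definition is left untouched, so the old and new versions can be compared side by side before anything is created in the sandbox.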

 

Thanks!

Level 5

Regardless of Profile enablement status, if data has been ingested through the dataset built off the schema, you won't be able to update it.

 

The documentation I linked above for the APIs is super straightforward - but let me know if you have any questions!