Updates to Record dataset showing multiple entries in Queries console

Question

Hi,

I have a Record behavior dataset created as a lookup for CJA. The schema is simple, just the default "_id" field as the article ID, the article name and author name.

After the initial data upload, we found some mistakes in the author name field, so we had to upload certain records again, e.g.

Initial upload - (_id:123, name:book1, author:john)

Record update upload - (_id:123, name:book1, author:paul)

I was expecting the record to be updated based on _id:123. However, when I tried to use Queries for debugging, with the SQL "select * from <lookup_table_name> where _id = 123", it showed both records in the result window.

Can anyone share some insights on this?

Thanks,

John

sreeCharan73 · Accepted Answer

I believe this is by design. You can think of the AEP data lake like any other data lake: data is always appended to the dataset, never updated in place.
This also means the dataset keeps a record of everything that was ingested or changed.

Query Service works as a simple query layer on the data lake, so it shows all the data. CJA and RTCDP, however, pick the latest data in record datasets based on the _id.
If the idea is for the lookup dataset to hold only a single ingestion of each record, the advice is to drop the dataset and ingest the data again.
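
To make the difference concrete, here is a toy model in plain Python. It is only a sketch of the behavior described above, not an AEP API: the `ingest`, `raw_scan`, and `latest_by_id` helpers are hypothetical names, and "latest" is assumed to mean the most recently ingested row.

```python
from typing import Dict, List

def ingest(lake: List[dict], batch: List[dict]) -> None:
    """Append a batch to the lake; existing rows are never modified."""
    lake.extend(batch)

def raw_scan(lake: List[dict], record_id: str) -> List[dict]:
    """What a plain query layer sees: every row ever ingested for this _id."""
    return [row for row in lake if row["_id"] == record_id]

def latest_by_id(lake: List[dict]) -> Dict[str, dict]:
    """What a latest-wins consumer sees: one row per _id.

    Assumption: rows are stored in ingestion order, so a later row
    replaces an earlier one in this view only (the lake is untouched).
    """
    latest: Dict[str, dict] = {}
    for row in lake:
        latest[row["_id"]] = row
    return latest

lake: List[dict] = []
ingest(lake, [{"_id": "123", "name": "book1", "author": "john"}])  # initial upload
ingest(lake, [{"_id": "123", "name": "book1", "author": "paul"}])  # correction

print(len(raw_scan(lake, "123")))           # 2
print(latest_by_id(lake)["123"]["author"])  # paul
```

The raw scan returns both uploads, which matches what the Queries console shows, while the latest-wins view resolves to a single row per _id.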

arpan-garg · Answer

Hi @john_man - This is how AEP works. Setting _id as the primary identity does not mean the old record is overwritten when a new entry is added; the old record stays in the data lake. When you query by _id, you will see both records.

One solution could be to also add a timestamp field and fetch the entry with the latest timestamp.
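
A minimal sketch of that approach, run against an in-memory SQLite table so it can be tried locally. Several things here are assumptions: the schema in the question has no timestamp, so the `updated_at` column is hypothetical, `lookup_table` is a placeholder name, and SQLite only stands in for Query Service. The query uses standard SQL, but it has not been verified against Query Service itself.

```python
import sqlite3

# Keep only the row with the newest updated_at for each _id.
# `updated_at` is a hypothetical column added for illustration.
LATEST_PER_ID_SQL = """
SELECT t._id, t.name, t.author
FROM lookup_table AS t
JOIN (
    SELECT _id, MAX(updated_at) AS max_ts
    FROM lookup_table
    GROUP BY _id
) AS m
  ON t._id = m._id AND t.updated_at = m.max_ts
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lookup_table (_id TEXT, name TEXT, author TEXT, updated_at TEXT)"
)
# Both uploads remain in the table, as in the data lake.
conn.executemany(
    "INSERT INTO lookup_table VALUES (?, ?, ?, ?)",
    [
        ("123", "book1", "john", "2024-01-01T00:00:00Z"),  # initial upload
        ("123", "book1", "paul", "2024-01-02T00:00:00Z"),  # corrected upload
    ],
)

rows = conn.execute(LATEST_PER_ID_SQL).fetchall()
print(rows)  # [('123', 'book1', 'paul')]
```

The timestamps are ISO 8601 strings in the same timezone, so `MAX` on the text column picks the most recent one. If two rows for the same _id share the same timestamp, this query returns both, so the timestamp needs to be unique per update.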

However, in the Profile store, when you search for a profile using the identity namespace, it returns only the latest record based on the timestamp.
