Using Dataset in Schema

Level 3

Hi,

I have a client requirement and would like advice on best practice. The requirement is that I am only allowed to create one Experience Event schema, which would capture both transaction data and login data. In terms of ingesting the data, which of the following two approaches is recommended?
1. Create a single schema that captures both login and transaction data.

2. Create two separate schemas, one for storing login data and one for storing transaction data.

Please advise which solution is best practice and whether either of them has any drawbacks. Any other alternatives would also be greatly appreciated.


Best regards,
Sching

4 Replies

Level 1

Both ways are possible, and each is good on its own terms.

If the source and file are the same, so that the transaction data and login data arrive in the same file, go with 1 schema and 1 dataset.

If the data comes from 2 different files, whether from the same source or from different sources, any of these is fine: 1 schema and 1 dataset, 1 schema and 2 datasets, or 2 schemas and 2 datasets. It depends on whether you want to keep the number of schemas limited, and also on the data itself, how the data is used in audiences, and how it is used downstream, for example in CJA. There is no harm in using 1 schema and 1 dataset.
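
To make the 1 schema option a bit more concrete, here is a minimal Python sketch of how login and transaction events could share one event shape and still be told apart downstream. The `_tenant` namespace, the fields under it, the sample values, and the `login.success` event type are hypothetical placeholders for illustration; only the general layout (`eventType`, `timestamp`, `identityMap`) follows the usual Experience Event structure.

```python
from collections import defaultdict

# Illustrative events only: the "_tenant" namespace, its fields, and the
# "login.success" event type are assumptions made for this sketch.
events = [
    {
        "eventType": "login.success",
        "timestamp": "2024-05-01T08:00:00Z",
        "identityMap": {"Email": [{"id": "a@example.com", "primary": True}]},
        "_tenant": {"login": {"method": "password"}},
    },
    {
        "eventType": "commerce.purchases",
        "timestamp": "2024-05-01T08:05:00Z",
        "identityMap": {"Email": [{"id": "a@example.com", "primary": True}]},
        "_tenant": {"transaction": {"orderId": "T-1001", "amount": 49.9}},
    },
    {
        "eventType": "login.success",
        "timestamp": "2024-05-02T09:30:00Z",
        "identityMap": {"Email": [{"id": "b@example.com", "primary": True}]},
        "_tenant": {"login": {"method": "sso"}},
    },
]


def split_by_event_type(records):
    """Group records that share one schema by their eventType value."""
    groups = defaultdict(list)
    for record in records:
        groups[record["eventType"]].append(record)
    return dict(groups)


grouped = split_by_event_type(events)
for event_type, records in grouped.items():
    print(event_type, len(records))
```

With a single schema, every query or audience rule needs a filter like this on the event type. With 2 schemas and 2 datasets, the split is already done at ingestion.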

 

--ssj

Community Advisor

Hi @ChanuteJo - I assume you are referring to the Web SDK as the source for capturing event data. In most cases like this, the recommendation is to have 1 schema and 1 dataset, which is flexible and easier to maintain in the long run. Moreover, you would capture the same identity (let's say Email or CRM ID) as the primary identifier to define the customer profile in AEP, so there is not much difference. You can also have separate field groups, a custom one for your login data and a standard one for your transaction data, which keeps the attributes distinct in the audience section and makes it clear what should be used. Also, since both are event data, I would prefer to go with 1 schema and 1 dataset.

Thank you,
Jayakrishnaa P.

Level 4

Hi @ChanuteJo,

I recommend separating these into two schemas and two datasets. This approach offers several advantages: it simplifies debugging with SQL due to less complex schemas and smaller datasets, leading to faster query performance. More importantly, it allows you to set different Time-To-Live (TTL) values for each type of information. For example, transaction data might need to be retained within the CDP for a year, while login data might only be necessary for three months.

Generally, separating data as much as possible promotes better structure, easier debugging, and improved data governance.
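
The retention point can be illustrated with a small Python sketch. This is not an AEP API call, only a model of the rule: each dataset gets its own retention window, and an event expires once it is older than that window. The dataset names and the 365-day and 90-day values are assumptions taken from the "a year" and "three months" example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-dataset retention windows (in days), assumed for this sketch.
RETENTION_DAYS = {"transactions": 365, "logins": 90}


def is_expired(dataset, event_time, now):
    """Return True when an event is older than its dataset's retention window."""
    return now - event_time > timedelta(days=RETENTION_DAYS[dataset])


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
event_time = now - timedelta(days=120)  # an event that is 120 days old

print(is_expired("logins", event_time, now))        # True: older than 90 days
print(is_expired("transactions", event_time, now))  # False: within 365 days
```

If both event types lived in one dataset, a single window would have to cover both, so either login data is kept longer than needed or transaction data is dropped too early.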

Community Advisor

@ChanuteJo I agree with all the suggestions here, but the approach @GigiCotruta mentioned makes more sense in the long run and provides flexibility. However, the need for data separation also depends on your current and future use cases. Both ways are good; decide based on your use case.