Real-Time Customer Data Platform

NimashaJain · 7/15/24

Hello Team,

Welcome to the Adobe Real-Time CDP Community Mentorship Program 2024! This is the featured Community Discussion/Contextual thread for your Adobe Real-Time CDP Community Mentor, Bitun Sen!

Bitun will be your dedicated mentor, providing valuable support and guidance on your Adobe Real-Time CDP queries as you upskill yourself and prepare for Real-Time CDP certification throughout the program.

Know your Mentor Bitun Sen (aka @bitunsen-2022 )

Bitun comes with 2 decades of hands-on experience in working with various master data management and data warehousing applications. Currently, he is involved in providing implementation guidance and support for various customer data platform-specific engagements. Throughout his career, he has trained many co-workers in various tools and technologies including Adobe Experience Platform.

He is looking forward to help others by sharing his knowledge and skills on Adobe Experience Platform as they upskill them selves.

Aspirants mapped to Bitun Sen

1) Dhanesh Sharma aka @dhanesh_sh
2) Kana Nguyen aka @KanaNg
3) Saurabh Channe aka @SaurabhCh
4) Ankit Agarwal aka @ankitagarwal05
5) Najlaa Heerahaka @najlaah
6) Sanchari Das aka @SD_11
7) Sanjay RJ aka @SanjayR_
😎 sathya murugaiyan aka @Sathya_Murugaiyan
9) Indra Kumar Reddy Madhuru aka @inder
10) pranathi priya valapa aka @pranathipriya

How to participate in the program

Post your Questions in this thread to connect with your Mentor, Bitun Sen, and fellow Aspirant peers.
Stand a chance to win the ‘Most Engaging Aspirant’ recognition from your mentor by participating in a weekly quiz.
Test your knowledge by replying to the unresolved questions in the Real-Time CDP and AEP community and tag your Mentor to get recognized as an ‘Exceptional Contributor’ by your mentor.
Stick to the schedule to cover one module/week and clear Adobe Real-Time CDP Certification during the program: July 15 – Aug 30

Suggested Next Steps for Aspirants:

Update your Community Profile photo with your latest headshot to stand out to your Mentor and Peer Aspirants.
"Like" this thread to confirm your participation in the program.
Introduce yourself to Bitun Sen and your Aspirant peers by Replying to this Thread! Break the ice by introducing yourself (location, org/ company, etc.) and your experience with/ interest in Adobe DX stack.
Post your Questions to this thread as you begin learning more about the Adobe Real-Time Customer Data Platform Developer Expert certification (Exam ID: AD0-E605)
Stick to schedule and ensure you track your progress in the exam prep guide.
Test your learning by replying to weekly quiz by your mentor
Practice the modules by replying to unresolved queries in the AEP community & RTCDP Community and tag your mentor.

Remember that every post / like / comment you make in your contextual thread and the Real-time CDP Community throughout the program helps increase your chance to be recognized by your Mentor and win exclusive Adobe swag, so bring your best efforts!

We wish you all the best as you embark on this learning experience!

Indra · 7/26/24

What are the allowed schema changes after marking the schema and dataset profile enabled ?

Allowed changes

Adding new fields to the schema/Dataset
Making a required field optional
Introducing new Required fields
Changing the resource display name and Description

Breaking changes/Not supported changes

Removing previously defined fields
Renaming and redefining fields
Removing or restricting previously supported fields values
Moving exiting fields to different location in the tree
Deleting the schema
Disabling the schema from participating in profile

Indra · 7/26/24

5. What are the 3 types of core entities we can have in AEP?

Profile entities - XDM Individual Profile class.
Event entities - XDM ExperienceEvent class.
Custom entities -custom classes

Indra · 7/26/24

4.What are the required fields you have for any experience event schema ?

_id

Timestamp

Bitun · 7/29/24

Team,

Let us start going through the RT-CDP and Data Ingestion in this week. Below are few important links fro reference:

For RT-CDP, please understand

the definition of a profile and how it constructs
information about profile fragments - https://experienceleague.adobe.com/en/docs/experience-platform/profile/home#profile-fragments-vs-mer...
identity namespaces and identity graph
Upsert functionality - https://experienceleague.adobe.com/en/docs/experience-platform/catalog/datasets/enable-upsert

Specifically for data ingestion, please go through the various types of sources we have. Try to understand

what type of data they support
whether they support batch or streaming
understand available metrics after data ingestion

I would love to talk to you all before Friday to see if you have any questions about Data Architecture, RT-CDP and Data Ingestion. Would like to know if you all are Ok to meet at 6:00 PM EST on Wednesday. Please let me know, I will schedule it.

Happy Learning!!!!

Indra · 7/31/24

Hi @Bitun

What is Mapping set ID, in what scenario we create it ? where can we see in UI mapping details ?

As per the documents every dataflow will have Mapping set ID , but I don't see mapping set id for all dataflow

"Mapping set

A set of mappings that transform one schema to another are collectively known as a mapping set. A single mapping set is created as part of each data flow. A mapping set is an integral part of the data flows and is created, edited, and monitored as part of the data flows.

"

Bitun · 8/2/24

We had discussed this in our last call on Wednesday - hope you understood it. If not, we can discuss this again in today's call.

Bitun · 8/2/24

@Indra @SaurabhCh @DhaneshSh - Hope to talk to you in our today's session.

Due to some ongoing project issues, I could not post the questions on RT-CDP and Data Ingestion. Please follow this channel - I will be posting the questions by Sunday (August 4th).

Bitun · 8/3/24

As always, it was great connecting to you guys! As discussed over the call, below are some important information and links you all need to focus:

1. Upsert - https://experienceleague.adobe.com/en/docs/experience-platform/data-prep/upserts

Please note: if you are making any change using UPSERT, that updated data does not flow into Dataset residing in the datalake (please refer to the above picture)

2. Ingest CSV data using Data Ingestion API: https://experienceleague.adobe.com/docs/experience-platform/ingestion/batch/api-overview.html?lang=e...

3. Various batch ingestion troubleshooting points: https://experienceleague.adobe.com/en/docs/experience-platform/ingestion/batch/troubleshooting

4. Various Data Prep functions: https://experienceleague.adobe.com/en/docs/experience-platform/data-prep/home

5. Edge profile vs Hub Profile - how data flows to which data store and when - https://experienceleague.adobe.com/en/docs/experience-platform/profile/edge-profiles

As discussed, I will setup another sync-up call on Wednesday (August 7th) at 6:00 PM EST.

Happy Learning!!!!

Bitun · 8/3/24

Everybody - we had great sessions for last couple of weeks - where we all talked about what we had learnt along with various real-life challenges we face while working with AEP - I can see active participation only from 3 participants:

1) Dhanesh Sharma aka @dhanesh_sh

2) Saurabh Channe aka @SaurabhCh

3) Indra Kumar Reddy Madhuru aka @inder

Its a great way of building the network and sharing knowledge and experiences. Requesting others to join as well so that we can learn from your experience as well -

1) Kana Nguyen aka @KanaNg
2) Ankit Agarwal aka @ankitagarwal05
3) Najlaa Heerahaka @najlaah
4) Sanchari Das aka @SD_11
5) Sanjay RJ aka @SanjayR_
😎6) sathya murugaiyan aka @Sathya_Murugaiyan
7) pranathi priya valapa aka @pranathipriya

Indra · 8/4/24

Hi Bitun,

How much Time it takes to update the dataset if i ingest data using Streaming API. For me it took almost 30 mins to see data in Dataset.

is this expected ?

Do we have any SLA for Streaming data ingestion?

I see batch API updated immediately with in few seconds.

Bitun · 8/6/24

@Indra - I need to find out the Experience League document which talks about this. Usually, what I have seen is, if you are ingesting data through Streaming (e.g. HTTP API), it talks 15-20 minutes to be seen in datalake (using Query Service).

Sometimes, in very rare occasions, I had seen records getting available in datalake after 30 minutes.

SaurabhCh · 8/6/24

@Indra

The delay you experienced when ingesting data using the Streaming API in Adobe Experience Platform (AEP) is not unusual. Typically, data ingested via Streaming API is processed in near real-time, but the visibility of this data in the data lake can take around 15-20 minutes. However, it can sometimes take up to 30 minutes under certain conditions.

This delay happens because while data is streamed into the Real-Time Customer Profile almost immediately, it is then batched and sent to the data lake every 15 minutes. Therefore, there is a slight delay before the data is available for querying or other processing within the data lake.

Regarding SLAs, Adobe doesn’t publicly document specific SLAs for streaming data ingestion times, but it is generally expected that streaming data should be processed quickly and be available within the timeframe you observed.

For more detailed information, you can refer to Adobe’s Experience League documentation on data ingestion processes (Experience League | Adobe) (Experience League | Adobe) (Experience League | Adobe).

Bitun · 8/7/24

Here you go: https://experienceleague.adobe.com/en/docs/blueprints-learn/architecture/architecture-overview/platf...

Bitun · 8/6/24

Team,

Please try to answer these questions:

Describe the functionality of Backfill which comes out of the box for various types of data sources.
A data engineer has setup a dataflow using AWS S3 connector pointing to a S3 bucket. Everyday a CSV file is copied to that S3 bucket. After successful execution of the dataflow (with Backfill enabled) for couple of days, the data engineer disabled the dataflow and then again enabled the dataflow after 5 days. What behavior will the data engineer observe - will it pickup all the files present in the S3 bucket? If so, then why?
A data engineer noticed that his is receiving CSV file containing order details of customers in S3 bucket. In that, one column is purchased datetime which is in "MM-DD-yyyy HH24:mm:ss" format. Which data prep function will be needed to convert this date to 'yyyy-MM-DDTHH:mm:ss.SSSSS" format: dformat or format ?
A data engineer has received a CSV file containing loyalty details of customers. In that file, there is a field "TIER" which has various values - Gold, Platinum and some other values. He got an instruction to transform the value "Gold" to "Tier 1", "Platinum" to "Tier 3" and other values to "Tier 2". How can you do that using iif and decode functions?
You have ingested data using batch in a dataset. Which API you should use to get the metadata ?
- Catalog API
- Data Access API
- Data Ingestion API

Indra · 8/7/24

https://experienceleague.adobe.com/en/docs/experience-platform/ingestion/quality/streaming-validatio....

Bitun · 8/9/24

Team,

Please try to answer these questions. If you face any challenge in any of the topics we have covered, please feel free to reach out to me by any means.

Thanks,

Bitun

Indra · 8/11/24

You have ingested data using batch in a dataset. Which API you should use to get the metadata ? Under Catalog API

Indra · 8/11/24

A data engineer has setup a dataflow using AWS S3 connector pointing to a S3 bucket. Everyday a CSV file is copied to that S3 bucket. After successful execution of the dataflow for couple of days, the data engineer disabled the dataflow and then again enabled the dataflow after 5 days. What behavior will the data engineer observe - will it pickup all the files present in the S3 bucket? If so, then why?

System will process the all the files present in s3 bucket if backfill is disabled.
loads the Files between first data ingestion and Start time of data flow

Indra · 8/11/24

Describe the functionality of Backfill which comes out of the box for various types of data sources

Backfill determines what data is initially ingested. If backfill is enabled, all current files in the specified path will be ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded in between the first run of ingestion and the start time will be ingested. Files loaded prior to the start time will not be ingested.

Interval and backfill are not visible during a one-time ingestion.

Indra · 8/11/24

A data engineer noticed that his is receiving CSV file containing order details of customers in S3 bucket. In that, one column is purchased datetime which is in "MM-DD-yyyy HH24:mm:ss" format. Which data prep function will be needed to convert this date to 'yyyy-MM-DDTHH:mm:ss.SSSSS" format: dformat or format ? i think Format function should be used here, Function used to convert timestamp to date string according to specified format.

SaurabhCh · 8/12/24

1. Describe the functionality of Backfill which comes out of the box for various types of data sources.

Backfill refers to the process of processing historical data that was ingested into a data source prior to the initial setup of a dataflow. When Backfill is enabled, the system will automatically pick up and process all available data from the start, ensuring that the dataflow catches up with any historical data that may not have been processed during the regular data ingestion process.

2. What behavior will the data engineer observe - will it pickup all the files present in the S3 bucket? If so, then why?

Yes, the data engineer will observe that the dataflow picks up all the files present in the S3 bucket. This happens because Backfill is enabled, which means that when the dataflow is re-enabled after being disabled, it will process all the files that were added to the S3 bucket during the period it was disabled. The Backfill functionality ensures that no data is missed by processing all unprocessed files.

3. Which data prep function will be needed to convert this date to 'yyyy-MM-DDTHH:mm.SSSSS" format: dformat or format?

To convert the date from "MM-DD-yyyy HH24:mm" to 'yyyy-MM-DDTHH:mm.SSSSS" format, you would use the dformat function. The dformat function is used to convert date and time values from one format to another, whereas the format function is generally used for string formatting.

4. How can you do that using iif and decode functions?

To transform the values in the "TIER" field using iif and decode functions, you can use the following approach:

iif( TIER = 'Gold', 'Tier 1', iif( TIER = 'Platinum', 'Tier 3', 'Tier 2' ) )

Alternatively, using the decode function, you could write:

decode(TIER, 'Gold', 'Tier 1', 'Platinum', 'Tier 3', 'Tier 2')

This logic checks the value of the "TIER" field and transforms it accordingly.

5. You have ingested data using batch in a dataset. Which API should you use to get the metadata?

To get the metadata of ingested data using batch, you should use the Catalog API. The Catalog API provides metadata and catalog information for datasets, including schema, lineage, and other relevant information.

@NimashaJain @Bitun