Introduction
For organizations using Adobe Customer Journey Analytics (CJA), it is crucial to understand how data is processed and ingested from the Adobe Experience Platform (AEP) data lake into CJA. CJA does not work with a fixed data size, so Adobe cannot provide a standard time frame for data ingestion. Users should therefore understand the factors that affect queue status, data processing speed, and deviations from the expected timeline. With this knowledge, CJA users can plan their data management strategies more effectively and keep data ingestion efficient.
Note: The latest ingestion prioritization and latency guidance for CJA as of June 12th, 2024, is as follows:
Processing & Backfill Ingest Review
Typically, live data events are processed and integrated into CJA within 90 minutes of becoming available in AEP. However, processing might take longer if the batch size exceeds 50 million rows. For information on latencies in Adobe cross-solution data flows, please refer to the CJA Guardrails documentation.
Since CJA relies on data first being in AEP, any upstream issue can affect CJA ingestion. It helps to check the status of the dataset and its batches in AEP to confirm that the expected data is accessible, especially for datasets that use Field-Based Stitching (FBS). FBS data is first received into the main dataset, then copied into the stitched dataset, and finally ingested into CJA. Small backfills usually take up to seven days to ingest, while large backfills can take up to 30 days.
Additionally, remember the rolling retention setting for the Connection, which establishes the cut-off period for all data fed into CJA. For instance, if you enforce a rolling 13-month retention policy, any backfilled or live data with timestamps older than the past 13 months will not be incorporated into CJA, because it falls outside the Connection's data retention policy.
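The retention check described above can be sketched in a few lines of Python. This is only an illustration: the function names `retention_cutoff` and `within_retention` are hypothetical, and the way the boundary is computed (calendar months, clamped to the end of shorter months) is an assumption, not a documented description of how CJA evaluates the window.

```python
import calendar
from datetime import datetime, timezone


def retention_cutoff(now: datetime, months: int = 13) -> datetime:
    """Return the oldest timestamp still inside a rolling window of `months`.

    Assumption: the window is measured in calendar months back from `now`.
    """
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that, for example, March 31 maps to the last day of February.
    day = min(now.day, calendar.monthrange(year, month)[1])
    return now.replace(year=year, month=month, day=day)


def within_retention(event_time: datetime, now: datetime, months: int = 13) -> bool:
    """True if an event would fall inside the rolling retention window."""
    return event_time >= retention_cutoff(now, months)


if __name__ == "__main__":
    now = datetime(2024, 6, 12, tzinfo=timezone.utc)
    old_event = datetime(2023, 4, 1, tzinfo=timezone.utc)
    print(within_retention(old_event, now))  # outside a 13-month window
```

A check like this can be run against the date range of a planned backfill before requesting it, to avoid queuing data that the Connection would not retain.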
When setting up a Connection, it's crucial to consider the Connection's size, as indicated by the 'Average number of daily events' dropdown. This size setting directly affects the resources allocated for data ingestion. If it is not calibrated to your actual event volume, ingestion can become slower or less reliable.
Figure 1: Customer Journey Analytics: Create new Connection settings and Data settings screen
Furthermore, it's crucial to maintain a balance between your CJA profile datasets and the number of events, as they are sharded together in CJA. An imbalance can force you to use a larger-than-necessary Connection just to support the profile dataset, which wastes capacity. This is particularly the case if the profile dataset includes Person IDs that do not appear in your event datasets.
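One way to gauge this balance is to measure how many profile Person IDs never appear in the event data. The sketch below assumes you have already exported the two ID lists by some means (for example, from queries against each dataset); the function name `profile_overlap` and the sample IDs are hypothetical.

```python
from typing import Dict, Iterable, Union


def profile_overlap(
    profile_ids: Iterable[str], event_ids: Iterable[str]
) -> Dict[str, Union[int, float]]:
    """Summarize how many profile Person IDs have no matching events."""
    profile_set = set(profile_ids)
    event_set = set(event_ids)
    # Profile IDs that never occur in the event data add size without adding analysis value.
    unmatched = profile_set - event_set
    share = len(unmatched) / len(profile_set) if profile_set else 0.0
    return {
        "profiles": len(profile_set),
        "unmatched": len(unmatched),
        "unmatched_share": share,
    }


if __name__ == "__main__":
    summary = profile_overlap(["a", "b", "c", "d"], ["a", "b", "x"])
    print(summary)
```

A high unmatched share suggests the profile dataset could be trimmed before it is added to the Connection.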
The CJA ingest queue status, indicated by a timestamp, can help you understand this process. Regardless of whether data arrives through live streaming or a backfill ingest, CJA classifies it as backfill based on the event timestamps alone.
To summarize Adobe's data processing and ingest methods for CJA, which may involve a streaming or backfill mechanism based on event data timestamps:
- Event data with timestamps less than 24 hours old is given streaming priority into the system, enabling swift access for analytics and decision-making in CJA.
- Event data older than 24 hours, even if it's part of the same batch as newer data, is classified as backfill and split into a separate batch. This backfill classification status applies to all events within the designated batch with timestamps that exceed the 24-hour mark. This data is ingested at a lower priority to manage system resources efficiently.
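The two rules above amount to a simple timestamp test. The following Python sketch illustrates the idea only; it is not Adobe's implementation. The names `classify_event` and `split_batch` are hypothetical, and treating an event that is exactly 24 hours old as backfill is an assumption, since the guidance does not address that boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List

# Threshold taken from the guidance above: events newer than 24 hours stream.
STREAMING_WINDOW = timedelta(hours=24)


def classify_event(event_time: datetime, now: datetime) -> str:
    """Return 'streaming' for events under 24 hours old, else 'backfill'."""
    return "streaming" if now - event_time < STREAMING_WINDOW else "backfill"


def split_batch(event_times: Iterable[datetime], now: datetime) -> Dict[str, List[datetime]]:
    """Split one incoming batch into a streaming part and a backfill part."""
    lanes: Dict[str, List[datetime]] = {"streaming": [], "backfill": []}
    for event_time in event_times:
        lanes[classify_event(event_time, now)].append(event_time)
    return lanes


if __name__ == "__main__":
    now = datetime(2024, 6, 12, 12, 0, tzinfo=timezone.utc)
    batch = [now - timedelta(hours=1), now - timedelta(hours=30)]
    for lane, events in split_batch(batch, now).items():
        print(lane, len(events))
```

Running a check like this on a sample of a batch before sending it gives a rough idea of how much of it will wait in the lower-priority backfill queue.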
Connection Add Dataset Ingest Options
When you add a new dataset to a Connection in CJA, two options determine how its data is ingested: 'Import new data' and 'Dataset backfill'. 'Import new data' covers new batches of data as they arrive, while 'Dataset backfill' fills historical batch gaps in the existing data, either for all time or for a targeted date range. Choosing between them requires an understanding of the data you have and the needs of your CJA instance.
Import new data: Any new batches added to the Experience Platform dataset are automatically included in this Connection, making them available for analysis in CJA. However, keep in mind that newly received data may contain timestamps older than 24 hours. This is common with mobile app tracking, where events collected offline are sent only once the device is back online. As per the guidelines, events with timestamps older than 24 hours are ingested at a lower queue priority than events with timestamps less than 24 hours old, and are often treated as backfill data flows. Events with timestamps less than 24 hours old receive streaming ingest priority, even if the batch also includes events classified as backfill.
Figure 2: Connection Add Dataset Pre Selection View: Import new data, Dataset backfill, and Request backfill options
Figure 3: Connection Add Dataset Post Selection View: Import new data, Dataset backfill
Dataset backfill: This feature allows you to backfill all existing historical data from the Experience Platform for a specific dataset in the given Connection. It's useful if you need to include data from a time before the dataset's most recent ingestion into CJA. This feature helps fill data history gaps, providing a more comprehensive view of your data trends.
After choosing this option, you can decide whether to backfill data for all time or only within a specific date range. Targeting a date range lets you concentrate on the time period most relevant to your analysis.
Remember, because the data is stored across a distributed network, backfilling may not follow a particular sequence. The data might not backfill in the order you anticipate, and its progression cannot be predicted. To determine whether the backfill is complete, it's recommended to use the Query Service to compare the data now in AEP with the data in CJA. This comparison shows whether the backfill has finished or whether gaps still need filling.
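Once you have daily event counts from both sides, the comparison itself is straightforward. The sketch below assumes you have already obtained the counts, for example from a Query Service query on the AEP dataset and from a CJA report; the function name `find_backfill_gaps` and the sample numbers are hypothetical, and the code does not call any Adobe API.

```python
from typing import Dict


def find_backfill_gaps(
    aep_daily_counts: Dict[str, int], cja_daily_counts: Dict[str, int]
) -> Dict[str, int]:
    """Return the days where CJA holds fewer events than AEP, with the shortfall.

    Keys are ISO dates such as '2024-05-01'; values are event counts.
    """
    gaps: Dict[str, int] = {}
    for day, expected in sorted(aep_daily_counts.items()):
        actual = cja_daily_counts.get(day, 0)
        if actual < expected:
            gaps[day] = expected - actual
    return gaps


if __name__ == "__main__":
    aep = {"2024-05-01": 100, "2024-05-02": 120, "2024-05-03": 90}
    cja = {"2024-05-01": 100, "2024-05-03": 40}
    for day, missing in find_backfill_gaps(aep, cja).items():
        print(day, missing)
```

Because backfill does not proceed in date order, gaps can appear anywhere in the range, so it is worth comparing every day rather than only the oldest or newest dates. Days that fall outside the Connection's retention window will show as gaps even when the backfill is complete.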
Figure 4: Connection Add Dataset Post Selection View: Request backfill for targeted date range backfill selection
Conclusion
Understanding how AEP feeds CJA through data processing and ingestion is critical when you design your data management strategy. It shows you the underlying mechanisms and helps you keep your data flows efficient.
With a clearer picture of the new data and backfill data flows, you can identify potential trouble spots and areas that need specific attention. This can improve the effectiveness of your CJA instance, yielding better time to insights and higher data throughput efficiency.
This understanding also supports more proactive planning. You can anticipate potential issues before they arise and devise strategies to address them, leading to smoother operations and a more reliable data management system for CJA. Investing time in understanding the data flows between AEP and CJA can therefore pay significant dividends in the long run.