

List all batch IDs of a dataset based on a source dataflow run ID


Level 4

Hello Community,

I am working with a dataset in AEP and need to trace all the batch IDs that were generated during a particular ingestion. I can see that the source Dataflow Run ID ingested some records into the dataset and took 1.3 hours to run, but during that time multiple batches were created in the dataset.

I tried the APIs in the following ways:

  1. Flow Service with the run ID: the response payload does not provide the dataset details.
  2. Catalog Service batches with the batch ID: here as well, the dataflow run details are not available.

My requirement is:

  • Given a source Dataflow Run ID, I want to list all the batch IDs created in AEP.

What is the correct way (via the API or Query Service) to fetch all batch IDs of a dataset for a given source Dataflow Run ID?

3 Replies


Level 2

Hi @mustufam5967803

Does your dataset receive data from multiple dataflows, and are you trying to identify the batch IDs for each dataflow? As far as I know, this isn't directly possible.

If you only need to see the different batch IDs that loaded data into a dataset, you can query them through Query Service by including _acp_system_metadata.acp_sourceBatchId in your SELECT statement.
Alternatively, you can use the Catalog Service API for batches: https://platform.adobe.io/data/foundation/catalog/batches.
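A minimal sketch of the Catalog route in Python, using only the standard library. It assumes the batches listing accepts the dataSet, createdAfter, createdBefore and limit query parameters and the usual AEP request headers; the dataset ID, token, API key, org ID and sandbox name are placeholders you would supply, and the function name is hypothetical. It only builds the request, so you can inspect it before sending anything.

```python
import urllib.parse
import urllib.request

CATALOG_BATCHES_URL = "https://platform.adobe.io/data/foundation/catalog/batches"


def build_batches_request(dataset_id, created_after_ms, created_before_ms,
                          access_token, api_key, org_id, sandbox="prod"):
    """Build (but do not send) a Catalog Service GET for one dataset's batches.

    Assumption: dataSet, createdAfter, createdBefore and limit filter the
    batch listing; check them against the Catalog docs for your environment.
    """
    params = {
        "dataSet": dataset_id,               # only batches written to this dataset
        "createdAfter": created_after_ms,    # epoch milliseconds
        "createdBefore": created_before_ms,  # epoch milliseconds
        "limit": 100,                        # page size
    }
    url = CATALOG_BATCHES_URL + "?" + urllib.parse.urlencode(params)
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox,  # "prod" is only a placeholder default
    }
    return urllib.request.Request(url, headers=headers, method="GET")
```

To send it, pass the request to urllib.request.urlopen and parse the body with json.load. The response shape (typically an object keyed by batch ID) is worth confirming against the Catalog documentation before relying on it.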

If you specifically want to tie batches to individual dataflows, you’ll need a custom approach:

  • Update your schema to capture the dataflowName.

  • In each dataflow mapping, map a unique name into dataflowName.

  • Then, using Query Service, select both the dataflowName field and _acp_system_metadata.acp_sourceBatchId to see which batches came from which dataflow for a dataset.
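The custom approach above could look roughly like the following sketch. The table name and the dataflowName field path are hypothetical placeholders (a custom field usually sits under your tenant namespace), and both helper names are made up for illustration. The first function builds the Query Service SQL; the second groups the returned rows per dataflow.

```python
from collections import defaultdict


def build_batches_by_dataflow_query(table_name, dataflow_field):
    """Return SQL listing which source batch IDs each dataflow wrote into a dataset.

    table_name and dataflow_field are hypothetical placeholders for your
    dataset's table name and the custom dataflowName field path.
    """
    batch_col = "_acp_system_metadata.acp_sourceBatchId"
    return (
        f"SELECT {dataflow_field} AS dataflow_name, "
        f"{batch_col} AS batch_id, "
        f"COUNT(*) AS record_count "
        f"FROM {table_name} "
        f"GROUP BY {dataflow_field}, {batch_col} "
        f"ORDER BY dataflow_name, batch_id"
    )


def group_batches(rows):
    """Group (dataflow_name, batch_id, record_count) rows into {dataflow_name: sorted batch IDs}."""
    grouped = defaultdict(set)
    for dataflow_name, batch_id, _record_count in rows:
        grouped[dataflow_name].add(batch_id)
    return {name: sorted(ids) for name, ids in grouped.items()}
```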

 


Level 4

We are using streaming connections, and the connection takes around 1.3 hours to run. During this duration, 2–3 batches get created.

I tried the approach with _acp_system_metadata.acp_sourceBatchId, but I am not sure what to put in the WHERE clause to filter the data by the dataflow run ID. Also, could you let me know which system dataset contains this data?

It is also difficult to trace back the batch IDs from the past 2–3 days when you only have the dataflow run ID.
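One possible workaround, not confirmed anywhere in this thread: if you know when a dataflow run started and finished, you could keep only the batches that the Catalog listing reports as created inside that window. This sketch assumes the parsed Catalog response is a dict keyed by batch ID whose values carry a created timestamp in epoch milliseconds, and that you can obtain the run's start and end times from wherever you track the run; the function names are hypothetical.

```python
from datetime import datetime, timezone


def batches_in_run_window(batches, run_start_ms, run_end_ms):
    """Return batch IDs whose 'created' time falls inside a run's window, oldest first.

    Assumptions:
      * batches is the parsed Catalog response, keyed by batch ID, where each
        value has a 'created' timestamp in epoch milliseconds.
      * run_start_ms / run_end_ms are the run's start and end in epoch ms.
    Batches without a 'created' value are skipped.
    """
    matches = [
        (meta["created"], batch_id)
        for batch_id, meta in batches.items()
        if "created" in meta and run_start_ms <= meta["created"] <= run_end_ms
    ]
    return [batch_id for _, batch_id in sorted(matches)]


def to_epoch_ms(iso_utc):
    """Convert a naive ISO-8601 UTC string such as '2024-09-01T10:00:00' to epoch ms."""
    dt = datetime.fromisoformat(iso_utc).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)
```

Time overlap is a heuristic, not a real link to the run: if other dataflows write to the same dataset during the run, their batches will match too, which is where the dataflowName approach from the earlier reply helps.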


Level 3

If you open your dataset on the Datasets page, you'll see the Dataset Run IDs and their corresponding Batch IDs at the bottom. For batch source connectors, these are 1:1.