During batch ingestion how to skip records based on specific conditions | Community
Level 2
May 10, 2024
Solved

During batch ingestion how to skip records based on specific conditions


Hi all,

I need your help. I want to load a CSV file into CDP using the AWS S3 connector. The requirement is to ingest all data except records where brand = 'XYZ'. Can you please help me skip those records?

1 reply

brekrut
Adobe Employee
May 10, 2024

Data Prep functions such as nullify, get_values, and equals could be used to nullify a field, but skipping records is not really intended to be part of S3 as a data source.

 

Can you remove or filter the brand = 'XYZ' records at the source that produces them?
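If filtering at the source is an option, one way to do it is a small pre-processing step that runs before the file is uploaded to S3. The following is only a minimal sketch, assuming the file has a header row with a `brand` column; the function name `filter_rows` and the sample data are hypothetical and not part of any Adobe tooling.

```python
import csv
import io


def filter_rows(src, dst, column="brand", excluded="XYZ"):
    """Copy CSV rows from src to dst, skipping rows where column == excluded.

    src and dst are text file objects. Returns the number of skipped rows.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    skipped = 0
    for row in reader:
        if row.get(column) == excluded:
            skipped += 1  # this record is left out of the file sent to S3
            continue
        writer.writerow(row)
    return skipped


# In-memory demonstration with hypothetical data.
sample = io.StringIO("id,brand\n1,ABC\n2,XYZ\n3,\n")
output = io.StringIO()
skipped = filter_rows(sample, output)
```

For real files, open the input and output with `open(path, newline="")` and pass those file objects instead of the in-memory buffers. Rows with an empty brand are kept, since the requirement only excludes brand = 'XYZ'.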

 

Currently, the ability to filter row-level data is only available for the Google BigQuery, Microsoft Dynamics, Snowflake, and Salesforce sources.

https://experienceleague.adobe.com/en/docs/experience-platform/sources/api-tutorials/filter

shubham10 (Author)
Level 2
May 10, 2024

Thank you @brekrut 🙂

abhinavbalooni
Community Advisor
Accepted solution
May 10, 2024

Hey @shubham10 

There is one more approach to it.

 

As @brekrut mentioned, skipping records during ingestion is not possible in this case. However, you can bring the entire data into a dataset that is not Profile-enabled, then use Query Service to filter the required data and write it into a Profile-enabled dataset. This approach can come in handy if you do not currently have the resources to filter the data at the source.
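The filter step could look roughly like the statement below. This is a sketch only: it runs against a local in-memory SQLite database as a stand-in, the table names `staging_raw` and `profile_ready` are hypothetical, and the exact SQL that Query Service accepts for your datasets and schemas may differ.

```python
import sqlite3

# Hypothetical dataset (table) names; substitute your own.
STAGING = "staging_raw"    # stands in for the dataset that is not Profile-enabled
TARGET = "profile_ready"   # stands in for the Profile-enabled dataset

# "brand IS NULL OR" keeps rows that have no brand at all;
# a plain "brand <> 'XYZ'" would drop them as well.
FILTER_SQL = f"""
INSERT INTO {TARGET}
SELECT * FROM {STAGING}
WHERE brand IS NULL OR brand <> 'XYZ'
"""

# Local stand-in for the two datasets, with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {STAGING} (id INTEGER, brand TEXT)")
conn.execute(f"CREATE TABLE {TARGET} (id INTEGER, brand TEXT)")
conn.executemany(
    f"INSERT INTO {STAGING} VALUES (?, ?)",
    [(1, "ABC"), (2, "XYZ"), (3, None)],
)

# Copy everything except brand = 'XYZ' into the target table.
conn.execute(FILTER_SQL)
rows = conn.execute(f"SELECT id, brand FROM {TARGET} ORDER BY id").fetchall()
```

The explicit NULL check matters because in SQL a comparison with NULL is not true, so records without a brand would otherwise be excluded along with the 'XYZ' ones.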

 

Hope this helps.

 

Cheers,

Abhinav