
SOLVED

RTCDP SFTP Source Connector - Ingest only net new data


Level 4

Hello, I am wondering if there is a setting in the SFTP source connector to ingest only net-new records. I'm noticing that every time the dataflow runs, it ingests the old records as well as any new records, which will inevitably lead to millions of records being ingested over the long run. Is there a setting for this in RTCDP or in the source (SFMC)?


11 Replies


Employee

Hello @RyanMoravick 

 

The SFTP source connector ingests data from a file location and processes records based on the timestamp of the file. If you create a dataflow mapped to a folder, only new files will be ingested by the dataflow into Adobe RTCDP.

 

I can see you have indicated you are using SFTP and ingesting from SFMC. Just to confirm: are you exporting from SFMC into an SFTP location and then ingesting into Adobe RTCDP?


Level 4

Hey @brekrut, correct: I am exporting from SFMC into the SFTP location and ingesting into Adobe RTCDP. For additional context, the screenshot shows ingestion of email performance data from SFMC into Adobe CDP. You can see that every hour it ingests the same 38 profiles. Ideally the dataflow would ingest only new records.


Employee

I can see from the screenshot that the dataflow is running every hour and picking up 38 records. Is the data being exported out of SFMC written to a new file, or does it update an existing file on the SFTP location?


Level 4

The export overwrites the current data in the data extension, which is then extracted to the same file on the SFTP location.


Correct answer by
Employee

Hi @RyanMoravick,

 

When the data is overwritten, the last modified time of the file changes, so the system considers it a new file.

The system is designed to check the time of its last execution, and any file placed (or modified) after that time will be picked up.

 

To overcome this problem, limit your export to run once daily.

If you cannot limit the writing process, write the file to a staging folder throughout the day and copy it to the scheduled folder once daily.
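
For illustration, here is a minimal sketch of that staging-folder promotion, assuming a Python client with paramiko; the host, credentials, and folder paths are hypothetical:

```python
# Minimal sketch of the staging-folder workaround. Assumptions: paramiko is
# installed, and /exports/staging and /exports/scheduled are hypothetical
# paths. SFMC writes to the staging folder all day; this job runs once daily
# (e.g. from cron) and promotes the files into the folder the RTCDP dataflow
# is mapped to, so the dataflow sees a new file only once per day.
import paramiko

HOST = "sftp.example.com"                             # hypothetical host
USER, KEY_PATH = "rtcdp", "/home/rtcdp/.ssh/id_rsa"   # hypothetical credentials
STAGING, SCHEDULED = "/exports/staging", "/exports/scheduled"

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(HOST, username=USER, key_filename=KEY_PATH)
sftp = ssh.open_sftp()

# rename() moves each file server-side; the file's last-modified time is what
# the dataflow compares against its previous run.
for name in sftp.listdir(STAGING):
    sftp.rename(f"{STAGING}/{name}", f"{SCHEDULED}/{name}")

sftp.close()
ssh.close()
```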

 

Regards,
Kumar Saurabh


Employee

@RyanMoravick, to add on to @Kumar_Saurabh_'s post.

 

The SFTP source connector ingests files based on the timestamp of the file placed in the source path. If you overwrite the ingestion file with new data, Adobe RTCDP observes this as a new file and ingests all of the data in the file. Adobe RTCDP ingests all records in the file as new records.

 

To ingest incremental data from SFMC, I would recommend creating a new file on the ingestion path for each export.
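
For illustration, a minimal sketch of that approach; the directory, file naming, and record fields are hypothetical, and this assumes you can run a script between the SFMC export and the scheduled flow run:

```python
# Sketch: write each export as a new, uniquely named file so the dataflow
# ingests only that run's records. Paths, names, and fields are hypothetical.
import csv
import os
from datetime import datetime, timezone

def write_incremental_export(records, export_dir="exports"):
    """Write only this run's net-new records to a fresh timestamped file."""
    os.makedirs(export_dir, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(export_dir, f"email_performance_{stamp}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["email", "event", "event_time"])
        writer.writeheader()
        writer.writerows(records)
    return path

# Each call produces a distinct file, so records already ingested from an
# earlier file are never re-read by the dataflow.
print(write_incremental_export([
    {"email": "a@example.com", "event": "open",
     "event_time": "2024-01-01T10:00:00Z"},
]))
```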


Level 4

@brekrut would updating the DE in SFMC instead of overwriting be the better option here? Does that affect the timestamp the same way as Overwrite? I've dropped a screenshot below for reference.


Employee

Updating the data extension might be the better option here.

 

At the end of the day, you are looking to create a new file with only the records to be imported into Adobe RTCDP.


Level 4

@brekrut is there not an incremental data load option for the SFTP source connector like there is for the CRM or Marketing source connectors?


Level 4

@brekrut you said: "To ingest incremental data from SFMC, I would recommend creating a new file on the ingestion path for each export."

 

If a new file is created on the ingestion path, won't RTCDP see it as a new file and ingest it anyway?


Employee

Hello @RyanMoravick 

 

Please review the following page.

https://experienceleague.adobe.com/en/docs/experience-platform/sources/ui-tutorials/dataflow/cloud-s...

 

As noted on the page above:

 

For batch ingestion, every ensuing dataflow selects files to be ingested from your source based on their last modified timestamp. This means that batch dataflows select files from the source that are either new or have been modified since the last flow run. Furthermore, you must ensure that there’s a sufficient time span between file upload and a scheduled flow run because files that are not entirely uploaded to your cloud storage account before the scheduled flow run time may not be picked up for ingestion.
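
To make that selection rule concrete, here is a rough approximation in Python of how last-modified-timestamp pickup behaves; it illustrates the documented rule and is not Adobe's actual implementation:

```python
# Rough approximation of the documented selection rule: a flow run picks up
# every file in the mapped folder whose last-modified time is later than the
# previous flow run. Not Adobe's actual implementation.
import os

def files_for_this_run(folder: str, last_run_epoch: float) -> list[str]:
    """Return files that are new or modified since the last flow run."""
    picked = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and os.path.getmtime(path) > last_run_epoch:
            picked.append(path)
    return picked

# Overwriting an existing file bumps its mtime, so the whole file is selected
# again -- which is why the hourly run kept re-ingesting the same 38 records.
```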