
SOLVED

Sandbox storage size, dataset batch deletion


Level 7

Hi there,

 

As part of our data hygiene best practices, we are looking to delete dataset batches that were ingested more than 30 days ago to keep our storage usage optimized and low. I've tried going into each dataset and creating a request via the batch delete API to remove those batches, but that's very manual work.

 

With the new data lifecycle feature, that doesn't serve my purpose, as I don't want to delete the entire dataset. What are some recommendations?

 

Thanks


4 Replies


Community Advisor

Hi @akwankl  -

 

What kind of datasets are they? If they are Experience Event-based datasets enabled for Real-Time Customer Profile, one recommendation could be to get a TTL set on the events that are 30 days old, and Adobe will take care of deleting them.

Here is the documentation https://experienceleague.adobe.com/docs/experience-platform/profile/event-expirations.html?lang=en

 

I haven't personally worked on Data Hygiene, so I don't have much info about it.

 

Thanks,

Arpan

 


Level 7

They are individual profile attributes. We are currently doing a daily full snapshot import, so we would only need the latest batch.


Correct answer by
Employee Advisor

@akwankl just a thought, but I believe there is a way for you to list the successfully ingested batches in a certain timeframe. We had one customer doing this in the past few months with the Catalog Service API:

 

GET https://platform.adobe.io/data/foundation/catalog/batches?dataSet={dataset_id}&status[…]createdAfter...

 

The next step is then an API batch delete, which is documented at https://experienceleague.adobe.com/docs/experience-platform/ingestion/batch/api-overview.html?lang=e...

This effectively marks the batches with a flag for the garbage collector to then delete.

 

I'm guessing you should be able to do so with multiple batches if you build the correct request body from your API client.
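 

In case it helps, here is a minimal sketch in Python (using the requests library) that puts the two steps together: list the successfully ingested batches in a dataset that are older than 30 days via the Catalog Service API, then flag each one for deletion. The query parameters (status, createdBefore) and the delete endpoint (POST .../import/batches/{BATCH_ID}?action=REVERT) are my reading of the Catalog Service and Batch Ingestion API docs, and the credential placeholders are just that, placeholders, so please verify both against the linked documentation and try it in a dev sandbox first.

import time
import requests

# Standard AEP API headers - replace the placeholders with your own credentials.
HEADERS = {
    "Authorization": "Bearer {ACCESS_TOKEN}",
    "x-api-key": "{API_KEY}",
    "x-gw-ims-org-id": "{ORG_ID}",
    "x-sandbox-name": "{SANDBOX_NAME}",
}

CATALOG = "https://platform.adobe.io/data/foundation/catalog"
IMPORT = "https://platform.adobe.io/data/foundation/import"

DATASET_ID = "{DATASET_ID}"  # the dataset you want to prune

# Step 1: list successfully ingested batches created more than 30 days ago.
# Assumption: createdBefore takes an epoch-milliseconds timestamp.
cutoff_ms = int((time.time() - 30 * 24 * 60 * 60) * 1000)
resp = requests.get(
    f"{CATALOG}/batches",
    headers=HEADERS,
    params={"dataSet": DATASET_ID, "status": "success", "createdBefore": cutoff_ms, "limit": 100},
)
resp.raise_for_status()
old_batch_ids = list(resp.json().keys())  # Catalog list responses are keyed by batch ID
print(f"Found {len(old_batch_ids)} batches older than 30 days")

# Step 2: flag each batch for deletion; the garbage collector removes the data afterwards.
# Assumption: the delete is the POST ?action=REVERT call described in the Batch Ingestion API doc.
for batch_id in old_batch_ids:
    delete_resp = requests.post(
        f"{IMPORT}/batches/{batch_id}",
        headers=HEADERS,
        params={"action": "REVERT"},
    )
    delete_resp.raise_for_status()
    print(f"Requested deletion of batch {batch_id}")

You would still need to page through the listing results if there are more than 100 old batches, and schedule the script (cron or similar) if you want the "older than 30 days" rule applied on an ongoing basis.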


Level 7

Hey @Tof_Jossic, yeah, that's the conclusion I've reached. I was hoping there would be some out-of-the-box feature that would do it, or some handy scripts out there that could be shared.

 

Thank you!