SOLVED

Data Access API - Downloading content of Dataset in csv/json format

Level 3

Hi,

A few months back, we ingested 200K records via batch ingestion. The records were in JSON format. Now we want to download the records in JSON or CSV format. We are using the following Data Access API to download the dataset content...

{
    "type": "DTAC-4040",
    "status": 404,
    "title": "Not Found",
    "detail": "The file not found in ADLS",
    "additionalDetails": {
        "requestId": null
    }
}

Thanks.

1 Accepted Solution

Correct answer by
Level 3

I found a workaround to download the dataset content in CSV or JSON format. I'm not sure if this is the standard way, but it serves my purpose.

I used the Python SDK (DatasetReader) and the Pandas library in a JupyterLab notebook to read and export the content in CSV and JSON format. This was very quick.

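from platform_sdk.dataset_reader import DatasetReader
import pandas as pd

# get_platform_sdk_client_context() is called as-is inside the JupyterLab notebook
dataset_reader = DatasetReader(get_platform_sdk_client_context(), dataset_id="<datasetid>")
# Read up to 200,000 records into a Pandas DataFrame
df = dataset_reader.limit(200000).read()
# Export as CSV with a header row and without the DataFrame index
df.to_csv("eventData.csv", sep=',', encoding='utf-8', index=False, header=True)
# Export as JSON Lines (one JSON object per line)
df.to_json("eventData.json", orient='records', lines=True)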

For 200K records, the read and export operation takes 1.5 GB of memory.

If you have a large dataset and limited memory, try using "offset()" with "limit()" to avoid delays or crashes.

df = dataset_reader.limit(50000).offset(1).read()
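
The offset()/limit() idea above can be turned into a loop that appends one chunk at a time to a single CSV file, so only one chunk is held in memory. This is only a rough sketch, not an official pattern: export_in_chunks and read_chunk are hypothetical names, and it assumes that offset() counts rows starting at 0 and that a fresh DatasetReader per chunk is acceptable, neither of which I have confirmed against the SDK documentation.

```python
def export_in_chunks(read_chunk, out_path, chunk_size=50000, max_rows=None):
    """Append successive chunks to one CSV file and return the rows written.

    read_chunk(offset, limit) is a hypothetical callable that returns a
    DataFrame-like object (it must support len() and to_csv()).
    """
    written = 0
    while max_rows is None or written < max_rows:
        # Never ask for more rows than the caller's overall cap allows.
        size = chunk_size if max_rows is None else min(chunk_size, max_rows - written)
        chunk = read_chunk(written, size)
        if len(chunk) == 0:
            break  # nothing left to read
        first = written == 0
        # The first chunk creates the file with a header; later chunks append.
        chunk.to_csv(out_path, mode="w" if first else "a",
                     header=first, index=False, encoding="utf-8")
        written += len(chunk)
        if len(chunk) < size:
            break  # a short chunk means the end of the dataset
    return written


# Possible use inside the notebook (assumption: offset is a 0-based row offset):
#
# def read_chunk(offset, limit):
#     reader = DatasetReader(get_platform_sdk_client_context(),
#                            dataset_id="<datasetid>")
#     return reader.limit(limit).offset(offset).read()
#
# export_in_chunks(read_chunk, "eventData.csv", chunk_size=50000)
```

If offset() turns out to have a different meaning in the SDK, only read_chunk needs to change; the append logic stays the same.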

 

Thanks.

8 Replies

Community Advisor

Hello @vikash4 

 

According to this document, we can get a maximum of 100 files in a single API request, and no other way is mentioned.

 

I'm following this thread to see if there is a better way to export the data.


     Manoj
     Find me on LinkedIn

Level 1

What type of data can be accessed? Is it possible to download or view customer or PII data?

Community Advisor and Adobe Champion

Also, to add: the data will only be available in Parquet format, not JSON/CSV, if you go this route.

To help troubleshoot, could you tell us the intent? Why are you trying to download the data from AEP instead of getting it directly from the source (where you initially uploaded it from)?

 

 

Level 3

Hi @Anil_Umachigi,

Thanks for your reply. This is event data that was ingested 4 months back, and I believe the dataset has some expiry (TTL) on it that caused its removal from the data lake. I did not check the expiry for this dataset. I want to share a copy of the dataset with other teams for analysis, and I don't have a copy on my system or at the source. So I'm wondering if we can download it in CSV or JSON format for sharing?

The following document mentions that we can access and download batch files, and I was following the same...

https://experienceleague.adobe.com/docs/experience-platform/data-access/api.html?lang=en

If there is another way, could you please guide me?

Thanks.

Level 1

I have a question about what kind of data is retrievable. Is it possible to download or view customer or PII data? Any other information you can provide on the type of data, along with an example, would be helpful.

Level 2

Create a new database, download or export any dataset into CSV format, and upload it to the new database.