Data Access API - Downloading content of Dataset in csv/json format | Community
Level 3
February 13, 2023
Solved

  • February 13, 2023
  • 5 replies
  • 3932 views

Hi,

A few months back, we ingested 200K records via batch ingestion. The records were in JSON format. Now we want to download the records in JSON or CSV format. We are using the following Data Access API to download the dataset content...

{
    "type": "DTAC-4040",
    "status": 404,
    "title": "Not Found",
    "detail": "The file not found in ADLS",
    "additionalDetails": {
        "requestId": null
    }
}

Thanks.
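
For readers following the same document, the Data Access API flow is usually two steps: list the files in a batch, then fetch each file by ID and path. Below is a minimal sketch of how those request URLs could be put together, plus a helper that condenses an error body like the one above into one line. The base URL and the `start`/`limit` paging parameters are assumptions based on Adobe's public API reference and should be verified there; the function names are hypothetical and not part of any SDK.

```python
import json
from urllib.parse import quote, urlencode

# Assumed base path of the Data Access API; confirm it against the
# API reference for your environment before relying on it.
BASE_URL = "https://platform.adobe.io/data/foundation/export"


def batch_files_url(batch_id, start=0, limit=100):
    """URL that lists the files in one batch, one page at a time.

    `start` and `limit` are assumed paging parameters.
    """
    query = urlencode({"start": start, "limit": limit})
    return f"{BASE_URL}/batches/{quote(batch_id, safe='')}/files?{query}"


def file_url(file_id, path=None):
    """URL for a dataset file; with `path`, the URL of that file's contents."""
    url = f"{BASE_URL}/files/{quote(file_id, safe='')}"
    if path is not None:
        url += "?" + urlencode({"path": path})
    return url


def summarize_error(body):
    """Condense a JSON error body (type, status, title, detail) into one line."""
    err = json.loads(body)
    return f"{err['status']} {err['title']} ({err['type']}): {err['detail']}"
```

Each request would also need the usual Platform authentication headers, which are left out here. Logging the one-line summary next to the file ID that was requested makes it easier to tell which file a 404 refers to.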

5 replies

_Manoj_Kumar_
Community Advisor
February 14, 2023

Hello @vikashyadav,

According to this document, we can get a maximum of 100 files in a single API request, and no other way is mentioned.

I'm following this thread to see whether there is a better way to export the data.

October 20, 2023

What type of data can be accessed? Is it possible to download or view customer or PII data?

Anil_Umachigi
Adobe Employee
February 15, 2023

Also, to add: if you go this route, the data will only be available in Parquet format, not JSON or CSV.

To help troubleshoot, could you tell us the intent? Why are you trying to download the data from AEP instead of getting it directly from the source (where you originally uploaded it from)?

Level 3
February 15, 2023

Hi @anil_umachigi,

Thanks for your reply. This is event data that was ingested 4 months ago, and I believe the dataset has some expiry (TTL) on it, which causes removal of the dataset from the data lake. I did not check the expiry for this dataset. I want to share a copy of the dataset with other teams for analysis, and I don't have a copy on my system or at the source. So I'm wondering whether we can download it in CSV or JSON format for sharing.

The following document mentions that we can access and download batch files, and I was following it...

https://experienceleague.adobe.com/docs/experience-platform/data-access/api.html?lang=en

If there is another way, could you please guide me?

Thanks.

Vikashyadav
Author
Accepted solution
Level 3
February 15, 2023

I found a workaround to download the dataset content in CSV or JSON format. I'm not sure if this is the standard way, but it serves my purpose.

I used the Python SDK (DatasetReader) and the pandas library in a JupyterLab notebook to read and export the content in CSV and JSON format. This was very quick.

from platform_sdk.dataset_reader import DatasetReader
import pandas as pd

# get_platform_sdk_client_context() is not imported here, so it is assumed
# to be predefined in the JupyterLab notebook environment.
dataset_reader = DatasetReader(get_platform_sdk_client_context(), dataset_id="<datasetid>")
# Read up to 200,000 records into a DataFrame.
df = dataset_reader.limit(200000).read()
# Export as CSV with a header row and without the DataFrame index.
df.to_csv("eventData.csv", sep=',', encoding='utf-8', index=False, header=True)
# Export as JSON Lines (one record per line).
df.to_json("eventData.json", orient='records', lines=True)

For 200K records, the read and export operations take 1.5 GB of memory.

If you have a large dataset and limited memory, try using offset() with limit() to read it in smaller chunks and avoid delays or crashes.

df = dataset_reader.limit(50000).offset(1).read()

Thanks.
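
The offset() and limit() suggestion above can be turned into a loop that exports one chunk at a time, so that only a single chunk is held in memory. This is a rough sketch, not something from the SDK: `export_in_chunks` and `read_chunk` are hypothetical names, and it assumes that offset() counts the rows to skip and that each chunk is a pandas DataFrame (anything with `len()` and a compatible `to_csv` works).

```python
def export_in_chunks(read_chunk, csv_path, chunk_size=50000):
    """Write successive chunks to one CSV file and return the rows written.

    read_chunk(limit, offset) is a caller-supplied function that returns
    the next chunk, for example a pandas DataFrame.
    """
    offset = 0
    while True:
        chunk = read_chunk(chunk_size, offset)
        rows = len(chunk)
        if rows == 0:
            break  # nothing left to read
        first = offset == 0
        # Create the file with a header for the first chunk, append afterwards.
        chunk.to_csv(csv_path, mode="w" if first else "a", header=first, index=False)
        offset += rows
        if rows < chunk_size:
            break  # a short chunk means the end of the dataset was reached
    return offset
```

In the notebook, this could be called as `export_in_chunks(lambda limit, offset: dataset_reader.limit(limit).offset(offset).read(), "eventData.csv")`, assuming the reader can be reused across calls. It is worth checking on a small dataset that consecutive chunks do not overlap before running it on the full 200K records.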

October 20, 2023

I have a question: what kind of data is retrievable? Is it possible to download or view customer or PII data? Any other information you can provide on the types of data, along with an example, would be helpful.

Level 2
February 15, 2023

Create a new database, download or export any dataset in CSV format, and upload it to the new database.

ChetanyaJain-1
Community Advisor
March 4, 2023