Hi,
A few months back, we ingested 200K records via batch ingestion. The records were in JSON format. Now we want to download them in JSON or CSV format. We are using the following Data Access API to download the dataset content...
Thanks.
I found a workaround to download the dataset content in CSV or JSON format. I'm not sure if this is the standard way, but it serves my purpose.
I used the Python SDK (DatasetReader) and the Pandas library in a JupyterLab notebook to read the content and export it in CSV and JSON format. This was very quick.
from platform_sdk.dataset_reader import DatasetReader
import pandas as pd

# Read up to 200,000 rows of the dataset into a DataFrame
dataset_reader = DatasetReader(get_platform_sdk_client_context(), dataset_id="<datasetid>")
df = dataset_reader.limit(200000).read()

# Export as CSV, and as JSON with one record per line
df.to_csv("eventData.csv", sep=',', encoding='utf-8', index=False, header=True)
df.to_json("eventData.json", orient='records', lines=True)
For 200K records, the read and export operations take about 1.5 GB of memory.
If you have a large dataset and limited memory, try using "offset()" with "limit()" to read the data in smaller chunks and avoid delays or crashes.
df = dataset_reader.limit(50000).offset(1).read()
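To cover a whole dataset with limit() and offset(), the offset would need to advance by the chunk size on each read. The following is only a rough sketch of that loop, not a tested recipe for the SDK: `export_in_chunks` and `read_chunk` are hypothetical names, and it assumes that offset() skips that many rows and that read() returns a pandas DataFrame.

```python
import os


def export_in_chunks(read_chunk, csv_path, chunk_size=50000):
    """Append successive chunks to one CSV file and return the total row count.

    read_chunk(limit, offset) is a hypothetical callable that returns a
    pandas DataFrame with at most `limit` rows, starting at row `offset`.
    """
    if os.path.exists(csv_path):
        os.remove(csv_path)  # start from a clean file
    offset = 0
    while True:
        chunk = read_chunk(chunk_size, offset)
        if len(chunk) == 0:
            break  # nothing left to read
        # Write the header only with the first chunk, then append.
        chunk.to_csv(csv_path, mode="a", header=(offset == 0),
                     index=False, encoding="utf-8")
        offset += len(chunk)
        if len(chunk) < chunk_size:
            break  # a short chunk means the end was reached
    return offset


# In the notebook, the callable might wrap the reader from this post
# (assumption: limit() and offset() both count rows):
#   read_chunk = lambda limit, offset: dataset_reader.limit(limit).offset(offset).read()
#   total = export_in_chunks(read_chunk, "eventData.csv")
```

Only one chunk is held in memory at a time, so peak memory should scale with the chunk size rather than with the full dataset.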
Thanks.
What type of data can be accessed? Is it possible to download or view customer or PII data?
Also, to add: if you go this route, the data will only be available in Parquet format, not JSON or CSV.
To help troubleshoot, could you tell us the intent? Why are you trying to download the data from AEP instead of getting it directly from the source (where you uploaded it from initially)?
Hi @Anil_Umachigi,
Thanks for your reply. This is event data that was ingested 4 months back, and I believe the dataset has an expiry (TTL) that causes its removal from the data lake. I did not check the expiry for this dataset. I want to share a copy of the dataset with other teams for analysis, and I don't have a copy on my system or at the source. So I'm wondering if we can download it in CSV or JSON format for sharing?
The following document mentions that we can access and download batch files, and I was following it...
https://experienceleague.adobe.com/docs/experience-platform/data-access/api.html?lang=en
If there is another way, could you please guide me?
Thanks.
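For readers following the Data Access API route mentioned above, the two calls involved (list the files in a batch, then download one file) might be prepared roughly as below. This is a sketch under assumptions, not a verified client: the base path, the endpoint shapes, and the header names reflect my reading of the linked documentation and should be confirmed there, and all function names are hypothetical. The code only builds the requests; it does not send them.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base path of the Data Access API; confirm against the linked docs.
BASE_URL = "https://platform.adobe.io/data/foundation/export"


def build_headers(access_token, api_key, org_id, sandbox_name):
    """Headers assumed to be required on Platform API calls."""
    return {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
    }


def list_batch_files_request(batch_id, headers):
    """GET request listing the files that belong to one batch (assumed endpoint)."""
    url = f"{BASE_URL}/batches/{quote(batch_id, safe='')}/files"
    return Request(url, headers=headers)


def download_file_request(file_id, file_name, headers):
    """GET request for the contents of one file in a batch (assumed endpoint)."""
    url = (f"{BASE_URL}/files/{quote(file_id, safe='')}"
           f"?path={quote(file_name, safe='')}")
    return Request(url, headers=headers)


# To send a request: urllib.request.urlopen(req).read()
# The downloaded files would be Parquet, so they still need converting
# before they can be shared as CSV or JSON.
```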
I have a question about what kind of data is retrievable. Is it possible to download or view customer or PII data? Any other information you can provide on the types of data, along with an example, would be helpful.
Create a new database, then download or export any dataset in CSV format and upload it to the new database.
Hi @vikash4 ,
You can easily export the datasets in CSV/JSON format using the new SFTP connector (beta): https://experienceleague.adobe.com/docs/experience-platform/destinations/catalog/cloud-storage/sftp....
Another reference: https://experienceleague.adobe.com/docs/experience-platform/destinations/ui/activate/export-datasets...
Regards,
Chetanya Jain