Hi,
A few months back, we ingested 200K records via batch ingestion. The records were in JSON format. Now we want to download them in JSON or CSV format. We are using the following Data Access API to download the dataset content...
Thanks.
I found a workaround to download the dataset content in CSV or JSON format. I'm not sure whether this is the standard way, but it serves my purpose.
I used the Python SDK (DatasetReader) and the Pandas library in a JupyterLab notebook to read the content and export it in CSV and JSON format. This was very quick.
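# DatasetReader comes from the Platform Python SDK used in the JupyterLab notebook.
from platform_sdk.dataset_reader import DatasetReader
import pandas as pd
# get_platform_sdk_client_context() is not imported here; it is assumed to be predefined in the notebook.
dataset_reader = DatasetReader(get_platform_sdk_client_context(), dataset_id="<datasetid>")
# Read up to 200,000 rows into a DataFrame.
df = dataset_reader.limit(200000).read()
# Export as CSV and as newline-delimited JSON (one record per line).
df.to_csv("eventData.csv", sep=',', encoding='utf-8', index=False, header=True)
df.to_json("eventData.json", orient='records', lines=True)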
For 200K records, the read and export operation took about 1.5 GB of memory.
If you have a large dataset and limited memory, use offset() together with limit() to read it in smaller chunks and avoid delays or crashes.
df = dataset_reader.limit(50000).offset(1).read()
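The limit()/offset() idea above can be extended into a loop that pages through the whole dataset and appends each chunk to one CSV file. This is only a rough sketch, not a standard approach: export_in_chunks and read_chunk are hypothetical names I made up, and it assumes that each read returns a pandas DataFrame and that an offset past the end returns an empty one.

```python
import pandas as pd


def export_in_chunks(read_chunk, csv_path, chunk_size=50000):
    """Page through a dataset and append each chunk to a single CSV file.

    read_chunk(limit, offset) is a hypothetical callable that must return a
    pandas DataFrame with at most `limit` rows, starting `offset` rows in.
    Returns the total number of rows written.
    """
    offset = 0
    total = 0
    while True:
        df = read_chunk(chunk_size, offset)
        if df.empty:
            break  # nothing left to read
        first = offset == 0
        # Overwrite and write the header for the first chunk, append afterwards.
        df.to_csv(csv_path, mode="w" if first else "a", header=first,
                  index=False, encoding="utf-8")
        total += len(df)
        if len(df) < chunk_size:
            break  # a short chunk means the end of the dataset was reached
        offset += chunk_size
    return total
```

In the notebook, read_chunk could be something like lambda limit, offset: DatasetReader(get_platform_sdk_client_context(), dataset_id="<datasetid>").limit(limit).offset(offset).read(). Creating a fresh reader per chunk is my assumption, to avoid limit/offset settings carrying over between reads; I have not verified how the reader behaves when reused.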
Thanks.
What type of data can be accessed? Is it possible to download or view customer or PII data?
Also, to add: if you go this route, the data will only be available in Parquet format, not JSON/CSV.
To help troubleshoot, could you share the intent? Why are you trying to download the data from AEP instead of getting it directly from the source (where you initially uploaded it from)?
Hi @Anil_Umachigi,
Thanks for your reply. This is event data that was ingested 4 months back, and I believe the dataset has an expiry (TTL) that will cause the data to be removed from the data lake. I have not checked the expiry for this dataset. I want to share a copy of the dataset with other teams for analysis, and I don't have a copy on my system or at the source. So I'm wondering whether we can download it in CSV or JSON format for sharing.
The following document mentions that we can access and download batch files, and I was following it...
https://experienceleague.adobe.com/docs/experience-platform/data-access/api.html?lang=en
If there is another way, could you please guide me?
Thanks.
I have a question about what kind of data is retrievable. Is it possible to download or view customer or PII data? Any other information you can provide on the types of data, along with an example, would be helpful.
Create a new database, download or export the dataset in CSV format, and upload it to the new database.
Hi @vikash4 ,
You can easily export the datasets in CSV/JSON format using the new SFTP connector (beta): https://experienceleague.adobe.com/docs/experience-platform/destinations/catalog/cloud-storage/sftp....
Another reference: https://experienceleague.adobe.com/docs/experience-platform/destinations/ui/activate/export-datasets...
Regards,
Chetanya Jain