Data Access API - Downloading content of Dataset in csv/json format
Hi,
A few months back, we ingested 200K records via batch ingestion. The records were in JSON format. Now we want to download the records in JSON or CSV format. We are using the following Data Access API calls to download the dataset content:
- Get the batches associated with the dataset - https://platform.adobe.io/data/foundation/catalog/batches?dataSet=<dataset-id>
Response : Returns one batch ID, and its metrics param reports 382 as the output file count:
"metrics": {"partitionCount": 382,"inputFileCount": 2,"outputRecordCount": 200000,"outputFileCount": 382,"inputRecordCount": 200000}
- Get the files associated with the batch ID returned in the first step - https://platform.adobe.io/data/foundation/export/batches/<batch-id>/files?start=0&limit=100 . Since the total output file count is 382, I executed this API four times with different "start" param values (0, 100, 200, 300). Response : dataSetFileId, dataSetViewId, and a link to access the contents of the dataset file. Example link : https://platform.adobe.io:443/data/foundation/export/files/<dataSetFileId>
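For reference, the four paged calls in the second step can be sketched as a loop. This is a minimal sketch in Python, not a confirmed client: `page_starts`, `iter_batch_files`, and the injected `fetch_page` callable are hypothetical names, and `fetch_page(start, limit)` is assumed to call the /export/batches/<batch-id>/files endpoint with the required auth headers and return the list of file entries for that page.

```python
from typing import Callable, Iterator, List


def page_starts(total: int, limit: int) -> List[int]:
    """Return the values of the `start` query parameter needed to cover `total` items."""
    return list(range(0, total, limit))


def iter_batch_files(
    fetch_page: Callable[[int, int], list],  # hypothetical: performs one paged API call
    total: int,
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every file entry by walking the pages in order."""
    for start in page_starts(total, limit):
        # Each call corresponds to ...?start=<start>&limit=<limit>
        yield from fetch_page(start, limit)


print(page_starts(382, 100))  # [0, 100, 200, 300]
```

With outputFileCount = 382 and limit = 100, this produces the same four "start" values I used by hand, so it only automates the paging; it does not reduce the number of requests.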
- Q1 : How can I get the file details for all 382 files with one request? I don't want to reset the "start" param each time. Is there any way to achieve this?
- Q2 : Do I need to call the "/export/files" API individually for all 382 files to get the file path and access the content?
  Access the file path : https://platform.adobe.io:443/data/foundation/export/files/<dataSetFileId>
  Access the content : https://platform.adobe.io:443/data/foundation/export/files/<dataSetFileId>/?path=<FilePath>.parquet
- Q3 : When I change the extension of the path param value from .parquet to .csv, it returns a 404 response : https://platform.adobe.io:443/data/foundation/export/files/<dataSetFileId>/?path=<FilePath>.csv
- Q4 : The API https://platform.adobe.io:443/data/foundation/export/files/<dataSetFileId>/?path=<FilePath>.parquet returns the content in .parquet format, which shows up as unreadable in Postman. How can we download this content in CSV/JSON format as a separate file in the response?
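If the API only serves the stored Parquet files as-is (which the 404 on .csv seems to suggest, though that is not confirmed here), one option is to save the response body to disk and convert it locally. Below is a minimal sketch, assuming pandas with a Parquet engine such as pyarrow is installed. `converted_name` and `convert_parquet` are hypothetical helper names, and nested fields may need flattening before the CSV output is useful.

```python
from pathlib import Path


def converted_name(parquet_name: str, fmt: str) -> str:
    """Swap the .parquet suffix for .csv or .json (local file name only)."""
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    return str(Path(parquet_name).with_suffix("." + fmt))


def convert_parquet(src: str, fmt: str = "csv") -> str:
    """Read a downloaded Parquet file and write it next to the source as CSV or JSON."""
    import pandas as pd  # third-party; needs pyarrow or fastparquet as the engine

    dst = converted_name(src, fmt)
    df = pd.read_parquet(src)
    if fmt == "csv":
        df.to_csv(dst, index=False)
    else:
        # One JSON object per line, one line per record
        df.to_json(dst, orient="records", lines=True)
    return dst
```

In Postman, the "Send and Download" option can save the binary response to a file, which could then be passed to `convert_parquet`.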
Thanks.
