Data Distiller | A Guide to Data Export: Exporting Derived Datasets to the Data Landing Zone

10/8/24

In this blog, we'll walk you through the process of exporting derived datasets created in Data Distiller for various enterprise use cases, such as AI/ML and enterprise reporting. Using the built-in data export capabilities, you'll be able to effortlessly move datasets to the supported destinations via the Data Distiller SKU. This detailed guide is designed to simplify the entire export process, ensuring smooth and efficient data transfer.

Step 1: Introduction to Data Export

Before starting, ensure that your data is stored in the Data Landing Zone. Access to this zone is set up through REST APIs, for example by using Postman.

In this tutorial, we'll walk through the process of exporting the derived dataset to one of the supported cloud storage locations, the Data Landing Zone.
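
As a rough illustration of the REST-based setup mentioned above, the sketch below builds (but does not send) a request for Data Landing Zone credentials using only the Python standard library. The endpoint URL, the `type=dlz_destination` query parameter, and the header names are assumptions based on common Adobe Experience Platform API conventions, and the function name `build_dlz_credentials_request` is hypothetical. Please verify all of them against the official API reference before relying on this.

```python
import urllib.request

# Assumed endpoint; confirm it in the Adobe Experience Platform API reference.
DLZ_CREDENTIALS_URL = (
    "https://platform.adobe.io/data/foundation/connectors/"
    "landingzone/credentials?type=dlz_destination"
)


def build_dlz_credentials_request(access_token, api_key, org_id, sandbox_name):
    """Build (but do not send) a GET request for Data Landing Zone credentials.

    The header names below are assumptions, not taken from this post.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        "x-sandbox-name": sandbox_name,
    }
    return urllib.request.Request(DLZ_CREDENTIALS_URL, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder values only; substitute your own credentials.
    request = build_dlz_credentials_request(
        "<ACCESS_TOKEN>", "<API_KEY>", "<ORG_ID>", "<SANDBOX_NAME>"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request), then parse the JSON body.
```

The same request can be issued from Postman by copying the URL and headers into a new GET request.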

Step 2: Navigating the Export Process

  1. Open Destinations
    • Navigate to the Destinations section on the left-hand menu.
    • Ensure you are on the Catalog tab.
  2. Filter the Data Types
    • Under Data Types, select Datasets only.
    • In the Cloud Storage options, locate and select the Data Landing Zone.
    • Click on Activate.
    • Figure: Destinations – Navigate to the “Data Landing Zone”.
  3. Choose the Pre-Configured Setup
    • You will now see a Data Landing Zone destination with a pre-configured setup.
    • Select the destination named Demo_Data_Export and click Next.
    • Figure: Destinations – Select Pre-configured Destination.
  4. Select the Derived Dataset
    • From the list of derived datasets, select the one you created previously, such as adls_rfm_profile, and click Next.
    • Figure: Destinations – Select Derived Dataset.
  5. Set Export Schedule
    • Set up a schedule for exporting the data. In steps 1 to 4, define the export cadence, including frequency, time, and start date.
    • Once the schedule is set, click Next.
    • Figure: Destinations – Define Schedule.
  6. Review and Finalize
    • Review the export details on the summary page.
    • Click Finish to complete the setup.
    • Figure: Destinations – Review & Finish.
    • Figure: Destinations – Dataset Export Setup Complete.
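
To make the export cadence from step 5 more concrete, here is a minimal sketch that lists when a fixed-interval schedule would run, given a start date and time and a frequency. The function name `upcoming_export_times` is hypothetical. The sketch assumes the first run happens exactly at the start time and that runs repeat at a fixed interval, which may differ from how the platform actually schedules exports. The allowed frequencies are the ones listed in the feature summary of this post.

```python
from datetime import datetime, timedelta

# Frequencies (in hours) listed in this post's feature summary.
ALLOWED_FREQUENCY_HOURS = (3, 6, 8, 12, 24)


def upcoming_export_times(start, frequency_hours, count):
    """Return the first `count` run times of a fixed-interval export schedule.

    Assumes the first run is at `start`; this is an illustration only.
    """
    if frequency_hours not in ALLOWED_FREQUENCY_HOURS:
        raise ValueError(f"Unsupported frequency: {frequency_hours} hours")
    if count < 0:
        raise ValueError("count must be non-negative")
    step = timedelta(hours=frequency_hours)
    return [start + i * step for i in range(count)]


if __name__ == "__main__":
    start = datetime(2024, 10, 8, 9, 0)
    for run_time in upcoming_export_times(start, 6, 4):
        print(run_time.isoformat())
```

Listing the run times this way can help when choosing a frequency that lines up with downstream reporting or model-training jobs.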

Key Features and Capabilities of Data Export

Here are some notable features that enhance the flexibility and efficiency of the data export process:

  1. Access
    • This feature can be accessed via the Destinations UI in Adobe Experience Platform. To access “Destinations”, you must have the required licensing.
  2. Supported Cloud Storage Destinations
    • Amazon S3
    • Google Cloud Storage
    • Azure Data Lake Storage Gen 2
    • Azure Blob Storage
    • Data Landing Zone
    • FTP
  3. Incremental Export
    • The first export will be a full export of the dataset.
    • Subsequent exports will transfer incremental changes only, improving efficiency.
  4. Customizable Scheduled Frequency
    • You can set the data export to run at regular intervals: every 3, 6, 8, 12, or 24 hours (daily).
  5. Supported Output Formats
    • JSON
    • Parquet
  6. Data Export Limits
    • Event data: Can export a maximum of the last 365 days.
    • Attributes: Can handle up to 10 billion records across all datasets in a single export flow.
  7. DULE Enforcement
    • Ensure that Derived Datasets in Data Distiller are manually labeled.
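
The incremental export behavior described above (a full first export, then only changes) can be pictured with a small conceptual sketch. It is not how the platform implements exports internally. The `Batch` type, its fields, and the `select_batches_for_export` function are hypothetical names introduced purely for illustration, and "changes" are assumed here to mean data added after the previous export.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Batch:
    """Hypothetical record of data added to a dataset at a point in time."""
    batch_id: str
    created_at: datetime


def select_batches_for_export(
    batches: Iterable[Batch], last_export_at: Optional[datetime]
) -> List[Batch]:
    """Pick the batches to include in the next export.

    First export (last_export_at is None): everything, i.e. a full export.
    Later exports: only batches created after the previous export.
    """
    ordered = sorted(batches, key=lambda b: b.created_at)
    if last_export_at is None:
        return ordered
    return [b for b in ordered if b.created_at > last_export_at]


if __name__ == "__main__":
    data = [
        Batch("b1", datetime(2024, 10, 1)),
        Batch("b2", datetime(2024, 10, 5)),
        Batch("b3", datetime(2024, 10, 9)),
    ]
    first = select_batches_for_export(data, None)
    later = select_batches_for_export(data, datetime(2024, 10, 5))
    print([b.batch_id for b in first])
    print([b.batch_id for b in later])
```

Because only new data moves after the first run, downstream consumers typically need to append each export to what they already hold rather than treat every file as a complete snapshot.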

Conclusion

This demo is designed to provide a foundation for exporting data. By following this guide, you’ll be able to seamlessly export derived datasets from Data Distiller to the Data Landing Zone or other supported cloud storage destinations. Whether for AI/ML applications or enterprise reporting, this export functionality provides the flexibility and efficiency required for modern data workflows.

Author: @_Sameeksha_