Adobe Analytics

Data Warehouse - different format option than just CSV files

Community Advisor and Adobe Champion

12/7/23

One suggestion I would love to make to the Adobe team: could Data Warehouse offer data format options other than just CSV files? If the files could be delivered in a modern format like Parquet, that would make our processing much more efficient.

When we receive the Adobe files, the first thing we have to do is pre-process them into a usable format because of the way they are structured. This slows the entire process down considerably and is also expensive.

In brief:

The CSV files that are provided are not "splittable", so they cannot be processed in parallel by a big data tool like Spark or Databricks. The reason is that some of the fields contain embedded newline characters, and CSV files have to be processed serially when that is the case: a reader that starts in the middle of the file cannot tell whether a given newline ends a record or belongs to a field. This is an inherent limitation of the design of the CSV format. As a result, these huge files, which are multiple GB in size, have to be processed initially on a single machine instead of leveraging a parallel engine such as Spark. This limits our ability to process the data quickly.
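To illustrate the problem, here is a minimal Python sketch using only the standard library. The column names and values are made up for the example, and it assumes the embedded newlines sit inside quoted fields; the real Data Warehouse export may differ in its details.

```python
import csv
import io

# Hypothetical sample data (not a real Adobe export): the second field of
# the first data row contains a newline inside a quoted value.
raw = 'page,search_term\n"home","red\nshoes"\n"cart","hats"\n'

# Splitting on newlines without any CSV awareness, which is roughly what a
# parallel reader does when it cuts a file into chunks at line boundaries.
naive_lines = raw.splitlines()

# A CSV-aware reader keeps track of quoting from the start of the input,
# so it can tell that the embedded newline is part of a field.
records = list(csv.reader(io.StringIO(raw)))

print(len(naive_lines))  # physical lines found by the naive split
print(len(records))      # logical records found by the CSV reader
print(records[1])        # the row whose field spans two physical lines
```

The naive split sees one more line than there are records, because it breaks the first data row in half. Getting the right answer depends on reading from the beginning, which is exactly what prevents the work from being spread across machines.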

Switching to a format such as Parquet would mean not only that the data could be processed in parallel, but also that it would be compressed automatically, saving on both storage and the compute costs of processing the files.
