AEP Dataset Indexes (via Parquet ColumnIndex and OffsetIndex)

8/30/21

Description - The ability to create custom indexes on AEP datasets for better Query Service performance. I'm not sure whether Azure Data Lake Service accommodates the encoding, but it looks like there could be serious performance gains from defining a Parquet ColumnIndex and OffsetIndex (reference).
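To illustrate where the gain would come from, here is a minimal sketch of page skipping in Python. It is not AEP or Parquet library code: `PageStats`, `PageLocation`, and `pages_to_read` are hypothetical stand-ins that only mirror the idea behind the Parquet structures, where the ColumnIndex holds per-page min/max values and the OffsetIndex holds each page's location in the file. The sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Simplified stand-in for one page's entry in a Parquet ColumnIndex."""
    min_value: int
    max_value: int


@dataclass
class PageLocation:
    """Simplified stand-in for one page's entry in a Parquet OffsetIndex."""
    offset: int                # byte offset of the page in the file
    compressed_page_size: int  # bytes to read for this page
    first_row_index: int       # index of the first row stored in the page


def pages_to_read(column_index: List[PageStats],
                  offset_index: List[PageLocation],
                  lo: int, hi: int) -> List[PageLocation]:
    """Return locations of pages whose [min, max] range overlaps [lo, hi]."""
    return [
        loc
        for stats, loc in zip(column_index, offset_index)
        if stats.max_value >= lo and stats.min_value <= hi
    ]


# Made-up example: three pages of a column that is sorted by value.
column_index = [PageStats(0, 99), PageStats(100, 199), PageStats(200, 299)]
offset_index = [
    PageLocation(4, 1000, 0),
    PageLocation(1004, 1000, 100),
    PageLocation(2004, 1000, 200),
]

# A filter such as "value BETWEEN 120 AND 150" only needs the second page.
selected = pages_to_read(column_index, offset_index, 120, 150)
print(selected)
```

The pruning only helps when the data is sorted or clustered on the filtered column, which is why letting clients pick the indexed column(s) matters.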

Why is this feature important to you - Many users are interested in using Query Service, but the current UI can't conveniently handle larger requests, and even connecting directly over ODBC can be time-consuming for larger datasets. Though there is native partition elimination and other benefits of column-oriented storage, it would be extremely useful if clients could define a custom index to improve performance. Even a single index per dataset would be a very helpful feature.

How would you like the feature to work - It could be added to the dataset UI as a simple checkbox per column, with a numbering selection to set the column order for composite indexes. Maybe also a pop-up dialog warning of any extra storage-related costs of hosting an index and showing an estimated time for building the index.

Current Behaviour - There is only partition elimination, but no indexing.