AEP Dataset Indexes (via Parquet ColumnIndex and OffsetIndex)

jkm-disco
Level 5

30-08-2021

Description - The ability to create custom index(es) on AEP datasets for better Query Service performance. I'm not sure whether Azure Data Lake Service accommodates the encoding, but it looks like there could be serious performance gains from defining a ColumnIndex and OffsetIndex (see reference).
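
To illustrate where the gain would come from, here is a minimal Python sketch of page skipping. It is a toy model, not the Parquet file format and not how AEP or Query Service works internally. In Parquet, a ColumnIndex holds per-page min/max statistics and an OffsetIndex holds page locations; the `PageStats` class and the `build_page_index` and `lookup` helpers below are hypothetical names that imitate that idea over a plain list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageStats:
    """Toy stand-in for one page's ColumnIndex + OffsetIndex entry."""
    min_value: int   # ColumnIndex-like: smallest value in the page
    max_value: int   # ColumnIndex-like: largest value in the page
    first_row: int   # OffsetIndex-like: row where the page starts
    row_count: int   # number of rows in the page


def build_page_index(values: List[int], page_size: int) -> List[PageStats]:
    """Split a column into fixed-size pages and record min/max per page."""
    pages = []
    for start in range(0, len(values), page_size):
        chunk = values[start:start + page_size]
        pages.append(PageStats(min(chunk), max(chunk), start, len(chunk)))
    return pages


def lookup(values: List[int], index: List[PageStats],
           target: int) -> Tuple[List[int], int]:
    """Return matching row numbers and how many pages had to be read."""
    rows = []
    pages_read = 0
    for page in index:
        # Skip any page whose min/max range cannot contain the target.
        if page.min_value <= target <= page.max_value:
            pages_read += 1
            for offset in range(page.row_count):
                row = page.first_row + offset
                if values[row] == target:
                    rows.append(row)
    return rows, pages_read


# A column sorted on the indexed key: 1000 rows in 10 pages of 100 rows.
sorted_ids = list(range(1000))
index = build_page_index(sorted_ids, page_size=100)
rows, pages_read = lookup(sorted_ids, index, 742)
print(rows, pages_read, len(index))  # [742] 1 10
```

In this sketch only one page out of ten is scanned because the data is sorted on the looked-up column, so the min/max ranges do not overlap. On unsorted data the ranges overlap and far fewer pages can be skipped, which is why the choice and order of index columns would matter.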

Why is this feature important to you - Many users are interested in using Query Service, but the current UI can't conveniently handle larger requests, and even querying directly over ODBC can be time-consuming for larger datasets. Although there is native partition elimination and other benefits of column-oriented storage, it would be extremely useful if clients could define a custom index to improve performance. Even a single index per dataset would be a very helpful feature.

How would you like the feature to work - It could be added to the dataset UI as a simple checkbox per column, with a numbering selection to set the column order for composite indexes. Maybe a dialog pop-up could warn of any extra storage-related costs of hosting an index and give an estimated time for building it.

Current Behaviour - There is only partition elimination, but no indexing.

AEP parquet Query Service