External search integration best practices

av-ey

26-05-2021

Hello folks,

My understanding has always been that the design pattern for integrating with an external search platform is a push-based mechanism, whereby AEM responds to content events (publish, unpublish, etc.). In practice this means AEM calls the external platform to update its indexes with the appropriate content payload.
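
To make the push model concrete, here is a minimal Java sketch. `ContentEventType`, `ContentEvent`, `SearchIndexClient` and the in-memory index are hypothetical stand-ins, not AEM or vendor APIs. In a real project the handler would be triggered by AEM's replication events and the client would call the search vendor's indexing endpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event types, loosely mirroring publish (activate) and unpublish (deactivate).
enum ContentEventType { PUBLISH, UNPUBLISH }

// Hypothetical event carrying the content path and the properties to index.
record ContentEvent(ContentEventType type, String path, Map<String, String> properties) {}

// Hypothetical client for the external search platform's indexing API.
interface SearchIndexClient {
    void upsert(String id, Map<String, String> document);
    void delete(String id);
}

// In-memory stand-in for the external platform, used only for illustration.
class InMemorySearchIndex implements SearchIndexClient {
    final Map<String, Map<String, String>> documents = new HashMap<>();

    public void upsert(String id, Map<String, String> document) {
        documents.put(id, Map.copyOf(document));
    }

    public void delete(String id) {
        documents.remove(id);
    }
}

// Push-based indexer: AEM reacts to each content event and updates the external index.
class PushIndexer {
    private final SearchIndexClient client;

    PushIndexer(SearchIndexClient client) {
        this.client = client;
    }

    void onEvent(ContentEvent event) {
        switch (event.type()) {
            case PUBLISH -> client.upsert(event.path(), event.properties());
            case UNPUBLISH -> client.delete(event.path());
        }
    }
}
```

Each publish or unpublish produces one small, targeted update, so no repository-wide query is needed to keep the index in sync.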

Our external search platform provider has released a beta version of their AEM connector for us to evaluate. It essentially performs full and incremental crawls using the QueryBuilder API. My concern is that, inevitably, one or more of those queries will end up traversing too many nodes and exceed Oak's read limit. The vendor admits that this is a strong possibility, and their suggestion is simply to increase the limit.
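
The following toy Java model illustrates the concern. It is not Oak's implementation: `SimulatedTraversalQuery` and its read counter are hypothetical, and the limit value is illustrative (Oak configures its real limit through its query engine settings, e.g. `queryLimitReads`). A full crawl has to read every candidate node, so the number of reads grows with the repository, and raising the limit only moves the point at which the query fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical failure raised when a query reads more nodes than allowed.
class TraversalLimitExceededException extends RuntimeException {
    TraversalLimitExceededException(long reads) {
        super("Query read more than the allowed number of nodes: " + reads);
    }
}

// Toy model of a traversing query: it reads nodes one by one and fails
// (instead of returning partial results) once the read limit is exceeded.
class SimulatedTraversalQuery {
    private final long readLimit;

    SimulatedTraversalQuery(long readLimit) {
        this.readLimit = readLimit;
    }

    // Reads nodes 0..nodeCount-1 and collects those that match.
    List<Integer> run(int nodeCount, IntPredicate matches) {
        List<Integer> results = new ArrayList<>();
        long reads = 0;
        for (int node = 0; node < nodeCount; node++) {
            reads++;
            if (reads > readLimit) {
                throw new TraversalLimitExceededException(reads);
            }
            if (matches.test(node)) {
                results.add(node);
            }
        }
        return results;
    }
}
```

For example, with a limit of 1,000 reads, a crawl over 500 nodes succeeds, while the same crawl over 5,000 nodes throws as soon as the 1,001st node is read.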

I am not personally convinced that a pull approach of this kind is the right design for a connector, especially with the QueryBuilder API at its core. My view is that the orchestration should always happen on the AEM side, with AEM responding to events and hydrating the external search platform with the right data.

I'd love to hear the community's thoughts and point of views.

Regards,

Arup Vidyerthy

Arun_Patidar

26-05-2021

Please check whether this comparison of a connector versus a crawler helps:
https://www.aemconnector.com/sheet.html

With JCR queries, you will eventually run into problems with the node read limit, and the queries will create extra load on the publish instance.

Instead, content can be fed to the search platform based on publish/unpublish events. That also allows you to filter the content and perform extra operations on it before it is indexed.
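
A minimal sketch of that filtering and extra processing, assuming the push model: before a published page is sent to the search platform, AEM decides whether it should be indexed and trims it to the fields the index needs. `IndexPayloadBuilder`, the `hideInSearch` flag and the `.html` URL mapping are hypothetical choices for illustration, not AEM or vendor APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical pre-indexing step run on the AEM side before content is pushed.
class IndexPayloadBuilder {
    // Illustrative rules: only content under this root is indexed,
    // and only these properties are sent to the search platform.
    private final String indexedRoot;
    private final Set<String> allowedProperties;

    IndexPayloadBuilder(String indexedRoot, Set<String> allowedProperties) {
        this.indexedRoot = indexedRoot;
        this.allowedProperties = allowedProperties;
    }

    // Returns the document to push, or empty if the page should not be indexed.
    Optional<Map<String, String>> build(String path, Map<String, String> properties) {
        boolean underRoot = path.equals(indexedRoot) || path.startsWith(indexedRoot + "/");
        // Hypothetical author-controlled flag to exclude a page from search.
        boolean excluded = "true".equals(properties.get("hideInSearch"));
        if (!underRoot || excluded) {
            return Optional.empty();
        }
        Map<String, String> document = new LinkedHashMap<>();
        document.put("id", path);
        for (String name : allowedProperties) {
            String value = properties.get(name);
            if (value != null) {
                document.put(name, value);
            }
        }
        // Extra processing: derive a public URL from the path (mapping is an assumption).
        document.put("url", path + ".html");
        return Optional.of(document);
    }
}
```

Because this runs per event, the rules see exactly one page at a time, and internal properties never leave AEM.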