SOLVED

External search integration best practices

av-ey
Level 2

Hello folks,

 

My understanding has always been that the design pattern to follow when integrating with an external search platform is a push-based mechanism, whereby AEM responds to content events (publish, unpublish, etc.). Traditionally this means calling the external platform to update its indexes with the appropriate content payload.

 

Our external search platform provider has released a beta version of their AEM connector for us to evaluate. It essentially does a full and an incremental crawl using the QueryBuilder API. My concern is that, inevitably, one or more queries will end up traversing too many nodes and exceed Oak's read limit threshold. The vendor admits that this is a strong possibility, and their suggestion is simply to increase the limit.
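For readers who have not seen such a crawl, here is a minimal sketch of the predicate map that one page of an incremental QueryBuilder crawl might use. The predicate keys (`path`, `type`, `daterange`, `orderby`, `p.limit`, `p.offset`, `p.guessTotal`) are standard QueryBuilder predicates; the class name, method name, and the choice of `jcr:content/cq:lastModified` as the change marker are assumptions for illustration, not the vendor connector's actual behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds predicates only, it does not run a query.
final class CrawlPredicates {

    /** Predicates for one page of results modified since the given ISO-8601 timestamp. */
    static Map<String, String> incrementalPage(String rootPath, String modifiedSinceIso,
                                               int pageSize, int pageIndex) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("path", rootPath);
        p.put("type", "cq:Page");
        // Assumed change marker; a real connector may track changes differently.
        p.put("daterange.property", "jcr:content/cq:lastModified");
        p.put("daterange.lowerBound", modifiedSinceIso);
        p.put("orderby", "@jcr:content/cq:lastModified");
        p.put("p.limit", String.valueOf(pageSize));
        p.put("p.offset", String.valueOf((long) pageSize * pageIndex));
        // Avoid counting the full result set just to report a total.
        p.put("p.guessTotal", "true");
        return p;
    }
}
```

In AEM the map would be handed to `PredicateGroup.create(map)` and then to `QueryBuilder.createQuery(...)`. Paging with `p.offset` only limits what is returned; if no suitable Oak index backs the query, the engine may still read a large number of nodes to get to the requested page, which is where the read limit concern comes from.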

 

Personally, I am not convinced that a pull approach of this kind is right for a connector, especially with the QueryBuilder API at the core of it. My view is that the orchestration should always happen on the AEM side, where AEM responds to events and hydrates the external search platform with the right data.

 

I'd love to hear the community's thoughts and points of view.

 

Regards,

Arup Vidyerthy

AEM integration search
1 Accepted Solution
Arun_Patidar
Correct answer by
Community Advisor

Please check whether the comparison below (connector vs. crawler) helps:
https://www.aemconnector.com/sheet.html

 

With JCR queries, you will always run into the node read limit problem, and the queries will create extra load on the publisher.

The content can instead be fed to the search platform based on publish/unpublish events. That also allows you to filter the content and perform extra operations before it is indexed.
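As a rough illustration of the event-driven approach, here is a minimal sketch in plain Java. Everything in it is hypothetical: `SearchIndexSync`, `SearchIndexClient`, the loader function, and the `hideInSearch` property are stand-ins chosen for the example, not AEM or vendor APIs. In a real project the `onEvent` call would be triggered by whatever replication or content event mechanism you already use.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types are AEM or vendor APIs.
final class SearchIndexSync {

    enum EventType { PUBLISH, UNPUBLISH }

    /** Stand-in for the external search platform's indexing client. */
    interface SearchIndexClient {
        void upsert(String id, Map<String, String> document);
        void delete(String id);
    }

    private final SearchIndexClient client;
    // Stand-in for reading a page's properties; returns null if the page is gone.
    private final Function<String, Map<String, String>> loader;

    SearchIndexSync(SearchIndexClient client, Function<String, Map<String, String>> loader) {
        this.client = client;
        this.loader = loader;
    }

    /** Called once per content event; only the affected path is read. */
    void onEvent(EventType type, String path) {
        if (type == EventType.UNPUBLISH) {
            client.delete(path);
            return;
        }
        Map<String, String> doc = loader.apply(path);
        // "hideInSearch" is an assumed property name used to illustrate filtering.
        if (doc == null || "true".equals(doc.get("hideInSearch"))) {
            // Missing or filtered content is removed so the index does not go stale.
            client.delete(path);
            return;
        }
        client.upsert(path, doc);
    }
}
```

Each event reads a single path, so the work per change stays small and no repository-wide query is needed.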


2 Replies
av-ey
Level 2
Exactly right, Arun! I'd be keen to know whether anyone in the community is pulling content via the QueryBuilder API to hydrate external search indexes. I have never used this approach myself, nor does it make sense to me. If anyone out there has used it successfully, I'd be interested to know the volume of content and the scale of the integration involved.