External search integration best practices

Hello folks,

 

My understanding has always been that the design pattern to follow when integrating with an external search platform is a push-based mechanism, whereby AEM responds to content events (publish, unpublish, etc.). Traditionally this means AEM calls the external platform to update its indexes with the appropriate content payload.

 

Our external search platform provider has released a beta version of their AEM connector for us to evaluate. It essentially does a full and an incremental crawl using the QueryBuilder API. My concern is that, inevitably, one or more queries will end up traversing too many nodes and exceed Oak's read limit threshold. The vendor admits that this is a strong possibility, and their suggestion is simply to increase the limit.
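
To make the concern concrete, here is a toy model in plain Java. It is not Oak or QueryBuilder code: `ToyRepository`, `crawl`, `exists`, and the `readLimit` value are all hypothetical names invented for this sketch, and the real Oak limits are configured and enforced differently. It only illustrates the shape of the problem: a crawl-style query reads nodes in proportion to repository size, while an event-driven lookup touches just the path named by the event.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical failure raised when a toy query reads too many nodes. */
class ReadLimitExceededException extends RuntimeException {
    ReadLimitExceededException(String message) {
        super(message);
    }
}

/** Toy stand-in for a content repository; not the Oak API. */
class ToyRepository {
    private final Set<String> paths = new LinkedHashSet<>();
    private final long readLimit; // assumed per-query cap on nodes read

    ToyRepository(int nodeCount, long readLimit) {
        this.readLimit = readLimit;
        for (int i = 0; i < nodeCount; i++) {
            paths.add("/content/site/page-" + i);
        }
    }

    /** Crawl-style query: scans every node and fails once the cap is passed. */
    List<String> crawl(String pathPrefix) {
        List<String> hits = new ArrayList<>();
        long reads = 0;
        for (String path : paths) {
            reads++;
            if (reads > readLimit) {
                throw new ReadLimitExceededException(
                        "query read more than " + readLimit + " nodes");
            }
            if (path.startsWith(pathPrefix)) {
                hits.add(path);
            }
        }
        return hits;
    }

    /** Event-style lookup: the event already names the path, so no scan is needed. */
    boolean exists(String path) {
        return paths.contains(path);
    }
}
```

In this model, `new ToyRepository(500, 100).crawl("/content/site")` throws, while the same call on a 50-node repository succeeds. Raising the limit only moves the point at which the crawl fails as content grows.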

 

I am not convinced that a pull approach like this is right for a connector, especially with the QueryBuilder API at its core. My view is that the orchestration should always happen on the AEM side, where AEM responds to events and hydrates the external search platform with the right data.

 

I'd love to hear the community's thoughts and points of view.

 

Regards,

Arup Vidyerthy

1 Accepted Solution

Correct answer by
Community Advisor

Please check whether the comparison below helps (connector vs. crawler):
https://www.aemconnector.com/sheet.html

 

With JCR queries, you will always run into this problem because of the node read limit, and the crawl will also create extra load on the publisher.

The content can instead be fed to the search platform based on publish/unpublish events. That also allows you to filter the content and perform extra operations on it.
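
A minimal sketch of that event-driven flow, in plain Java so it runs without AEM. Every name here (`ContentEvent`, `ContentAction`, `SearchIndexClient`, `InMemorySearchIndexClient`, `PushIndexer`) is hypothetical; in a real integration the event would presumably come from an AEM replication or resource-change listener, and the client would call the search vendor's indexing API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical event types; a real listener would map AEM events onto these. */
enum ContentAction { PUBLISH, UNPUBLISH }

/** Hypothetical event shape carrying just what the index needs. */
class ContentEvent {
    final ContentAction action;
    final String path;
    final String title;

    ContentEvent(ContentAction action, String path, String title) {
        this.action = action;
        this.path = path;
        this.title = title;
    }
}

/** Hypothetical client for the external search platform. */
interface SearchIndexClient {
    void upsert(String id, Map<String, String> document);
    void delete(String id);
}

/** In-memory client so the sketch runs without a search backend. */
class InMemorySearchIndexClient implements SearchIndexClient {
    final Map<String, Map<String, String>> documents = new LinkedHashMap<>();

    @Override
    public void upsert(String id, Map<String, String> document) {
        documents.put(id, document);
    }

    @Override
    public void delete(String id) {
        documents.remove(id);
    }
}

/** Pushes one index update per content event, after filtering. */
class PushIndexer {
    private final SearchIndexClient client;
    private final Predicate<ContentEvent> filter;

    PushIndexer(SearchIndexClient client, Predicate<ContentEvent> filter) {
        this.client = client;
        this.filter = filter;
    }

    /** Returns true if the event was forwarded to the search platform. */
    boolean onEvent(ContentEvent event) {
        if (!filter.test(event)) {
            return false; // filtered out, e.g. content outside the site tree
        }
        if (event.action == ContentAction.PUBLISH) {
            Map<String, String> document = new LinkedHashMap<>();
            document.put("path", event.path);
            document.put("title", event.title);
            client.upsert(event.path, document);
        } else {
            client.delete(event.path);
        }
        return true;
    }
}
```

With a filter such as `ev -> ev.path.startsWith("/content/site/")`, a publish event under the site tree adds or updates one document, an unpublish event removes it, and events elsewhere are ignored. Each event costs a single index call, with no repository query involved.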



Arun Patidar

2 Replies

Exactly right, Arun! I'd be keen to know whether anyone in the community is pulling content via the QueryBuilder API to hydrate external search indexes. I have not used this approach myself, nor does it make sense to me. If anyone out there has used it successfully, I'd be interested to know the volume of content and the scale of the integration involved.