My understanding has always been that the design pattern to follow when integrating with an external search platform is a push-based mechanism, whereby AEM responds to content events (publish, unpublish, etc.). Traditionally, this results in calls to the external platform to update its indexes with the appropriate content payload.
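To make the push model concrete, here is a minimal sketch of the mapping from content events to index operations. Every type in it (ContentAction, SearchIndexClient, RecordingClient, PushIndexer) is hypothetical and stands in for the real AEM event and the vendor's indexing API. A recording client replaces the HTTP calls so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: models the content events AEM would emit (publish / unpublish).
enum ContentAction { PUBLISH, UNPUBLISH }

// Hypothetical client for the external search platform's indexing API.
interface SearchIndexClient {
    void upsert(String path, String payload);
    void delete(String path);
}

// Records calls instead of making HTTP requests, so the sketch runs standalone.
class RecordingClient implements SearchIndexClient {
    final List<String> calls = new ArrayList<>();
    public void upsert(String path, String payload) { calls.add("upsert " + path); }
    public void delete(String path) { calls.add("delete " + path); }
}

class PushIndexer {
    private final SearchIndexClient client;

    PushIndexer(SearchIndexClient client) { this.client = client; }

    // Called once per content event. Only the affected path is sent,
    // so no repository-wide query is needed to keep the index current.
    void onContentEvent(ContentAction action, String path, String payload) {
        switch (action) {
            case PUBLISH:
                client.upsert(path, payload);
                break;
            case UNPUBLISH:
                client.delete(path);
                break;
        }
    }
}
```

On AEM 6.x the trigger would commonly be a Sling EventHandler subscribed to ReplicationAction.EVENT_TOPIC, but the wiring depends on the AEM version and deployment model.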
Our external search platform provider has released a beta version of their AEM connector for us to evaluate. It essentially performs full and incremental crawls using the QueryBuilder API. My concern is that one or more of these queries will inevitably end up traversing too many nodes and exceed Oak's read limit. The vendor admits that this is a strong possibility, and their suggestion is simply to increase the limit.
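For illustration only, an incremental crawl of this kind might build a QueryBuilder predicate map along these lines. The vendor's actual queries are not known; the predicate keys are standard QueryBuilder ones, but the root path, timestamp, and class name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the predicate map an incremental crawl could use.
class IncrementalCrawlQuery {
    static Map<String, String> build(String rootPath, String lastCrawlIso, int offset, int pageSize) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("path", rootPath);                                    // subtree to crawl
        p.put("type", "cq:Page");                                   // node type to return
        p.put("daterange.property", "jcr:content/cq:lastModified"); // only pages changed since the last run
        p.put("daterange.lowerBound", lastCrawlIso);
        p.put("daterange.lowerOperation", ">");
        p.put("orderby", "@jcr:content/cq:lastModified");
        p.put("p.offset", Integer.toString(offset));                // page through the results
        p.put("p.limit", Integer.toString(pageSize));
        p.put("p.guessTotal", "true");                              // avoid counting the full result set
        return p;
    }
}
```

Inside AEM the map would be passed to PredicateGroup.create(...) and QueryBuilder.createQuery(...). Whether Oak can answer such a query from an index, or has to traverse nodes and risk hitting the read limit, depends on the repository's index definitions, which is part of why the concern is hard to rule out in advance.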
I am not personally convinced that a pull approach of this kind is the right one for a connector, especially with the QueryBuilder API at its core. My view is that the orchestration should always happen on the AEM side, where AEM responds to events and hydrates the external search platform with the right data.
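One way AEM-side orchestration is often structured is to queue events as they arrive and push them to the external platform from a separate worker, retrying failures. The sketch below assumes a hypothetical in-memory IndexingQueue and a caller-supplied send function standing in for the vendor's API; a real implementation would need a durable queue (for example Sling Jobs) so events are not lost on restart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch: neither the queue nor the sender is an AEM or vendor API.
class IndexingQueue {
    static final class Job {
        final String action;
        final String path;
        final int attempts;
        Job(String action, String path, int attempts) {
            this.action = action;
            this.path = path;
            this.attempts = attempts;
        }
    }

    private final Deque<Job> queue = new ArrayDeque<>();
    private final int maxAttempts;

    IndexingQueue(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // Called from the event handler: cheap, never calls the external platform directly.
    void enqueue(String action, String path) { queue.addLast(new Job(action, path, 0)); }

    // Processes the jobs queued at the start of this pass. `send` returns true when the
    // external platform accepted the update. Failed jobs are re-queued for the next pass
    // until they reach maxAttempts. Returns the number of jobs dropped in this pass.
    int drain(Predicate<Job> send) {
        int dropped = 0;
        int n = queue.size();
        for (int i = 0; i < n; i++) {
            Job job = queue.pollFirst();
            if (send.test(job)) continue;
            Job retry = new Job(job.action, job.path, job.attempts + 1);
            if (retry.attempts < maxAttempts) queue.addLast(retry);
            else dropped++;
        }
        return dropped;
    }

    int pending() { return queue.size(); }
}
```

Keeping the event handler down to an enqueue means a slow or unavailable search platform never blocks publishing, and only changed content is ever sent.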
I'd love to hear the community's thoughts and points of view.