We are currently trying to index our site (40K pages) into a Solr server using a scheduler job.
I have tried scraping the web pages with HtmlUnit and Jsoup, but both approaches take more than 10 s per page to build the model object that is sent to Solr.
Using the ModelExporter (retrieving jcr:content as JSON), I was able to build the model object in under 1 s. This works fine for a single page, but when the scheduler iterates over the pages, it takes 2-3 s per page,
so indexing the full site takes 24 hours.
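As a rough check on these figures, and assuming the pages are processed strictly one after another, the per-page time alone accounts for the full-day run:

```latex
T_{\text{total}} \approx N_{\text{pages}} \cdot t_{\text{page}}
  = 40\,000 \times (2 \text{ to } 3)\ \text{s}
  = 80\,000 \text{ to } 120\,000\ \text{s}
  \approx 22 \text{ to } 33\ \text{h}
```

So the 24-hour run follows directly from the sequential per-page cost. If k pages could be processed concurrently without saturating AEM or Solr (an assumption that has to be verified), the ideal wall-clock time would drop to about T_total / k; for example, k = 8 would bring 24 h down to roughly 3 h in the best case.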
Does anyone have an idea how to do this more efficiently, or is there any server-side mechanism in AEM that could speed it up?
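If the scheduler job retrieves each page's exported JSON over HTTP (an assumption; the question does not say how the job invokes the ModelExporter), a minimal sketch of that call is shown below. It assumes the Sling Model Exporter's `.model.json` URL is enabled for the pages; `ModelJsonFetcher`, `modelJsonUrl`, `fetchModelJson`, and the host value are hypothetical names for illustration. One design point worth checking in the real job is that a single HTTP client is reused for all pages, because creating a new client or connection per page can add overhead to every iteration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: fetches a page's Sling Model Exporter JSON over HTTP.
final class ModelJsonFetcher {

    // One shared client for every page, so connections can be reused
    // instead of being set up again for each request.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    // Builds the exporter URL for a page path, e.g.
    // "http://localhost:4503" + "/content/site/en/home" -> ".../home.model.json".
    // A trailing "/" on the host and a ".html" suffix on the path are stripped.
    static String modelJsonUrl(String host, String pagePath) {
        String base = host.endsWith("/") ? host.substring(0, host.length() - 1) : host;
        String path = pagePath.endsWith(".html")
                ? pagePath.substring(0, pagePath.length() - ".html".length())
                : pagePath;
        return base + path + ".model.json";
    }

    // Returns the exported JSON body; throws if the response is not HTTP 200.
    static String fetchModelJson(String host, String pagePath)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(modelJsonUrl(host, pagePath)))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + pagePath);
        }
        return response.body();
    }
}
```

For example, `ModelJsonFetcher.fetchModelJson("http://localhost:4503", "/content/site/en/home")` would request `http://localhost:4503/content/site/en/home.model.json`. If the job runs inside AEM itself, invoking the exporter in-process may avoid the HTTP round trip altogether.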
I don't think there is a faster way to extract this information in a structured form, but you can always run the process multi-threaded. Also, instead of thinking only about the initial population of the index, consider how updates will be handled during regular operation.
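The multi-threaded approach suggested above can be sketched with a fixed-size `ExecutorService`: page extraction runs on a worker pool and the results are forwarded to Solr in batches. `ParallelIndexer`, `extractor` (standing in for the per-page ModelExporter call), and `batchSink` (standing in for the Solr update call) are hypothetical placeholders, not AEM or SolrJ APIs, and the thread count and batch size would have to be tuned against what the AEM instance and Solr can actually sustain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: extract pages in parallel, index them in batches.
final class ParallelIndexer {

    // Submits one extraction task per page path to a fixed-size pool, then
    // collects the results in submission order and passes them to batchSink
    // in groups of batchSize. Returns the number of documents handed over.
    static <T> int indexAll(List<String> pagePaths,
                            Function<String, T> extractor,
                            Consumer<List<T>> batchSink,
                            int threads,
                            int batchSize) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String path : pagePaths) {
                futures.add(pool.submit(() -> extractor.apply(path)));
            }
            List<T> batch = new ArrayList<>(batchSize);
            int indexed = 0;
            for (Future<T> future : futures) {
                batch.add(future.get()); // waits for this page; other tasks keep running
                if (batch.size() == batchSize) {
                    batchSink.accept(batch);
                    indexed += batch.size();
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) { // flush the final, partially filled batch
                batchSink.accept(batch);
                indexed += batch.size();
            }
            return indexed;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because results are consumed in submission order, a single slow page delays its own batch but not the extraction of the pages behind it. For 40K pages the sketch keeps all futures in memory, which is usually acceptable. For updates during regular operation, the same extractor could be reused to re-index only the pages known to have changed instead of re-running the full crawl.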
Hi @Nithyasri_K,
You can refer to the link below:
https://helpx.adobe.com/experience-manager/using/aem_solr64.html
Hope this helps.
Thanks,
Kiran Vedantam
Hi @Kiran_Vedantam, our old approach (before we switched to the model exporter) was based on the link above. It took 8 s to get the page data, which is why we moved to the model exporter.