This conversation has been locked due to inactivity. Please create a new post.
We are currently trying to index our site (40K pages) into a SOLR server using a scheduler job.
I have tried web page scraping with HTMLUnit and Jsoup, but both approaches take 10+ s per page to build the model object that is sent to SOLR.
I was able to build the model object using ModelExporter (getting jcr:content as JSON) within 1 s. This works fine for a single page, but when I run it from the scheduler (which iterates over the pages), it takes 2-3 s per page,
so indexing the full site takes 24 hours.
Does anyone have an idea how to do this optimally, or know of any AEM server activity that can speed this up?
I don't think that there is a faster way to extract this information in a structured way, but you can always run this process in a multi-threaded way. And instead of just thinking of the initial filling of the index, please consider the cases of updates during regular operation.
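To illustrate the multi-threaded suggestion, here is a minimal sketch in plain Java. It is not AEM or SolrJ code: the class ParallelIndexer, the extractor function (standing in for the ModelExporter call that turns a page path into a document) and the batchSender (standing in for whatever pushes documents to SOLR) are all hypothetical names chosen for this example. As a rough estimate, 40,000 pages at 2-3 s each is 80,000-120,000 s, or about 22-33 hours, when run sequentially; with 8 worker threads the ideal case would be around 3-4 hours, assuming the AEM instance can serve that many concurrent extractions without slowing down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: extracts page documents in parallel and sends them in batches.
public class ParallelIndexer {

    /**
     * Runs the extractor for every path on a fixed-size thread pool and hands
     * the results to batchSender in groups of at most batchSize documents.
     * Returns the number of documents handed to the sender.
     */
    public static int indexAll(List<String> paths,
                               Function<String, Map<String, Object>> extractor,
                               Consumer<List<Map<String, Object>>> batchSender,
                               int threads,
                               int batchSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One extraction task per page; at most `threads` run at the same time.
            List<Future<Map<String, Object>>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<Map<String, Object>> task = () -> extractor.apply(path);
                futures.add(pool.submit(task));
            }

            // Collect results in submission order on the calling thread, so the
            // sender itself does not need to be thread-safe.
            List<Map<String, Object>> batch = new ArrayList<>(batchSize);
            int sent = 0;
            for (Future<Map<String, Object>> future : futures) {
                batch.add(future.get()); // blocks until this page's document is ready
                if (batch.size() == batchSize) {
                    batchSender.accept(new ArrayList<>(batch));
                    sent += batch.size();
                    batch.clear();
                }
            }
            // Flush the last, possibly smaller, batch.
            if (!batch.isEmpty()) {
                batchSender.accept(new ArrayList<>(batch));
                sent += batch.size();
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sending documents in batches rather than one request per page is an additional assumption here, made to reduce round trips to SOLR. For updates during regular operation, the same method could be called with only the paths of pages that changed instead of the full list of 40K pages.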
Hi @Nithyasri_K,
You can use the below links:
https://helpx.adobe.com/experience-manager/using/aem_solr64.html
Hope this helps.
Thanks,
Kiran Vedantam
Hi @Kiran_Vedantam, our old approach (before using the model exporter) came from the above links. It took 8 s to get the page data, hence we moved to the model exporter.