Indexing the whole site to SOLR
We are currently trying to index our site (40K pages) to a SOLR server using a scheduler job.
I have tried web page scraping with HTMLUnit and Jsoup, but both approaches take 10+ seconds to build the model object that needs to be sent to SOLR.
Using ModelExporter (getting jcr:content as JSON), I was able to build the model object within 1 second. This works fine for a single page, but when I run it from the scheduler (which iterates over the pages), it takes 2-3 seconds per page.
As a result, indexing the full site takes around 24 hours.
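For reference, here is a rough sketch of the arithmetic behind that figure: 40K pages at 2-3 seconds each works out to roughly 22 to 33 hours when the pages are processed one after another. The class name, the method name and the `workers` parameter are hypothetical and only there for illustration; the `workers` case assumes ideal linear scaling across parallel threads, which I have not verified on our AEM instance.

```java
// Back-of-the-envelope estimate only. The worker count and the assumption
// that throughput scales linearly with workers are hypothetical.
final class IndexingEstimate {

    /**
     * Estimated wall-clock hours to process all pages, assuming each page
     * costs secondsPerPage and the work splits evenly across workers.
     */
    static double estimateHours(int pages, double secondsPerPage, int workers) {
        if (pages < 0 || secondsPerPage < 0 || workers < 1) {
            throw new IllegalArgumentException("pages and secondsPerPage must be >= 0, workers >= 1");
        }
        double totalSeconds = pages * secondsPerPage / workers;
        return totalSeconds / 3600.0; // seconds to hours
    }

    public static void main(String[] args) {
        int pages = 40_000; // site size from the question

        // Sequential scheduler run at the observed 2-3 s per page.
        System.out.printf("1 worker, 2 s/page: %.1f h%n", estimateHours(pages, 2.0, 1));
        System.out.printf("1 worker, 3 s/page: %.1f h%n", estimateHours(pages, 3.0, 1));

        // Hypothetical: 4 parallel workers at 2.5 s/page, ideal scaling.
        System.out.printf("4 workers, 2.5 s/page: %.1f h%n", estimateHours(pages, 2.5, 4));
    }
}
```

So the per-page cost inside the scheduler, rather than the push to SOLR itself, appears to be what dominates the total run time.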
Does anyone have any idea how to do this optimally, or know of anything on the AEM server side that could speed this up?