Issues while getting page content
Hello guys,
I have the following situation here:
I'm trying to obtain the content of each page in order to index it into Solr documents. This will be used later for site search: when a client searches for a word, they should get the pages that contain that text, for example inside the page body.
Now, the problem is that while reindexing the content into Solr (which means reading the page content), I'm getting a mismatch between Solr and what the page actually contains. More precisely: multiple Solr documents contain the same content.
More details:
-> For this implementation I used SlingRequestProcessor and SlingInternalRequest, and as far as I can tell they do not offer any means of synchronization (such as waiting for the response). SlingInternalRequest seems to offer more options for synchronization, but that is only an appearance.
-> The actual result looks something like this:
- the result (page content) coming from one request is used for multiple Solr documents, instead of each request's own content being mapped to the appropriate Solr document
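To make the expected behavior concrete, here is a minimal sketch of what I want to end up with: each page is rendered into its own buffer, and that buffer alone feeds the document for that page. PageRenderer is a hypothetical stand-in for whatever performs the internal request; it is not a Sling API, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PageContentIndexer {

    /** Hypothetical stand-in for the internal request call; NOT a Sling API. */
    interface PageRenderer {
        void render(String pagePath, OutputStream out) throws IOException;
    }

    /** Renders each page into its own buffer and returns page path -> HTML. */
    static Map<String, String> collect(List<String> pagePaths, PageRenderer renderer)
            throws IOException {
        Map<String, String> htmlByPath = new LinkedHashMap<>();
        for (String path : pagePaths) {
            // A fresh buffer per request: nothing is shared between iterations,
            // so one page's output cannot end up in another page's document.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            renderer.render(path, buffer);
            htmlByPath.put(path, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
        }
        return htmlByPath;
    }
}
```

Each map entry would then become exactly one Solr document. If a buffer, response wrapper, or document object were reused across iterations instead, several documents could end up with the same content, which matches what I am seeing, though I have not confirmed that this is the cause.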
My questions are:
- What are you guys using to read the content of a page (in HTML format)?
- Are there other (Sling) implementations that offer synchronization, or a way to wait for the response?