Hello guys,
I have the following situation:
I'm trying to obtain the content of each page in order to index it into Solr documents. This will be used later for site search: when a client searches for a word, they should get the pages whose body contains that text, for example.
The problem is that while reindexing the content into Solr (which means reading each page's content), Solr gets out of sync with what the pages actually contain. More exactly: multiple Solr documents contain the same content.
More details:
-> For this implementation I used SlingRequestProcessor and SlingInternalRequest, and neither offers a way to synchronize (like waiting for the response). SlingInternalRequest seems to offer more options for synchronization, but that is only an appearance.
-> the actual result looks something like this:
My questions are:
@dariuspuscas are you using Solr as a replacement for AEM's internal Lucene index, or as an externally integrated search server for web search?
In the second case, you need to generate a feed from AEM on a regular basis (it could be a daily feed), populate the feed XML with all the required data from each page (page title, description, content, etc.), and then import that feed XML into Solr daily.
Yes, it is the second case. My question is not about the mechanism, but about how to obtain the page content safely, in an iterative manner (if you have 100 pages, you have to make 100 HTTP requests to the instance).
It seems that using SlingRequestProcessor and SlingInternalRequest does not work properly, because they make the request in a hit-and-run manner.
So the solution I found is to make external requests using java.net.http.HttpClient.
It offers a method that waits for the answer: https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html#send(...)
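For anyone who lands here later, a minimal sketch of that approach could look like the code below. It is only an illustration, not the exact code from my project: the class name `PageFetcher`, the helper `pageUri`, the `.html` extension, and the base URL and page paths passed on the command line are all assumptions, and authentication (which an author instance would normally require) is left out. The point is that `HttpClient.send` blocks until the whole response has arrived, so each page body is read completely before the next request starts.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageFetcher {

    // Builds the URL of the rendered page. The ".html" extension is an assumption;
    // adjust it to whatever rendition you want to index.
    public static URI pageUri(String baseUrl, String pagePath) {
        return URI.create(baseUrl + pagePath + ".html");
    }

    // Fetches the pages one at a time. send() is synchronous: it returns only
    // after the full body has been received, so bodies cannot get mixed up.
    public static Map<String, String> fetchAll(HttpClient client, String baseUrl, List<String> pagePaths)
            throws IOException, InterruptedException {
        Map<String, String> bodies = new LinkedHashMap<>();
        for (String path : pagePaths) {
            HttpRequest request = HttpRequest.newBuilder(pageUri(baseUrl, path))
                    .timeout(Duration.ofSeconds(30))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                throw new IOException("Unexpected status " + response.statusCode() + " for " + path);
            }
            bodies.put(path, response.body());
        }
        return bodies;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: PageFetcher <baseUrl> <pagePath> [<pagePath> ...]");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        List<String> paths = Arrays.asList(args).subList(1, args.length);
        // Each entry is a (page path, page body) pair, ready to be turned into a Solr document.
        fetchAll(client, args[0], paths)
                .forEach((path, body) -> System.out.println(path + " -> " + body.length() + " chars"));
    }
}
```

The returned map keeps the page path next to its own body, so when building the Solr documents there is no way for one page to pick up the content of another.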