Pages are not getting indexed while crawling with Apache Nutch





I am trying to index pages using Apache Nutch, but pages whose content comes from an external API are not getting indexed. Can someone help me resolve this issue?


Thanks in advance.

1 Reply



Hi There,


Just to understand a little more: are you saying that the external API content is not getting loaded and is therefore not available for crawling? And is it not getting loaded because the event that loads that content does not fire while the crawl is happening?


If you are crawling the AEM site to create search indexes, I would recommend the accepted pattern in which AEM pushes the content to the indexer as part of a publish replication agent. This also lets you sanitize and clean the content before sending it for indexing. In fact, this is an accepted design solution for use cases where you need to keep AEM content in other systems such as Solr.
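
To make the push idea a bit more concrete, here is a rough sketch of the "clean, then send" step in Python. This is not AEM replication agent code; it only illustrates the kind of payload a custom agent could post to Solr's JSON update handler. The field names (`id`, `title`, `content`), the core name in the URL, and the helper names are all assumptions for illustration, so adjust them to your own schema and setup.

```python
import json
import re
import urllib.request
from html import unescape


def sanitize(html_text):
    """Drop script/style blocks and tags, decode entities, collapse whitespace."""
    no_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html_text)
    no_tags = re.sub(r"(?s)<[^>]+>", " ", no_scripts)
    return re.sub(r"\s+", " ", unescape(no_tags)).strip()


def build_solr_doc(path, title, body_html):
    """Build one document for indexing. Field names are hypothetical."""
    return {"id": path, "title": title, "content": sanitize(body_html)}


def push_to_solr(doc, update_url):
    """POST a single document to a Solr JSON update endpoint; returns the HTTP status."""
    payload = json.dumps([doc]).encode("utf-8")
    request = urllib.request.Request(
        update_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call (hypothetical host and core name, not executed here):
# doc = build_solr_doc("/content/site/en/page", "Page", "<p>Hello</p>")
# push_to_solr(doc, "http://localhost:8983/solr/mycore/update?commit=true")
```

Because the content is pushed at publish time, the indexer never has to execute the client-side call to the external API; whatever should be searchable is resolved and cleaned before it is sent.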


Hope it helps!