AEM (Adobe Experience Manager) won't index PDF content in search results? | Community
Level 2
February 18, 2022
Solved

My employer has recently switched its CMS to AEM (Adobe Experience Manager).

We store a large amount of documentation on our public website.
Our customers need to be able to find information contained within those documents, some of which are hundreds of pages long.

Disappointingly, Adobe says its search tool will not search PDFs. Is there any format for producing or saving PDFs that allows their content to be indexed?

Are there any workarounds or APIs available to fix this?

1 reply

Nitin_laad
Community Advisor, accepted solution
February 18, 2022

@dwazirl 

You can explore Apache Tika for text extraction and indexing.
If content search is a main feature of the site, I would recommend going for an external search engine such as Solr or Elasticsearch.

Solr is an enterprise-grade, secure, highly scalable, open-source NoSQL search platform from the Apache Lucene project.

Ref.

https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/aem-and-solr-6-reasons-to-use-managed-solr-service-aem-community/m-p/381197#M27349

https://myaemlearnings.blogspot.com/2020/06/apache-tika-config-in-lucene-index-and.html
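
For readers weighing the external Solr route mentioned above, here is a minimal sketch in Python of what sending a PDF to Solr for extraction and then querying it could look like. It assumes a Solr instance at `http://localhost:8983/solr`, a hypothetical core named `site_docs`, and that Solr's extraction handler (which uses Tika internally) is enabled at `/update/extract`; none of these details come from the thread. The functions only build the requests and do not send them, so nothing here depends on a running server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions for illustration only: a local Solr and a hypothetical core name.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "site_docs"


def build_extract_request(pdf_bytes: bytes, doc_id: str) -> Request:
    """Build (but do not send) a POST asking Solr to extract and index a PDF.

    `literal.id` sets the document's id field; `commit=true` makes the
    document searchable as soon as the request completes.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{SOLR_BASE}/{CORE}/update/extract?{params}"
    return Request(
        url,
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )


def build_search_url(text: str, rows: int = 10) -> str:
    """Build a query URL for Solr's standard /select handler."""
    params = urlencode({"q": text, "rows": rows})
    return f"{SOLR_BASE}/{CORE}/select?{params}"
```

To actually run it against a server, the request object can be passed to `urllib.request.urlopen`. Which fields the extracted text lands in, and which field `/select` searches by default, depend on the core's configuration, so treat the query side as a starting point only.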

dwazirl (Author)
Level 2
February 21, 2022

@nitin_laad Thanks for the reply. Yes, it appears that Solr or Tika is the way to go.