SOLVED

AEM (Adobe Experience Manager) won't index PDF content in search results?

Level 2

My employer has recently switched its CMS to AEM (Adobe Experience Manager).

We store a large amount of documentation on our public website.
Our customers need to be able to find information contained within those documents, some of which are hundreds of pages long.

Disappointingly, Adobe are saying their search tool will not search PDFs. Is there any format for producing or saving PDFs that allows the content to be indexed?

Are there any workarounds or APIs available to fix this?

1 Accepted Solution

Correct answer by
Community Advisor

@dwazirl 

You can explore Apache Tika for text extraction and indexing.
If content search is a main feature of the site, I would recommend going for an external search engine such as Solr or Elasticsearch.
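To make "text extraction and indexing" concrete: once a tool such as Tika has turned a PDF into plain text, a search engine builds an inverted index that maps each term to the documents containing it. The toy sketch below shows only that idea, using the Python standard library; the `tokenize`, `build_index` and `search` names and the sample text are hypothetical, and real engines (Lucene, Solr, Elasticsearch) add stemming, ranking and much more.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer: lowercase, then split on anything that is not a letter or digit.
    # No stemming, so "installer" and "installation" stay distinct terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs whose extracted text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical text standing in for what a PDF extractor such as Tika would return.
extracted = {
    "install-guide.pdf": "Installation requires Java 11 and 8 GB of RAM.",
    "release-notes.pdf": "This release fixes a memory leak in the installer.",
}
idx = build_index(extracted)
print(sorted(search(idx, "installation java")))  # ['install-guide.pdf']
```

The point for the original question is that a PDF only becomes searchable once its text layer has been extracted and fed into an index like this; the search tool never looks inside the binary file itself.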

Solr is an enterprise-grade, secure, highly scalable, open-source NoSQL search platform from the Apache Lucene project.
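Solr ships an extracting request handler ("Solr Cell") that runs Apache Tika on uploaded files, so Tika extraction and Solr indexing can happen in a single request. A minimal sketch of building such an upload with the Python standard library follows. It assumes a core (hypothetically named `docs`) with the handler configured at its usual `/update/extract` path and `id` as the unique key field; the function name is hypothetical, and depending on the Solr version the extraction module may need to be enabled first.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_base: str, core: str, doc_id: str,
                          pdf_bytes: bytes, commit: bool = True) -> Request:
    """Build (but do not send) a POST of a raw PDF to Solr Cell's /update/extract handler."""
    # literal.<field> sets a field value on the indexed document; "id" is assumed to be the unique key.
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit so the extracted text becomes searchable right away.
        params["commit"] = "true"
    url = f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"
    # The PDF is sent as the raw request body, labelled with its MIME type.
    return Request(url, data=pdf_bytes, method="POST",
                   headers={"Content-Type": "application/pdf"})


req = build_extract_request("http://localhost:8983/solr", "docs",
                            "install-guide.pdf", b"%PDF-1.7 ...")
print(req.full_url)
```

Passing the returned request to `urllib.request.urlopen` would actually send it; that step is left out here because it needs a running Solr instance.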

Ref.

https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/aem-and-solr-6-reasons-to-...

https://myaemlearnings.blogspot.com/2020/06/apache-tika-config-in-lucene-index-and.html


2 Replies

Level 2

@Nitin_laad Thanks for the reply. Yes, it appears that Solr or Tika is the way to go.