My employer has recently switched its CMS to AEM (Adobe Experience Manager).
We store a large amount of documentation on our public website.
Our customers need to be able to find information contained within those documents, some of which are hundreds of pages long.
Adobe are disappointingly saying their search tool will not search PDFs. Is there any format for producing or saving PDFs that allows the content to be indexed?
Are there any workarounds or APIs available to fix this?
You can explore Apache Tika for text extraction and indexing.
If content search is a main feature of the site, I would recommend going for an external search engine such as Solr or Elasticsearch.
Solr is an enterprise-grade, secure, highly scalable, open-source NoSQL search platform from the Apache Lucene project.
Ref.
https://myaemlearnings.blogspot.com/2020/06/apache-tika-config-in-lucene-index-and.html
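For anyone trying the external Solr route, here is a minimal sketch of pushing a PDF to Solr's extraction handler (Solr Cell), which uses Tika internally to pull out the text. The host, the core name `docs`, the document id, and the file name are all assumptions for illustration, and it presumes the `/update/extract` handler is enabled on your Solr instance.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a local Solr instance; adjust the base URL for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def extract_url(core, doc_id, commit=True, base=SOLR_BASE):
    """Build the /update/extract URL that asks Solr to run Tika on an upload."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return f"{base}/{core}/update/extract?{urlencode(params)}"


def index_pdf(path, core, doc_id):
    """POST the raw PDF bytes; Solr extracts the text and indexes it."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(
        extract_url(core, doc_id),
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical core name, id, and file; requires a running Solr.
    print(index_pdf("manual.pdf", "docs", "manual-1"))
```

Once the documents are indexed this way, the site search can query Solr instead of relying on the built-in search for PDF content.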
@Nitin_laad thanks for the reply. Yes, it appears that Solr or Tika is the way to go.