SOLVED

AEMaaCS PDF (Documents) Searching Content - No results

Is it true that content in Assets, such as PDFs, is not indexed in AEMaaCS? I can see that Tika is being used and that PDFs are not filtered out in the OOTB config.

Why is the content not being searched?  Are we required to use a 3rd party search tool like Elastic/Solr for this functionality?

1 Accepted Solution

Correct answer by
Community Advisor

Please check the following:

In Adobe Experience Manager as a Cloud Service (AEMaaCS), full-text indexing of binary files like PDFs is supported out of the box. This is done using Apache Tika, which is capable of extracting text from various file formats including PDF.

https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/operations/indexing...

 

  1. Indexing Configuration: Check your Oak index definitions to make sure that full-text indexing is enabled for nt:file nodes (which is what AEM uses to store binary files like PDFs). You can do this in CRXDE Lite.

Also, some PDF files may not contain any extractable text, so check whether text can actually be extracted from the affected files.
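
One way to test whether the text of a particular PDF has reached the index is to run a full-text query for a phrase you know appears in it. The following is a minimal sketch that only builds a QueryBuilder URL; the host name and search term are placeholders, and it assumes the QueryBuilder JSON servlet is reachable on your environment with an authenticated session.

```python
from urllib.parse import urlencode


def build_fulltext_query_url(host: str, term: str, limit: int = 10) -> str:
    """Build a QueryBuilder URL that full-text searches DAM assets for `term`.

    `host` is a placeholder for your own AEM instance; nothing is sent
    over the network here, the function only assembles the URL.
    """
    params = {
        "path": "/content/dam",   # restrict the search to DAM content
        "type": "dam:Asset",      # only return asset nodes
        "fulltext": term,         # phrase expected inside the PDF
        "p.limit": str(limit),    # cap the number of hits returned
    }
    return f"{host.rstrip('/')}/bin/querybuilder.json?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and phrase, for illustration only.
    print(build_fulltext_query_url("https://author.example.com", "quarterly report"))
```

If the query returns no hits for a phrase that is definitely in the document, the index definition or the text extraction for that file is the next thing to look at.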

 

https://suman-shekhar.medium.com/aem-text-extraction-using-apache-tika-d0eb740eec39

2 Replies

Community Advisor

@sdouglasmc 

 

Please cross-check whether a custom index has been created for damAssetLucene. If so, please make sure the Tika configs are created as per:

Content Search and Indexing | Adobe Experience Manager
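
As a rough illustration of that check, the sketch below inspects an index definition that has been exported to JSON and loaded as a Python dict. The node names `tika` and `config.xml` are an assumption about how the configuration is laid out under the custom index; confirm the exact structure against the documentation linked above. The sample definitions are made up.

```python
def has_tika_config(index_definition: dict) -> bool:
    """Return True if the index definition has a `tika` child with `config.xml`.

    Assumes the definition was exported as nested JSON, with child nodes
    represented as nested dicts keyed by node name.
    """
    tika = index_definition.get("tika")
    return isinstance(tika, dict) and "config.xml" in tika


if __name__ == "__main__":
    # Hypothetical custom index definitions, for illustration only.
    with_tika = {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
        "tika": {"config.xml": {"jcr:primaryType": "nt:file"}},
    }
    without_tika = {
        "jcr:primaryType": "oak:QueryIndexDefinition",
        "type": "lucene",
    }
    print(has_tika_config(with_tika))
    print(has_tika_config(without_tika))
```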


Aanchal Sikka