AEMaaCS PDF (Documents) Searching Content - No results | Community
Level 4
March 11, 2024
Solved

Is it true that the content of assets such as PDFs is not indexed in AEMaaCS? I can see that Tika is being used and that PDFs are not filtered out in the OOTB config.

Why is the content not searchable? Are we required to use a third-party search tool such as Elastic or Solr for this functionality?

Best answer by SureshDhulipudi

Please check these details:

In Adobe Experience Manager as a Cloud Service (AEMaaCS), full-text indexing of binary files like PDFs is supported out of the box. This is done using Apache Tika, which is capable of extracting text from various file formats including PDF.

https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/operations/indexing.html?lang=en

 

  1. Indexing Configuration: Check your Oak index definitions to make sure that full-text indexing is enabled for nt:file nodes (which is what AEM uses to store binary files like PDFs). You can do this in CRXDE Lite.

Some PDF files may not contain extractable text (for example, image-only scans), so this is also worth checking.

https://suman-shekhar.medium.com/aem-text-extraction-using-apache-tika-d0eb740eec39
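
One way to confirm whether PDF text is actually reaching the index is to run a full-text query for a phrase known to appear only inside a PDF. The following is a minimal sketch, not a definitive procedure: it only builds a URL for AEM's QueryBuilder JSON servlet (`/bin/querybuilder.json`) using the `path`, `type`, `fulltext`, and `p.limit` predicates. The host name, search phrase, and the helper name `build_fulltext_query` are assumptions made for illustration, and the request itself would still need to be sent with valid credentials.

```python
from urllib.parse import urlencode


def build_fulltext_query(host: str, term: str,
                         path: str = "/content/dam", limit: int = 10) -> str:
    """Build a QueryBuilder URL that runs a full-text search over DAM assets.

    `host` is a hypothetical author instance; replace it with your own.
    """
    params = {
        "path": path,           # restrict the search to the DAM subtree
        "type": "dam:Asset",    # match asset nodes only
        "fulltext": term,       # phrase expected to occur inside a PDF
        "p.limit": str(limit),  # cap the number of hits returned
    }
    return f"{host}/bin/querybuilder.json?{urlencode(params)}"


# Hypothetical host and search phrase, for illustration only.
url = build_fulltext_query("https://author.example.com", "quarterly report")
print(url)
```

If the query returns no hits for text that exists only in the PDF body but does return hits for the asset's title or metadata, that suggests the binary text is not being extracted or indexed.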

2 replies

SureshDhulipudi
Community Advisor (accepted solution)
March 11, 2024

aanchal-sikka
Community Advisor
March 12, 2024

@sdouglasmcsonova 

 

Please cross-check whether a custom index has been created for damAssetLucene. If so, please make sure the Tika configs are created as per:

Content Search and Indexing | Adobe Experience Manager

Aanchal Sikka
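
The check suggested in this reply can be sketched in a few lines, assuming the custom index definition has already been exported as JSON (for example from a local SDK instance). In Oak Lucene index definitions, a custom Tika configuration is stored as a `tika` child node containing a `config.xml` file node; the helper name `has_tika_config` and the trimmed sample definition below are hypothetical and only illustrate the shape being looked for.

```python
import json


def has_tika_config(index_def: dict) -> bool:
    """Return True if the index definition carries a tika/config.xml child node."""
    tika = index_def.get("tika")
    return isinstance(tika, dict) and isinstance(tika.get("config.xml"), dict)


# Hypothetical, heavily trimmed index definition, for illustration only.
sample = json.loads("""
{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "type": "lucene",
  "tika": {
    "jcr:primaryType": "nt:unstructured",
    "config.xml": {"jcr:primaryType": "nt:file"}
  }
}
""")

print(has_tika_config(sample))
```

This only tests for the presence of the node. Whether the configuration inside `config.xml` actually enables PDF parsing still has to be reviewed against the linked documentation.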