Adobe Experience Manager Sites & More

sdouglasmc · 3/11/24

Is it true that the content of assets like PDFs is not indexed in AEMaaCS? I can see that Tika is being used and that PDFs are not filtered out in the OOTB config.

Why is the content not being searched? Are we required to use a 3rd party search tool like Elastic/Solr for this functionality?

SureshDhulipudi · 3/11/24

Please check these details:

In Adobe Experience Manager as a Cloud Service (AEMaaCS), full-text indexing of binary files like PDFs is supported out of the box. This is done using Apache Tika, which is capable of extracting text from various file formats including PDF.

https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/operations/indexing...

Indexing configuration: check your Oak index definitions to make sure that full-text indexing is enabled for nt:file nodes (which is what AEM uses to store binary files like PDFs). You can do this in CRXDE Lite, which in AEMaaCS is only available on development environments.

Sometimes a PDF file has no extractable text at all (for example, a scanned, image-only PDF), in which case Tika has nothing to index. This is also worth checking.

https://suman-shekhar.medium.com/aem-text-extraction-using-apache-tika-d0eb740eec39
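
One quick way to see whether PDF text actually made it into the index is to run a full-text query through the QueryBuilder JSON servlet and check whether the asset comes back. Below is a minimal sketch that only builds the query URL. The host `https://author.example.com`, the folder `/content/dam/my-site`, and the search phrase are placeholders I made up, and it assumes `/bin/querybuilder.json` is reachable on your author instance (it is usually blocked on publish).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PdfFulltextCheck {

    // Builds a QueryBuilder URL that runs a full-text search over DAM assets.
    // host, damPath and term are supplied by the caller (hypothetical values in main).
    public static String buildQueryUrl(String host, String damPath, String term) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("path", damPath);      // restrict the search to one DAM folder
        params.put("type", "dam:Asset");  // only return assets
        params.put("fulltext", term);     // phrase that should appear inside the PDF
        params.put("p.limit", "10");      // keep the result set small

        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return host + "/bin/querybuilder.json?" + query;
    }

    // URL-encodes a single query-string key or value as UTF-8.
    public static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder host, folder and phrase; replace with your own.
        System.out.println(buildQueryUrl(
                "https://author.example.com",
                "/content/dam/my-site",
                "quarterly report"));
    }
}
```

Open the printed URL in a browser while logged in to author. If a phrase that you know is inside the PDF returns no hits, the text was probably not extracted or the index definition does not cover it.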
