SOLVED

Can the asset search function support PDF content search?

Level 1

I tested searching by asset content. Content search works for Word, Excel, and PPT files, but not for PDF files.

AEM integrates Apache Tika. How can PDF content search be enabled with minimal modifications?

1 Accepted Solution

Correct answer by
Community Advisor

Hi @QingZhou,

In general, PDF content indexing and content search are supported OOTB; see the official documentation:

In other words, assuming you have not changed anything in the OOTB AEM configuration, this should simply work. I quickly checked this on my AEM 6.5 SP19 instance and was able to get a PDF file in the search results based on a phrase contained in the document, both with the Touch UI search and in CRX/DE. If at any point you need to modify the Apache Tika configuration, this can be done as follows:

If this is not working for you, there might be an issue with the PDF file itself (how it was created, etc.). I would suggest downloading a sample PDF file from the Adobe site for testing purposes.
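To repeat the check described above outside the Touch UI, one option is to run a full-text query against the DAM through the QueryBuilder JSON servlet. The snippet below is only a rough sketch that builds such a URL. The host and port (`http://localhost:4502`), the search root `/content/dam`, the result limit, and the helper name `build_fulltext_query_url` are assumptions for illustration, not values from this thread. Open the printed URL in a browser session that is logged in to AEM.

```python
from urllib.parse import urlencode


def build_fulltext_query_url(phrase, base_url="http://localhost:4502", dam_path="/content/dam"):
    """Build a QueryBuilder JSON URL that full-text searches DAM assets.

    base_url and dam_path are assumed defaults for a local author instance;
    adjust them to match your environment.
    """
    params = {
        "path": dam_path,      # search root (assumption: default DAM root)
        "type": "dam:Asset",   # restrict results to DAM assets
        "fulltext": phrase,    # phrase expected inside the PDF text
        "p.limit": "10",       # keep the result list short
    }
    return f"{base_url}/bin/querybuilder.json?{urlencode(params)}"


if __name__ == "__main__":
    # Use a phrase that you know appears in the body of the uploaded PDF.
    print(build_fulltext_query_url("sample phrase"))
```

If the sample PDF shows up in the hits but your own PDF does not, that points to the file itself rather than to the index configuration.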

2 Replies

Level 1

Hi lukasz-m,

We are using AEM 6.5.10.
PDF content search does not work on 6.5.10; an Adobe engineer told me that it is supported on 6.5.21.
We cannot upgrade to the latest version because we have a lot of custom development.
How can I modify the code to enable PDF content search in version 6.5.10?