SOLVED

Search for text in scanned (OCRed) PDF documents.

Level 2

I want to develop a component for searching PDF assets. The existing QueryBuilder API does not search through scanned and OCRed PDF files; it only searches normal PDF files. Is there a way to achieve this?

1 Accepted Solution

Correct answer by
Level 10

This is not supported out of the box in AEM. You would need a Java library that can perform this task (assuming such a Java API exists) and build a custom AEM service around it. The following Java API looks like a possible way to proceed with this use case:

http://asprise.com/royalty-free-library/java-ocr-api-overview.html
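A minimal sketch of the shape such a custom service might take, assuming the OCR library is hidden behind a hypothetical `OcrEngine` interface. The actual Asprise API is not shown here and its class and method names would differ. `OcrEngine`, `OcrSearchService`, and the asset paths are illustrative only, and the in-memory map stands in for wherever a real AEM service would persist the extracted text (an assumption; the original answer does not say where).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical abstraction over whichever Java OCR library is chosen. */
interface OcrEngine {
    /** Returns the text recognized in the given scanned PDF bytes. */
    String extractText(byte[] pdfBytes);
}

/** Hypothetical service: OCR each asset once, then search the stored text. */
class OcrSearchService {
    private final OcrEngine engine;
    // Asset path -> lower-cased OCR text; TreeMap returns hits in path order.
    private final Map<String, String> extractedText = new TreeMap<>();

    OcrSearchService(OcrEngine engine) {
        this.engine = engine;
    }

    /** Runs OCR on the asset binary and stores the result under its path. */
    void index(String assetPath, byte[] pdfBytes) {
        String text = engine.extractText(pdfBytes);
        extractedText.put(assetPath, text.toLowerCase(Locale.ROOT));
    }

    /** Returns paths of assets whose OCR text contains the term (case-insensitive). */
    List<String> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : extractedText.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

public class OcrSearchDemo {
    public static void main(String[] args) {
        // Stub engine for the demo only: treats the bytes as already-recognized UTF-8 text.
        OcrEngine stub = bytes -> new String(bytes, StandardCharsets.UTF_8);
        OcrSearchService service = new OcrSearchService(stub);
        service.index("/content/dam/scans/invoice.pdf",
                "Invoice No. 42".getBytes(StandardCharsets.UTF_8));
        service.index("/content/dam/scans/contract.pdf",
                "Service Contract".getBytes(StandardCharsets.UTF_8));
        System.out.println(service.search("invoice"));
    }
}
```

Running `main` prints `[/content/dam/scans/invoice.pdf]`. Keeping the OCR library behind an interface means the expensive recognition step runs once per asset when it is indexed, while each search only scans text that has already been extracted.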
