SOLVED

Search for text in scanned (OCRed) PDF documents

I want to develop a component for searching PDF assets. The existing QueryBuilder API does not search through scanned (OCRed) PDF files; it only finds matches in normal, text-based PDF files. Is there a way to achieve this?
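For context, a QueryBuilder full-text search over PDF assets typically uses a predicate map along the lines of the sketch below. The `fulltext` predicate matches against text that the repository index extracted from the binary. For an image-only scanned PDF that extracted text is presumably empty, so there is nothing to match. The metadata path `jcr:content/metadata/dc:format` and the limit of 20 are assumptions for illustration. In AEM the map would be passed to `PredicateGroup.create(...)` and `QueryBuilder.createQuery(..., session)`; those calls are left out so the sketch runs without AEM on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a QueryBuilder predicate map for a full-text search over PDF assets.
// The AEM calls that would execute it (PredicateGroup.create, QueryBuilder.createQuery)
// are intentionally omitted; this class only builds and prints the map.
public class PdfSearchPredicates {

    public static Map<String, String> fullTextPdfSearch(String searchTerm) {
        Map<String, String> predicates = new LinkedHashMap<>();
        predicates.put("path", "/content/dam");                       // search under the DAM root
        predicates.put("type", "dam:Asset");                          // only asset nodes
        // Assumption: the MIME type is stored in the asset's dc:format metadata property.
        predicates.put("property", "jcr:content/metadata/dc:format");
        predicates.put("property.value", "application/pdf");          // restrict to PDFs
        predicates.put("fulltext", searchTerm);                       // matches indexed (extracted) text only
        predicates.put("p.limit", "20");                              // hypothetical page size
        return predicates;
    }

    public static void main(String[] args) {
        fullTextPdfSearch("invoice").forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```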

1 Accepted Solution

Out of the box, AEM does not support this. You would need a Java library that can perform OCR (assuming a suitable Java API exists) and build a custom AEM service around it. This Java API looks like a possible way to handle this use case:

http://asprise.com/royalty-free-library/java-ocr-api-overview.html
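A minimal sketch of the core logic such a custom service might contain, assuming the OCR library is hidden behind a small interface. `OcrEngine`, `needsOcr`, `searchableText` and the 20-character threshold are all hypothetical names and values, not part of AEM or the Asprise API. Rendering PDF pages to images (for example with Apache PDFBox) and wiring this into AEM (for example as a workflow step that writes the text to an asset metadata property the full-text index is configured to include) are assumptions left out of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: decide whether a PDF needs OCR and, if so, build searchable text from page images.
public class ScannedPdfTextService {

    /** Hypothetical wrapper around a Java OCR library; not a real AEM or Asprise type. */
    public interface OcrEngine {
        String recognize(byte[] pageImage);
    }

    // Hypothetical threshold: below this many characters the PDF is treated as image-only.
    private static final int MIN_TEXT_LENGTH = 20;

    private final OcrEngine ocr;

    public ScannedPdfTextService(OcrEngine ocr) {
        this.ocr = ocr;
    }

    /** True when normal text extraction yielded too little text, i.e. the PDF is probably a scan. */
    public static boolean needsOcr(String extractedText) {
        return extractedText == null || extractedText.strip().length() < MIN_TEXT_LENGTH;
    }

    /** Returns the extracted text if usable; otherwise OCR output of each page, joined by newlines. */
    public String searchableText(String extractedText, List<byte[]> pageImages) {
        if (!needsOcr(extractedText)) {
            return extractedText.strip();
        }
        List<String> pages = new ArrayList<>();
        for (byte[] image : pageImages) {
            String text = ocr.recognize(image);
            if (text != null && !text.isBlank()) {
                pages.add(text.strip()); // skip pages where OCR found nothing
            }
        }
        return String.join("\n", pages);
    }

    public static void main(String[] args) {
        // Stub engine standing in for a real OCR library.
        OcrEngine stub = image -> "page of " + image.length + " bytes";
        ScannedPdfTextService service = new ScannedPdfTextService(stub);
        System.out.println(service.searchableText("", List.of(new byte[3], new byte[5])));
    }
}
```

Keeping the OCR call behind an interface makes it possible to swap in whichever Java OCR library turns out to work, and to test the fallback logic without one.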
