Search for text in scanned (OCRed) PDF documents
October 16, 2015

I want to develop a component for searching PDF assets. The existing QueryBuilder API does not search through scanned and OCRed PDF files; it only searches normal PDF files. Is there a way I can achieve this?

Best answer by smacdonald2008

Out of the box, AEM does not support this. You would need a Java library that can perform OCR on the scanned pages (assuming a suitable Java API exists) and build a custom AEM service around it. The following Java API looks like it may be a way to handle this use case:

http://asprise.com/royalty-free-library/java-ocr-api-overview.html
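A custom service along those lines might be structured roughly as follows. This is a minimal sketch under stated assumptions, not AEM or Asprise code: `OcrSearchSketch`, `OcrEngine`, `OcrSearchIndex`, and the example paths are hypothetical names. The real OCR library calls would go inside an `OcrEngine` implementation (they are not shown because they depend on the library chosen), and the in-memory map stands in for wherever a real service would persist the extracted text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class OcrSearchSketch {

    /** Hypothetical wrapper around whichever Java OCR library is chosen. */
    interface OcrEngine {
        /** Returns the text recognized in the scanned PDF at assetPath, or null if none. */
        String recognizeText(String assetPath);
    }

    /** Hypothetical service: OCR each PDF once at ingest time, then search the stored text. */
    static class OcrSearchIndex {
        private final OcrEngine ocr;
        // Asset path -> lower-cased OCR text; LinkedHashMap keeps ingest order for results.
        private final Map<String, String> textByAsset = new LinkedHashMap<>();

        OcrSearchIndex(OcrEngine ocr) {
            this.ocr = ocr;
        }

        /** Runs OCR on the asset and stores its text (empty if nothing was recognized). */
        void ingest(String assetPath) {
            String text = ocr.recognizeText(assetPath);
            textByAsset.put(assetPath, text == null ? "" : text.toLowerCase(Locale.ROOT));
        }

        /** Returns paths of ingested assets whose OCR text contains term, ignoring case. */
        List<String> search(String term) {
            String needle = term.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> entry : textByAsset.entrySet()) {
                if (entry.getValue().contains(needle)) {
                    hits.add(entry.getKey());
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        // Canned "OCR output" standing in for a real engine; paths are illustrative only.
        Map<String, String> fakeOcrOutput = Map.of(
                "/content/dam/scans/invoice-042.pdf", "INVOICE No. 042, total due 1,250.00",
                "/content/dam/scans/contract.pdf", "Service Agreement, signed October 2015");
        OcrSearchIndex index = new OcrSearchIndex(fakeOcrOutput::get);
        index.ingest("/content/dam/scans/invoice-042.pdf");
        index.ingest("/content/dam/scans/contract.pdf");

        System.out.println(index.search("invoice"));   // [/content/dam/scans/invoice-042.pdf]
        System.out.println(index.search("agreement")); // [/content/dam/scans/contract.pdf]
        System.out.println(index.search("receipt"));   // []
    }
}
```

Running OCR once when an asset is ingested, rather than on every search, keeps queries cheap, since OCR is usually far slower than a text lookup. A production version would still need to decide where the recognized text is stored so that the existing QueryBuilder-based component can find it; that depends on the AEM setup and is not shown here.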