How to tell if a document has been OCRed

Hi,

I'm wondering if any of you have an idea on how we can tell if a document has been OCRed.

The use case is that a bunch of files are dropped somewhere and an LC process checks whether the files have already been OCRed. If not, it would use PDFG to OCR the file.

There doesn't seem to be any relevant information in the metadata, and I'm wondering if there's a service that can check this.

Thanks,

Jasmin

1 Reply

Interesting, a customer asked me this exact same question last week. I discussed it with our PDF guru Colin van Oosterhout, and he informed me that there is no way to recognize from the 'outside' whether a PDF has been OCRed or not.

However, he *did* mention that if you run Acrobat's batch OCR on a directory containing both OCRed and non-OCRed files, Acrobat actually 'skips' the files that have already been OCRed.

I haven't had a chance to test this behavior with PDFgen yet, but it's worth a try. Let me know if you find something interesting.
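
A possible workaround, offered as a rough heuristic rather than a real "was this OCRed" test: OCRed PDFs usually carry a text layer, so a page that yields extractable text probably doesn't need OCR again. The sketch below is in Python and assumes the per-page text has already been pulled out with whatever PDF library you have available; the function names and thresholds (`has_text_layer`, `needs_ocr`, `min_chars_per_page`, `min_page_ratio`) are made up for illustration and are not part of LiveCycle or PDFG. Note that it cannot tell OCRed text from born-digital text, which should be fine for deciding whether to skip OCR.

```python
from typing import Iterable


def has_text_layer(page_texts: Iterable[str],
                   min_chars_per_page: int = 20,
                   min_page_ratio: float = 0.5) -> bool:
    """Guess whether a PDF already has a text layer.

    page_texts: text extracted from each page (extraction itself is
    assumed to happen elsewhere, with a PDF library of your choice).
    Both thresholds are arbitrary assumptions; tune them on real files.
    """
    pages = list(page_texts)
    if not pages:
        # No pages (or extraction failed): treat as "no text layer".
        return False

    # Count pages whose text, ignoring whitespace, is long enough to
    # be more than a stray page number or stamp.
    pages_with_text = sum(
        1 for text in pages
        if len("".join(text.split())) >= min_chars_per_page
    )
    return pages_with_text / len(pages) >= min_page_ratio


def needs_ocr(page_texts: Iterable[str]) -> bool:
    """Convenience wrapper: True if the file should be sent to OCR."""
    return not has_text_layer(page_texts)
```

For example, `needs_ocr(["", "  \n"])` flags a scan with empty pages for OCR, while a file whose pages each return a paragraph of text would be skipped.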

Thanks,

Waldo