I have a customer with up to 3,000,000 documents in a repository (mainly a combination of MS formats, but with some PDFs and images such as scanned files), which they plan to convert to PDF via PDFG (LC 8.2.1) and then store via Content Services.
They are also looking at scanning additional documents to add to this repository and want them converted to PDF, ideally making use of OCR to identify text. Given the size of the repository, they are looking for guidance on PDFG performance, along with best practices for using OCR given its single-thread requirement.
Looking around, I have found some information on this, but was wondering if anyone has any best practices or advice they could share. The types of questions they would like answered include:
Thanks in advance,
Alastair