Document analysis lifecycle

Level 1

I'm not sure whether this belongs in this forum alone, but I'm trying to put together an Adobe-based architecture for document analysis: template matching, extraction of data from documents (essentially images), and storage of the extracted data with the document itself as an associated metadata file. Can someone help me identify which Adobe components, out of the huge list of available products, can be chained together to achieve this? I'd appreciate it.
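
A minimal sketch of the "associated metadata file" idea, assuming a JSON sidecar file stored next to each scanned image. The `<image name>.json` naming convention and the `write_sidecar`/`read_sidecar` helpers are hypothetical and not part of any Adobe product; embedding the data inside the document file itself would need a different approach.

```python
import json
from pathlib import Path


def sidecar_path(document_path: str) -> Path:
    """Hypothetical convention: 'scan.png' -> 'scan.png.json' in the same folder."""
    doc = Path(document_path)
    return doc.with_name(doc.name + ".json")


def write_sidecar(document_path: str, fields: dict) -> Path:
    """Store extracted fields in a JSON file next to the source document."""
    sidecar = sidecar_path(document_path)
    payload = {"source": Path(document_path).name, "fields": fields}
    sidecar.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


def read_sidecar(document_path: str) -> dict:
    """Load the fields previously stored for a document."""
    return json.loads(sidecar_path(document_path).read_text(encoding="utf-8"))["fields"]
```

Keeping the data in a separate file leaves the original image untouched, at the cost of having to move or copy the two files together.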

Level 8

There isn't much detail here (document analysis can mean many different things), but you may want to start by looking at Forms and Assembler.

Level 1

Thanks. At a basic level, I'm talking about:

a) extraction of data from a scanned or photographed document image

b) using a template to match to that document and extract the data from the image

c) creation of such a template in the first place so that it can be applied to subsequent documents

d) how much of this can be done without user intervention, i.e. which Adobe product APIs can handle it.

Level 8

The devil is always in the details with projects such as this, but here is a list that should get you started:

a) extraction of data from a scanned or photographed document image

   If the documents can come from any source, scanned images can be converted into OCR'd documents with PDF Generator. This gives you a PDF with searchable text. The trick then is finding where the data is on the page. You may be able to use Assembler to extract information from specific areas of the page.
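
Extracting text from a specific area of a page can be sketched independently of any Adobe API. This assumes the OCR step has already produced each word with a bounding box; the `Word` and `Zone` types and `extract_zone` are hypothetical names, and the sketch does not show Assembler's own interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word and its bounding box, in page units (origin top-left)."""
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Zone:
    """A rectangular area of the page where a field is expected."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its centre point falls inside it.
        cx = word.x + word.width / 2
        cy = word.y + word.height / 2
        return (self.x <= cx <= self.x + self.width
                and self.y <= cy <= self.y + self.height)


def extract_zone(words: list[Word], zone: Zone) -> str:
    """Return the text inside a zone, ordered top-to-bottom, then left-to-right.

    Sorting on the raw y coordinate assumes words on one line share the same y;
    real OCR output usually needs a tolerance when grouping lines.
    """
    inside = [w for w in words if zone.contains(w)]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)
```

Using the word's centre rather than requiring the whole box to fit makes the zone tolerant of small alignment errors in the scan.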

   To add one more wrinkle: if you create the template yourself and the completed document has to be faxed or photographed back to you, you can use LiveCycle Barcode to add a barcode that stores the data. You would then extract the data from the barcode rather than from the OCR text, which is much more accurate.
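
If a barcode carries the data, the fields have to be serialized into a string before encoding and parsed back after the barcode is read. A minimal sketch, assuming a URL query-string layout for the payload; the layout and the function names are illustrative choices, not the LiveCycle Barcode format, and generating or reading the barcode image itself is not shown.

```python
from urllib.parse import parse_qsl, urlencode


def encode_payload(fields: dict[str, str]) -> str:
    """Serialize form fields into one compact string to place in a barcode.

    Query-string encoding is an assumption for illustration; it escapes
    spaces and separators so values cannot break the layout.
    """
    return urlencode(fields)


def decode_payload(payload: str) -> dict[str, str]:
    """Recover the fields from the raw string a barcode reader returns."""
    return dict(parse_qsl(payload, keep_blank_values=True))
```

Because the values come back exactly as they were encoded, there is no OCR error to correct for these fields.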

b) using a template to match to that document and extract the data from the image

   I'm not sure how you would determine which template goes with a scanned document unless you used a barcode (the template ID could be part of the barcode data). You may be able to pull the ID from a scanned document using Assembler if you knew the exact coordinates of the data.
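
Once a template ID can be read (for example, from barcode data), matching a document to its template reduces to a lookup. A minimal sketch, assuming the ID is stored under a hypothetical `template_id` key and that a template is just a set of named field zones given as (x, y, width, height); `Template` and `TemplateRegistry` are illustrative names, not Adobe classes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Template:
    """Hypothetical template record: an ID plus named field zones (x, y, width, height)."""
    template_id: str
    zones: dict[str, tuple[float, float, float, float]] = field(default_factory=dict)


class TemplateRegistry:
    """Maps template IDs to template definitions."""

    def __init__(self) -> None:
        self._templates: dict[str, Template] = {}

    def register(self, template: Template) -> None:
        self._templates[template.template_id] = template

    def match(self, decoded_fields: dict[str, str]) -> Optional[Template]:
        """Return the template named in the decoded data, or None if absent or unknown."""
        template_id = decoded_fields.get("template_id")
        if template_id is None:
            return None
        return self._templates.get(template_id)
```

Returning `None` for a missing or unknown ID lets the caller route the document to a person instead of guessing.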

c) creation of such a template in the first place so that it can be applied to subsequent documents

   PDF document templates can be created with Designer.

d) how much of this can be done without user intervention, i.e. which Adobe product APIs can handle it.

   You should be able to automate most, if not all, of this.
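
One way to keep such a chain automatable is to treat each stage as a replaceable function and to flag, rather than reject, documents that cannot be processed unattended. A minimal sketch under that assumption; the stage signatures, `process_document`, and `Result` are hypothetical, and the OCR, identification, and extraction callables would wrap whatever services are actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Result:
    """Outcome of processing one document."""
    fields: dict[str, str] = field(default_factory=dict)
    needs_review: bool = False


def process_document(
    image: bytes,
    ocr: Callable[[bytes], str],
    identify: Callable[[str], Optional[str]],
    extractors: dict[str, Callable[[str], dict[str, str]]],
) -> Result:
    """Run OCR, template identification, and field extraction without user input.

    Documents whose template cannot be identified, or whose extracted fields
    are incomplete, are flagged for human review instead of failing.
    """
    text = ocr(image)
    template_id = identify(text)
    if template_id is None or template_id not in extractors:
        return Result(needs_review=True)
    fields = extractors[template_id](text)
    # Any empty field means a person should check the document.
    return Result(fields=fields, needs_review=any(not v for v in fields.values()))
```

The share of documents that end up with `needs_review` set is a practical measure of how much of the process really runs without intervention.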

There is another option, however:

   Rather than have your users scan or photograph the document, have them submit the information directly from the PDF. In other words, the PDF (created in Designer) would have a submit button that sends the data (via email or an HTTP POST) to a LiveCycle workflow. You could then use LiveCycle Forms to extract the data in XML format. This gives you much more reliable control over the data. LC Forms could also be used to merge the data with another template.
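
With the submit-from-PDF option, the server side works with field data as XML instead of an image. A minimal sketch of turning such XML into a flat field dictionary with Python's standard `xml.etree.ElementTree`; the element names in `sample` are hypothetical, since the real structure depends on how the form is designed in Designer.

```python
import xml.etree.ElementTree as ET


def xml_to_fields(xml_data: str) -> dict[str, str]:
    """Flatten the leaf elements of an XML data document into a dict.

    Leaf elements hold field values; if the same tag appears twice,
    the later value overwrites the earlier one in this simple sketch.
    """
    root = ET.fromstring(xml_data)
    fields = {}
    for element in root.iter():
        if len(element) == 0:  # no child elements: treat as a field
            fields[element.tag] = (element.text or "").strip()
    return fields


sample = """<form1>
  <applicant>
    <name>Jane Doe</name>
    <licenseNumber>D1234567</licenseNumber>
  </applicant>
</form1>"""

print(xml_to_fields(sample))  # {'name': 'Jane Doe', 'licenseNumber': 'D1234567'}
```

Because the values arrive as typed text, no OCR or template matching is needed for these documents.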

Level 1

Thanks for the detailed reply.

Unfortunately, the documents I'm talking about are paper documents that end users send us (driver's licenses, W-2s, car insurance documents, etc.), and none of them are generated by us, so we can't even insert a barcode. I will look further into the components you suggested to see which of them meet our needs, and I may follow up on this post based on that.

Thanks again.