Need to extract PDF form answers into an XML file

Hi Adobe gurus, we have the following requirement. We have a Time Tracking and Project Monitoring System whose database is Oracle 10g R2, and we want to automate our "Meeting Minutes" processing.

1. The project lead will write the minutes into a PDF form, i.e. a PDF where people can type information into fields, tick check-boxes, etc.

2. The PDF will be e-mailed to the project manager.

3. The project manager will save the PDF in a directory on the hard disk.

4. Then he will run a program.

5. The program will pick up all the PDFs in that directory one by one.

6. For each PDF, the program should read the form fields, get their values, and create an XML file for it.

7. Another program will then read the XML files, extract the information, and store it in the DB against each project lead.
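
A minimal sketch of steps 5 to 7, only to illustrate the intended flow, not a working PDF reader. Everything named here is an assumption: `read_fields` is a hypothetical callable standing in for whatever PDF library or Adobe component ends up reading the form fields, the `meeting_minutes` and `field` element names and the `minutes` table are made up for illustration, and SQLite stands in for the Oracle database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, List


def fields_to_xml(fields: Dict[str, str], xml_path: Path) -> None:
    """Step 6: write one <field name="..."> element per form field."""
    root = ET.Element("meeting_minutes")  # hypothetical root element name
    for name, value in fields.items():
        field = ET.SubElement(root, "field", name=name)
        field.text = value
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)


def convert_directory(
    pdf_dir: Path, read_fields: Callable[[Path], Dict[str, str]]
) -> List[Path]:
    """Steps 5-6: pick up every PDF in pdf_dir one by one and write an XML file next to it."""
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        xml_path = pdf_path.with_suffix(".xml")
        # read_fields is the placeholder for the real form-field extraction.
        fields_to_xml(read_fields(pdf_path), xml_path)
        written.append(xml_path)
    return written


def load_xml_into_db(xml_path: Path, conn: sqlite3.Connection) -> None:
    """Step 7: read one XML file back and store its fields (SQLite as a stand-in DB)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS minutes (source TEXT, field TEXT, value TEXT)"
    )
    root = ET.parse(str(xml_path)).getroot()
    rows = [(xml_path.name, f.get("name"), f.text or "") for f in root.findall("field")]
    conn.executemany("INSERT INTO minutes VALUES (?, ?, ?)", rows)
    conn.commit()
```

With a real extractor plugged in as `read_fields`, the project manager's program would call `convert_directory` on the saved folder, and the second program would call `load_xml_into_db` for each XML file produced.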

I went through the thread "PDF to XML conversion", but unfortunately it has no complete solution. Many people seem to face this problem, and I would be really grateful if the Adobe experts could give a complete solution. To make it easier to answer, I have put together a small questionnaire below:

(A.) When creating a PDF form (i.e. a PDF where you can type answers to questions, tick checkboxes, etc.), can you create the PDF with structure? For example, the field into which a user types the project name should be identifiable (as a tag or something like PROJECT_NAME) when we create an XML file out of it later. Is this called creating a tagged PDF?

(B.) Does this mean that we cannot convert an untagged PDF to XML?

(C.) In order to convert a PDF to XML, do we need an XSD or DTD? I ask because some converters from the web, like this one, ask for the XSD. And this tool, which converts a PDF to XML, asks for a rule set before converting the PDF. Is this necessary? I.e. do we have to have our own XSD and rule set, or does the PDF-to-XML utility convert to XML based on predefined Adobe PDF tags?

(D.) Do we HAVE TO use the Adobe LiveCycle ES DLLs to do this? I ask because most of the free PDF-to-XML converters give wrong results and come with no guarantee.

(E.) Can you please elaborate on the process of converting PDFs to XML? Please note that we are doing this from a program (i.e. we process a batch of PDFs).

(F.) If we use the Adobe DLLs, do we have to purchase the Adobe LiveCycle ES product?

(G.) If we purchase Adobe LC ES, can we use the DLLs from Java or .NET? Or is it possible to call the Adobe DLLs only from C or C++ (I read about this on the Net)?

(H.) Is Adobe LiveCycle ES a separate product from the Adobe SDK and Adobe PDF writer?

Your advice would be greatly appreciated.

Thanks in advance.

Ravi de Silva.