Data Extraction from PDF

Level 3

I am creating a large PDF form that people will fill out, save, and return to us. Is there a simple method to extract the data from the returned PDFs and store the values in a database or spreadsheet (Access or Excel)?

I can program in Java if needed, but I'd prefer a simpler solution if one exists.

7 Replies

Level 10

Hi,

Before you deploy the form, you can set up data connections. These most commonly connect to a database, such as an SQL database or Access, but you can also connect to a spreadsheet.

To process the data (that is, to connect to the database), the user will need the full version of Acrobat, OR the form must be Reader Enabled using the full Reader Extensions server component.

I would recommend Stefan Cameron's blog, which has solutions and examples: http://forms.stefcameron.com/.

Good luck,

Niall

Level 3

I appreciate the help. I am still a novice with LiveCycle Designer.

Can you explain, in plain English, how a data connection works? Could you point me to any relevant tutorials?

The users themselves will not be connecting to a database. The PDFs will all be saved locally on my computer; I simply need a way to automate extracting the data into a format that is easier to search and analyze.

Thanks
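Since Java is on the table, the batch-extraction idea can be sketched in a few dozen lines. This is a minimal sketch, not a built-in Adobe feature: it assumes each returned form's data has already been saved as an XML data file in one folder (LiveCycle Designer forms keep their data as XML, and Acrobat can export a filled form's data). The class name `FormDataToCsv`, the folder `returned-forms`, and the output `forms.csv` are hypothetical. Every leaf XML element becomes a column and every file becomes one CSV row that Excel or Access can import.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns a folder of exported XML form-data files into one CSV.
public class FormDataToCsv {

    /** Collects every leaf element (one with no child elements) as fieldName -> text, in document order. */
    public static Map<String, String> readFields(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() strips prefixes such as xfa:
            Element root = factory.newDocumentBuilder().parse(xml).getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            collectLeaves(root, fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse form data", e);
        }
    }

    private static void collectLeaves(Element element, Map<String, String> fields) {
        NodeList children = element.getChildNodes();
        boolean hasChildElements = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collectLeaves((Element) child, fields);
            }
        }
        if (!hasChildElements) {
            // Assumption: field names are unique; a repeating subform's rows would overwrite each other here.
            fields.put(element.getLocalName(), element.getTextContent().trim());
        }
    }

    /** Quotes a CSV cell if it contains a comma, quote, or line break. */
    public static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Builds CSV lines: a header of every field name seen, then one row per form (blank if missing). */
    public static List<String> toCsvLines(List<Map<String, String>> rows) {
        Set<String> header = new LinkedHashSet<>();
        for (Map<String, String> row : rows) header.addAll(row.keySet());
        List<String> lines = new ArrayList<>();
        List<String> cells = new ArrayList<>();
        for (String name : header) cells.add(csvField(name));
        lines.add(String.join(",", cells));
        for (Map<String, String> row : rows) {
            cells = new ArrayList<>();
            for (String name : header) cells.add(csvField(row.getOrDefault(name, "")));
            lines.add(String.join(",", cells));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path inputDir = Paths.get(args.length > 0 ? args[0] : "returned-forms");
        Path output = Paths.get(args.length > 1 ? args[1] : "forms.csv");
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.xml")) {
            for (Path p : stream) files.add(p);
        }
        files.sort(null); // natural Path order, so rows come out in a stable order
        List<Map<String, String>> rows = new ArrayList<>();
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                rows.add(readFields(in));
            }
        }
        Files.write(output, toCsvLines(rows), StandardCharsets.UTF_8);
        System.out.println("Wrote " + rows.size() + " rows to " + output);
    }
}
```

Run it as `java FormDataToCsv returned-forms forms.csv`. Taking the union of field names as the header means a form that leaves a field out simply gets a blank cell, so slightly different form versions can still share one spreadsheet.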

Level 4

Niall is correct for a direct-connected form, but there is a wider variety of choices you may want to try:

  • Use the form distribution and collection features in Acrobat and Acrobat.com. These give you an automated way to bring all of the data together. You can see this at http://formcentral.acrobat.com/ or under Tools/Forms/Distribute in Acrobat X.
  • Directly connect your form as Niall suggested; you'll need to Reader-extend it with the server product to enable this data access in Reader, or use full Acrobat.
  • Gather the data from form submissions (HTTP POST).
  • Extract the data from the completed, Reader-extended forms that are sent back. LiveCycle has a service called Form Data Integration, but there are also open-source toolkits that can extract the data. LiveCycle (now ADEP) can also build complete processes around the data, but that may be much more than you are looking for.
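For the HTTP POST option, the receiving end can be quite small. Below is a minimal sketch using the JDK's built-in `com.sun.net.httpserver` server, assuming the form's submit button is configured to send URL-encoded data to this address and that every submission carries the same fields in the same order. `FormSubmitReceiver`, the `/submit` path, port 8080, and `submissions.csv` are hypothetical names chosen for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical endpoint that appends each URL-encoded form submission to a CSV file.
public class FormSubmitReceiver {

    /** Parses an application/x-www-form-urlencoded body into field name -> value, keeping field order. */
    public static Map<String, String> parseUrlEncoded(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            fields.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return fields;
    }

    /** Joins cells into one CSV line, quoting any cell that contains a comma, quote, or line break. */
    public static String csvRow(Iterable<String> cells) {
        StringJoiner row = new StringJoiner(",");
        for (String cell : cells) {
            boolean needsQuotes = cell.contains(",") || cell.contains("\"")
                    || cell.contains("\n") || cell.contains("\r");
            row.add(needsQuotes ? "\"" + cell.replace("\"", "\"\"") + "\"" : cell);
        }
        return row.toString();
    }

    /** Appends one submission; writes a header from the field names if the file does not exist yet. */
    public static void appendSubmission(Path csvFile, Map<String, String> fields) throws IOException {
        StringBuilder text = new StringBuilder();
        if (!Files.exists(csvFile)) {
            text.append(csvRow(fields.keySet())).append(System.lineSeparator());
        }
        text.append(csvRow(fields.values())).append(System.lineSeparator());
        Files.write(csvFile, text.toString().getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Starts an HTTP server that records every POST to /submit in csvFile. */
    public static HttpServer start(int port, Path csvFile) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/submit", exchange -> {
            if (!"POST".equalsIgnoreCase(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1); // Method Not Allowed, no body
                exchange.close();
                return;
            }
            String body;
            try (InputStream in = exchange.getRequestBody()) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            appendSubmission(csvFile, parseUrlEncoded(body));
            byte[] reply = "Form received.".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start(); // no executor set, so handlers run one at a time on the server's own thread
        return server;
    }

    public static void main(String[] args) throws IOException {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 8080;
        Path csvFile = Paths.get(args.length > 1 ? args[1] : "submissions.csv");
        start(port, csvFile);
        System.out.println("Listening on http://localhost:" + port + "/submit");
    }
}
```

Because no executor is configured, the JDK server handles one request at a time, so two simultaneous submissions cannot interleave their lines in the CSV.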

You may want to start with Acrobat.com's capabilities to see if this is sufficient.

Level 10

Hi,

Chuck's suggestion of FormsCentral is a good one and may suit your requirements. We have used it before, and its reporting capabilities have since improved.

I don't have any tutorials; however, you can check out Paul's tech talk at the Acrobat User Group: http://acrobatusers.com/events/2220/tech-talk-database-connected-forms.

Hope that helps,

Niall

Level 3

FormsCentral seems promising, but its PDF form designer is far too basic for my needs. Is it possible to use its tools with a PDF that I have created in LiveCycle Designer?

Level 7

In Acrobat Pro X you can easily extract form data into a CSV file. Under Tools > Forms > More Form Options > Compile Returned Forms..., you can select as many saved, filled-in forms as you want, and Acrobat will extract their data into a single CSV file.
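Once that CSV exists, it can be searched directly before (or instead of) importing it into Excel or Access. Below is a minimal Java sketch, assuming the file is UTF-8, has a header row of field names, and has no line breaks inside field values; `SearchFormCsv` and the column/value arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: prints the rows of a compiled form-data CSV where one column has a given value.
public class SearchFormCsv {

    /** Splits one CSV record into fields, honouring double-quoted fields and "" escapes. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Returns the data rows (as header -> value maps) whose given column equals the wanted value. */
    public static List<Map<String, String>> filter(List<String> lines, String column, String wanted) {
        List<String> header = parseLine(lines.get(0));
        List<Map<String, String>> matches = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;
            List<String> cells = parseLine(line);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.size(); i++) {
                row.put(header.get(i), i < cells.size() ? cells.get(i) : "");
            }
            if (wanted.equals(row.get(column))) matches.add(row);
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path csv = Paths.get(args[0]);
        List<String> lines = Files.readAllLines(csv, StandardCharsets.UTF_8);
        for (Map<String, String> row : filter(lines, args[1], args[2])) {
            System.out.println(row);
        }
    }
}
```

For example, `java SearchFormCsv compiled.csv City "London, UK"` would print every returned form whose City field matches. For anything beyond simple lookups, importing the same CSV into Access is likely the better tool.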

Level 1

Cubixguy77, please take a look at this program by A-PDF: Batch extract PDF text information to Excel. [A-PDF.com] They also have a PDF form version: Batch extract PDF Form Data. [A-PDF.com] It has saved me a lot of trouble when viewing multiple pages or PDF files and extracting specific information such as invoice totals or account names. A-PDF makes extraction simple and very quick! I load my files, view a sample page, select the fields I want to report on, sort my column headers, and then process. That's it: I have an Excel/CSV file with the data I want, which I can then import into Access. Hope this helps!