How to Extract XML Data from PDF on Server Side

Former Community Member
I have developed a PDF using Acrobat 8/LiveCycle Designer. It has an embedded XML schema, and the fields are bound to the schema. I want to "write-enable" the document with Reader Extensions, distribute the file to our end users, let them fill it in (using Adobe Reader), and save the PDF locally. They will then be able to upload the PDF file to our JEE application running on WebSphere. When the file is uploaded, we want to extract the XML data in the webapp. Is there a Java library somewhere that would allow me to do this (or convert to XDP format and get it from there)?



Thanks,

Michael
6 Replies

Former Community Member
You need to look at the quick start samples in the LC J2EE SDK Forms Data Integration section. Very simple API, only a few methods to call to import or export.



Ironically, I'm actually testing the import sample right now and getting a run-time exception using the JBoss trial version. Probably a config/setup thing though. - eric

Former Community Member
Eric,

Thanks for the reply - I did actually locate that sample code after I'd posted my question. However, I was hoping to avoid purchasing and running a LiveCycle server in our production environment. It seems a tad expensive and complicated.

Michael

Former Community Member
PDF4NET is a cheap(er than LiveCycle) .NET library that should do the job. You could wrap a .NET web service around the functions you require and call it from your Java app.



Adobe used to publish XPAAJ, a Java library that (amongst other things) allowed you to extract XML from a PDF. The only catch was that you had to be an enterprise customer (own any LiveCycle product or ColdFusion).



I am pretty sure that this isn't available anymore for download.



John.

Former Community Member
Mike, sorry, I'm just using the JBoss trial version and found the SDK pretty straightforward, at least for the Form Data Integration methods. Once I got through some partially self-inflicted install and config hacks, I was easily able to get those samples working with PDF files and XML I created myself with Designer ES.



Not sure if you know this, but you can use Acrobat Pro to enable some Reader Extensions-like features, though it's obviously a user-driven process. For us, we need automated enterprise-level processing to do that enablement and the PDF/XML pre/post-processing. If you just want import/export capability, add some JavaScript to your Acrobat-enabled forms that calls the xfa.host.importData()/exportData() functions. Then even the free Reader will work to load or save form data via XML to and from the file system. Good luck - eric

Level 7
As a slightly different approach... given that extracting XML data from a PDF is indeed the job of an Adobe server product ... maybe you could consider why you want to submit a PDF, only to expensively discard most of the file and get an XML. Why not just submit the XML, which also means Reader Extensions are unnecessary.



Aandi Inston
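
If the form data does reach the server as XML, as suggested above, the webapp may need nothing beyond the JDK's own XML parser. The following is only a minimal sketch, assuming the upload is either a bare data document or an XDP packet that wraps the data in `xfa:datasets`/`xfa:data`. The class name `XfaDataReader`, the method `readFields`, and the sample field names are hypothetical and not part of any Adobe API, and the code does not read PDF files at all.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads form field values from submitted XML or XDP data.
class XfaDataReader {
    // Namespace of the xfa:datasets / xfa:data wrapper found in XDP packets.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Maps each leaf element name to its text; repeated names keep the last value. */
    static Map<String, String> readFields(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Uploads are untrusted input: reject DOCTYPE declarations to block XXE.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(in);

        // Start at xfa:data when the upload is an XDP packet, else at the root element.
        NodeList data = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        Element start = data.getLength() > 0
                ? (Element) data.item(0)
                : doc.getDocumentElement();

        Map<String, String> fields = new LinkedHashMap<>();
        collectLeaves(start, fields);
        return fields;
    }

    // Walks the child elements of parent and records those with no element children.
    private static void collectLeaves(Element parent, Map<String, String> out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) {
                continue;
            }
            Element child = (Element) n;
            if (hasChildElement(child)) {
                collectLeaves(child, out);
            } else {
                out.put(child.getLocalName(), child.getTextContent().trim());
            }
        }
    }

    private static boolean hasChildElement(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data; a real upload would come from the servlet request.
        String xdp = "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
                + "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<xfa:data><form1><name>Michael</name><amount>42</amount></form1></xfa:data>"
                + "</xfa:datasets></xdp:xdp>";
        InputStream in = new ByteArrayInputStream(xdp.getBytes(StandardCharsets.UTF_8));
        System.out.println(readFields(in));
    }
}
```

In a servlet, the same method could be handed the request's input stream. Flattening to a map is a simplification: a form with repeating subforms would need the DOM tree itself, or binding against the embedded schema.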

Former Community Member
Thanks for all your input - I've got a few (good) options so now just need to juggle costs with user requirements/expectations. cheers