Convert PDF to XML in Java

Hello,

I need to convert a PDF file into XML format programmatically in Java. Can anyone please provide some pointers?

Thanks in advance.

-Deep

Do you mean, you want to extract the data from a PDF in XML, or you literally want to convert the PDF to XML?

Jasmin

Thanks, Jasmin, for the prompt response. I want to extract the data, the metadata (font size, etc.), and the images into the same format. We basically have to convert the PDF to our own proprietary XML format for further processing. Is that possible? I would expect that many people have had similar needs in the past and have solved them successfully.

If direct conversion to XML is not possible, one approach I can think of is to first convert the PDF into HTML and then convert the HTML into XML. But then the same question remains: how do I convert PDF to HTML?

Thanks,

Deep
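Whatever tool does the extraction (for example, a PDF library such as Apache PDFBox, which can report the font and size of each text fragment), the final step of writing the results out in a proprietary XML format can be done with the JDK alone. This is a minimal sketch, assuming the extractor has already produced a list of text runs; the `TextRun` record, the `PdfRunsToXml` class and the `<document>`/`<page>`/`<run>` element names are hypothetical placeholders for whatever your own format defines, and images are not handled.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PdfRunsToXml {

    // Hypothetical holder for one run of text produced by some PDF extraction
    // library; the fields are assumptions, not a type from any real library.
    public record TextRun(int page, String text, String fontName, float fontSize) {}

    // Serializes runs into a hypothetical <document><page><run/></page></document> layout.
    public static String toXml(List<TextRun> runs) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);

        Element currentPage = null;
        int currentPageNo = -1;
        for (TextRun run : runs) {
            // Start a new <page> element whenever the page number changes.
            if (run.page() != currentPageNo) {
                currentPage = doc.createElement("page");
                currentPage.setAttribute("number", Integer.toString(run.page()));
                root.appendChild(currentPage);
                currentPageNo = run.page();
            }
            Element el = doc.createElement("run");
            // Keep the font metadata as attributes on each run.
            el.setAttribute("font", run.fontName());
            el.setAttribute("size", Float.toString(run.fontSize()));
            // Raw text; the serializer below escapes &, < and > on output.
            el.setTextContent(run.text());
            currentPage.appendChild(el);
        }

        // Identity transform: writes the DOM tree out as XML text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<TextRun> runs = List.of(
                new TextRun(1, "Invoice", "Helvetica-Bold", 18f),
                new TextRun(1, "Total: 42 & change", "Helvetica", 10f));
        System.out.println(toXml(runs));
    }
}
```

Running `main` prints the runs as a single line of XML, with the `&` in the second run escaped as `&amp;`. Grouping runs by page is only one possible design; a real schema might also record position, colour, or paragraph structure.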

Take a look at the ExportPDF() operations in the GeneratePDF service.

The input document is the PDF, and you should be able to choose XML as the format type.

Jasmin

Hi all,

Is there any way to convert PDF to XML without server-side Java/.NET?

I need a pure ActionScript solution, using a library that is either open source or licensed.

Any help will be appreciated very much.

-- Maksym Melnishyn.