PDF to XML conversion

02-07-2007

Hi all,



I asked a question in the Acrobat SDK forum about using the Acrobat Standard SDK for automatic PDF-to-XML conversion on a server, and I was told that the Acrobat license does not permit that and that I need to use Adobe LiveCycle ES.



I just need to automatically convert incoming PDF files to XML on a server (automating Acrobat Standard's "Save As XML" function).



Could you please tell me which LiveCycle component can do this for me, and also give me its approximate price?



Thanks very much for your help,

Arash

Replies

Jasmin_Charbonn

05-07-2007

What exactly are you trying to get in that XML file? The data from the PDF document, the metadata, etc.?



Can you be a little more specific? There might be different products for different things. I did a quick test with Save As XML from Acrobat and the output seemed to contain only the metadata, but I'd like to confirm.



Thanks,



Jasmin

06-07-2007

Hi Jasmin,

Thanks for your reply. I need the content of the PDF file converted to XML. For example, in the XML file that I get from Acrobat's Save As XML function, paragraphs are tagged with <P>, tables with <Table>, table rows with <TR>, table cells with <TD>, and so on. There are a few PDF-to-XML/HTML tools, but none of them does a clean conversion; for example, they might break the content of a single cell into two parts and tag each part as an individual cell. These kinds of bugs cause problems when I want to extract information from the XML file via natural language processing.

I have a small VB program that runs from the command line, takes a PDF file as input, converts it to XML using Acrobat Standard, and saves the generated XML file on the local drive. I want this program to run on a server and convert all incoming PDF files to XML, but I was told that the Acrobat Standard licence does not allow me to use Acrobat on a server for automatic batch conversion.

Thanks very much for your help.

Following is part of an XML file generated by the Save As XML function in Acrobat Standard 8.0:

<Table>
  <TR>
    <TH>Master of Business Administration (MBA)Approved Course Schedule Information </TH>
  </TR>
  <TR>
    <TH>Module Title </TH>
    <TD>
      <Figure ActualText="Leading Effectively ">
        <ImageData src=""/>
        LeadingLeading EEffeffectivelyctively
      </Figure>
    </TD>
  </TR>
  <TR>
    <TH>Number of Credits </TH>
    <TD>25 Credits </TD>
  </TR>
  <TR>
    <TH>Subject Status </TH>
    <TD>Mandatory </TD>
  </TR>
  <TR>
    <TH>Quantity of Learning Experience </TH>
    <TD>400 hours, broken down over Stages 1 and 2 as follows Directed Study 85 Independent Study 315 Total 400 </TD>
  </TR>
  <TR>
    <TH>Allocation of Marks </TH>
    <TD>C/A </TD>
    <TD>Project </TD>
    <TD>Practical </TD>
    <TD>Final </TD>
  </TR>
  <TR>
    <TD>0 </TD>
    <TD>100% </TD>
    <TD/>
    <TD>0 </TD>
  </TR>
</Table>
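For the information-extraction step, a table in this tagged output can be walked row by row. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; it assumes the exported XML is well-formed and that header/value rows look like the sample above (a TH followed by TD cells). The helper names (cell_text, extract_rows, header_value_pairs) are hypothetical, and preferring a Figure's ActualText over its inline text is an assumption based on the doubled text seen in the sample.

```python
import xml.etree.ElementTree as ET


def cell_text(cell: ET.Element) -> str:
    """Whitespace-normalised text of a <TH>/<TD> cell.

    Assumption: if the cell holds a <Figure> with an ActualText attribute,
    that attribute is used instead of the figure's inline text, which can
    come out doubled (e.g. "LeadingLeading EEffeffectivelyctively").
    """
    figure = cell.find("Figure")
    if figure is not None and figure.get("ActualText"):
        raw = figure.get("ActualText")
    else:
        # itertext() collects text from the cell and any nested elements
        raw = "".join(cell.itertext())
    return " ".join(raw.split())


def extract_rows(table_xml: str) -> list[list[tuple[str, str]]]:
    """Return every <TR> as a list of (tag, text) pairs for its TH/TD cells."""
    table = ET.fromstring(table_xml)
    rows = []
    for tr in table.iter("TR"):
        rows.append([(c.tag, cell_text(c)) for c in tr if c.tag in ("TH", "TD")])
    return rows


def header_value_pairs(rows: list[list[tuple[str, str]]]) -> dict[str, list[str]]:
    """Map rows shaped like TH, TD, TD... to {header: [values]}; other rows are skipped."""
    pairs = {}
    for row in rows:
        if len(row) > 1 and row[0][0] == "TH":
            pairs[row[0][1]] = [text for tag, text in row[1:] if tag == "TD"]
    return pairs


# Hypothetical usage on a trimmed copy of the sample table
SAMPLE = """<Table>
  <TR><TH>Module Title </TH><TD><Figure ActualText="Leading Effectively "><ImageData src=""/>LeadingLeading EEffeffectivelyctively</Figure></TD></TR>
  <TR><TH>Number of Credits </TH><TD>25 Credits </TD></TR>
  <TR><TH>Allocation of Marks </TH><TD>C/A </TD><TD>Project </TD><TD>Practical </TD><TD>Final </TD></TR>
  <TR><TD>0 </TD><TD>100% </TD><TD/><TD>0 </TD></TR>
</Table>"""

for header, values in header_value_pairs(extract_rows(SAMPLE)).items():
    print(header, "->", values)
```

Rows without a leading TH, such as the marks row that follows "Allocation of Marks", are returned by extract_rows but skipped by header_value_pairs; pairing them with the header row above would need an extra step.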

Jasmin_Charbonn

23-07-2007

So is this really an HTML representation of the PDF?



PDFG (LiveCycle PDF Generator) should be the product that can convert PDF into HTML. The operation is ExportPDF.



Jasmin

28-08-2007

Hi,

My requirement is to send the XML to a server, where another application can use it.



I am trying to use ASP to receive this XML and save it on the server, but I don't know how. Is it possible?

Thanks,

merlin

Jasmin_Charbonn

28-08-2007

You can definitely post XML from a PDF to an ASP page. You need to add a submit button to the form and specify the URL you want to post the information to, as well as what you want to post (PDF, XML or XDP; in your case, XML).



In your ASP page, you can then read the XML from the Request object.
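As a rough illustration of the server side, here is a Python stand-in for the ASP page (not ASP itself), using only the standard library's http.server. It assumes the form posts its data as the raw request body and simply saves each submission to a uniquely named file in a hypothetical incoming_xml folder, where another application could pick it up. In Classic ASP, the corresponding raw-body calls would be Request.TotalBytes and Request.BinaryRead.

```python
import os
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer


def save_posted_xml(body: bytes, directory: str) -> str:
    """Write one posted document to a uniquely named .xml file and return its path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{uuid.uuid4().hex}.xml")
    with open(path, "wb") as f:
        f.write(body)
    return path


class XmlSubmitHandler(BaseHTTPRequestHandler):
    # Hypothetical folder watched by the other application
    upload_dir = "incoming_xml"

    def do_POST(self):
        # The submitted data is the raw request body; its size is in Content-Length
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        if not body:
            self.send_error(400, "Empty submission")
            return
        save_posted_xml(body, self.upload_dir)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Saved")


def run(port: int = 8080) -> None:
    """Serve until interrupted; the port number is an arbitrary choice."""
    HTTPServer(("", port), XmlSubmitHandler).serve_forever()
```

Start it with run(8080) and set the submit button's URL to that address. How Acrobat displays the plain-text reply is not covered by this sketch.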



Is that what you're trying to do?



Jasmin

29-08-2007

Hi Jasmin,



I am using Acrobat Pro 6, and when I defined the submit button I only saw FDF, HTML, XFDF and PDF as options. If I want to get XML, I guess I could use the XFDF option. However, what should I do in ASP to get this XFDF data? I read something about the Request object, but I'm not familiar with it. Could you please tell me the syntax I need to use?

Thank you very much.
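If the XFDF option is used, the server receives an XFDF document rather than an arbitrary XML file. As I understand the format, field values sit in <field name="..."><value>...</value></field> elements under <fields>, in the http://ns.adobe.com/xfdf/ namespace, with hierarchical field names expressed as nested <field> elements. The Python sketch below (standard library only; the function name and the sample field names are hypothetical) flattens such a document into a dictionary keyed by dotted field names and ignores everything outside <fields>.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as {namespace-uri}localname
XFDF_NS = "{http://ns.adobe.com/xfdf/}"


def xfdf_fields(xfdf) -> dict[str, str]:
    """Flatten XFDF (bytes or str) into {"parent.child": value}."""
    root = ET.fromstring(xfdf)
    result: dict[str, str] = {}

    def walk(field: ET.Element, prefix: str) -> None:
        name = field.get("name", "")
        full_name = f"{prefix}.{name}" if prefix else name
        value = field.find(f"{XFDF_NS}value")
        if value is not None:
            result[full_name] = value.text or ""
        # Nested <field> elements carry the child parts of a hierarchical name
        for child in field.findall(f"{XFDF_NS}field"):
            walk(child, full_name)

    fields = root.find(f"{XFDF_NS}fields")
    if fields is not None:
        for field in fields.findall(f"{XFDF_NS}field"):
            walk(field, "")
    return result


# Hypothetical submission with one flat and one nested field
SAMPLE_XFDF = b"""<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="firstName"><value>Jane</value></field>
    <field name="address">
      <field name="city"><value>Paris</value></field>
    </field>
  </fields>
</xfdf>"""

print(xfdf_fields(SAMPLE_XFDF))
```

The raw request body received by the server can be passed straight to xfdf_fields, since ElementTree accepts bytes.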