How can I read a PDF file using an ALC process?

Level 2

I have a PDF form with a schema. I want to read the PDF form into an XML variable with the same schema so that I can process the data I get from the PDF file.

I have tried the following:

  • I have created an input document variable (where I will pass my PDF form as input).
  • I have created an output XML variable with the same schema as the input PDF document.
  • I have created a process with a Set Value service that assigns the input PDF to the XML variable (so that I get the data from the PDF into XML).

Is the above approach correct?

If anyone knows, please suggest an approach for reading my PDF.

Thank you

    5 Replies

    Former Community Member

    The fact that you have used a schema merely means that the submitted form data will be in the format defined by the schema. Generally, when you submit the form, you associate an XML variable in the user step to receive that data (this assumes that your form is set up to submit XDP data). If you associate your schema with that XML variable, you will be able to expand it and see the structure of your schema when using SetValue services, etc. If you are only interested in what was submitted (say, to validate the data against the schema), you will have to assign the inbound data to another variable and remove the outer nodes that were added for our internal purposes (xfa.datasets.data). This is done in a setValue, where you would use the XPath expression xfa/datasets/data/* (meaning that all nodes below the data node are copied to the named variable).

    Hope that helps

    Paul
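The unwrapping step described above can be illustrated outside LiveCycle. The following Python sketch is not a LiveCycle API; it assumes a typical XDP submission in which the form data sits under `xfa:datasets/xfa:data` (namespace `http://www.xfa.org/schema/xfa-data/1.0/`), and the element names `form1`, `name` and `amount` are hypothetical. It uses the standard library's `xml.etree.ElementTree` to return every node directly below `xfa:data`, which is what the `xfa/datasets/data/*` expression selects.

```python
import xml.etree.ElementTree as ET

# Assumed XFA data namespace; the xdp:xdp root itself uses http://ns.adobe.com/xdp/.
NS = {"xfa": "http://www.xfa.org/schema/xfa-data/1.0/"}

# Hypothetical XDP submission; "form1", "name" and "amount" are made-up names.
SUBMISSION = b"""<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
    <xfa:data>
      <form1>
        <name>Jane Doe</name>
        <amount>42</amount>
      </form1>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>"""


def unwrap_form_data(xdp_bytes):
    """Return the elements directly below xfa:datasets/xfa:data,
    i.e. what the XPath xfa/datasets/data/* selects in a setValue step."""
    root = ET.fromstring(xdp_bytes)  # root is the xdp:xdp element
    data = root.find("xfa:datasets/xfa:data", NS)
    if data is None:
        raise ValueError("no xfa:datasets/xfa:data node found")
    return list(data)


payload = unwrap_form_data(SUBMISSION)
print(payload[0].tag)                 # form1
print(payload[0].findtext("amount"))  # 42
```

The returned `form1` element no longer carries the `xdp`/`xfa` wrapper, so it is the part you would validate against your own schema.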

    Level 2

    Thank you for your reply.

    But in my case, I am not 'submitting' the PDF form. I am just filling in the PDF, saving it, and using that PDF as the input to my process (without pressing the submit button). I want to read the data that has been filled into the PDF form.

    My scenario is this: I have an email start point, and I will receive a PDF attachment through email (not a submitted form). I want to read that PDF and process the data inside it. How can I read the PDF in this case?

    Former Community Member

    OK, so when you submit the PDF, the variable that will receive it must be of type document. There is a service under the Common category that will extract the data for you. It is called the Form Data Integration service, and it has two operations: importData and exportData. You want the exportData operation. It is simple in that it takes the document variable that holds your PDF as input and gives you back the XML data file as a DOCUMENT! Before you can use this in an XPath expression, you will have to cast it to an XML variable. This is done in the setValue service: simply put the XML document variable in the Expression and the XML variable in the Location, and you are good to go. You can point the XML variable at your schema to make constructing the XPath expressions easier, but it is not mandatory.

    Hope that helps

    Paul
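To make the "cast" step concrete, here is a minimal Python sketch using only the standard library; it is an illustration, not LiveCycle code. It treats the document returned by exportData as raw bytes, parses them into an XML tree (the equivalent of assigning the document variable to an XML variable), and then evaluates a path against it. It assumes the exported XML has the form's data root (hypothetically `form1`) at the top; if your export still carries the `xfa:datasets`/`xfa:data` wrapper, strip it first as described earlier in this thread. The helper names `document_to_xml` and `read_field` are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Stand-in for the document returned by exportData: raw bytes holding the
# form's XML data. Element names (form1, customer, ...) are hypothetical, and
# we assume the data root sits at the top with no xfa:datasets wrapper.
exported_document = b"""<?xml version="1.0" encoding="UTF-8"?>
<form1>
  <customer>
    <name>Jane Doe</name>
    <email>jane@example.com</email>
  </customer>
</form1>"""


def document_to_xml(doc: bytes) -> ET.Element:
    """The 'cast': parse opaque document bytes into a queryable XML tree."""
    return ET.fromstring(doc)


def read_field(xml_root: ET.Element, path: str) -> Optional[str]:
    """Evaluate a simple ElementTree path; returns None if the node is absent."""
    return xml_root.findtext(path)


root = document_to_xml(exported_document)
print(root.tag)                            # form1
print(read_field(root, "customer/email"))  # jane@example.com
print(read_field(root, "customer/phone"))  # None
```

Until the bytes are parsed, there is no tree to run a path expression against, which is why the setValue cast has to come before any XPath lookups.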

    Level 2

    My issue is solved. I used the 'Process Form Submission' service, set the environment variable to 'application/pdf', set the 'PDF to XDP' attribute to true, got the output into a document (actually an XML document), and read the XML using the Set Value service.

    So I was able to read the form using the above method.

    Thank you for your help.

    Level 2

    Provide the input PDF file.

    Use the 'ProcessFormSubmission' service. Set the environment variable 'content_type = application/pdf'.

    Set the submission option 'PDF to XDP = true'.

    Get the output into a document.

    Assign the output document to an XML variable, then extract the data from the XML using the Set Value service.
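As a rough illustration of the last step, the Python sketch below takes the XDP produced by the PDF-to-XDP conversion (a hypothetical sample here, with made-up field names) and collects every leaf field under `xfa:data` into a dictionary keyed by its slash-separated path. It assumes the standard XFA data namespace and uses only `xml.etree.ElementTree`; inside LiveCycle itself you would do the equivalent with XPath expressions in a Set Value service.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed XFA data namespace used by the xfa:datasets / xfa:data wrapper.
XFA_NS = {"xfa": "http://www.xfa.org/schema/xfa-data/1.0/"}

# Hypothetical XDP output of the PDF-to-XDP conversion; field names are made up.
XDP_OUTPUT = b"""<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
    <xfa:data>
      <form1>
        <applicant>
          <name>Jane Doe</name>
          <city>Berlin</city>
        </applicant>
        <total>125.50</total>
      </form1>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>"""


def extract_fields(xdp_bytes: bytes) -> Dict[str, str]:
    """Map each leaf element under xfa:data to its text, keyed by slash path."""
    root = ET.fromstring(xdp_bytes)
    data = root.find(".//xfa:data", XFA_NS)
    if data is None:
        raise ValueError("XDP has no xfa:data section")
    fields: Dict[str, str] = {}

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        children = list(elem)
        if not children:
            # Leaf node: treat it as a form field and keep its trimmed text.
            fields[path] = (elem.text or "").strip()
        for child in children:
            walk(child, path)

    for top in data:
        walk(top, "")
    return fields


print(extract_fields(XDP_OUTPUT))
# {'form1/applicant/name': 'Jane Doe', 'form1/applicant/city': 'Berlin', 'form1/total': '125.50'}
```

Keying by full path rather than bare element name keeps repeated names in different subforms from overwriting each other.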