Extract Attachment from PDF

I want to extract the files attached to a PDF. I am using Assembler 7.2. The DDX for the extraction is given below.





It returns an XML file containing the data stream names of the attached files. How can I identify the type of each file?
How do I store each data stream?
3 Replies

Former Community Member
You can iterate through the attachmentInfo.xml and look at the mimeType attribute of the child <File> element. The mimeType attribute will be the best hint you'll have of the file type. You can then save the document using the appropriate file suffix.

Don
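
A minimal sketch of the suffix lookup suggested above, written in Python purely for illustration (it is not part of the Assembler API). The `PREFERRED_SUFFIX` table and the `suffix_for` helper are hypothetical names; the fallback to `.bin` for a missing or unknown mimeType is an assumption, not documented Assembler behavior.

```python
import mimetypes

# Hypothetical table of preferred suffixes; extend it for the types you expect.
PREFERRED_SUFFIX = {
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/png": ".png",
    "application/pdf": ".pdf",
    "application/msword": ".doc",
}


def suffix_for(mime_type):
    """Return a file suffix for a mimeType value, or '.bin' if it is unknown."""
    if not mime_type:
        # The mimeType may be absent if it was not provided with the attachment.
        return ".bin"
    mime_type = mime_type.strip().lower()
    if mime_type in PREFERRED_SUFFIX:
        return PREFERRED_SUFFIX[mime_type]
    # Fall back to the standard library's table, then to a generic suffix.
    return mimetypes.guess_extension(mime_type) or ".bin"
```

For example, `suffix_for("image/gif")` gives `.gif`, so an extracted stream could be saved under its unique name plus that suffix.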

Former Community Member
I want to extract the attachments from a PDF document and display them in the attachment list of the workspace. I am using the Assembler service to extract the attachments.



The Assembler service assigns a unique name to each document. Is there a way to use the same file name as in the PDF document (for example, resume.doc or passport.png)? In the workspace it shows "attach-0" with no extension, so the person has to save the file and open it with the appropriate application.



Thanks in Advance

Former Community Member
As Don pointed out, you can parse the result XML which contains information about each extracted file attachment, including the unique name created for the document, the filename that was originally associated with the document, and the content type, if it was originally provided.



The problem with using the filename for the extracted name of the document in the AssemblerResult documents map is that the filename is not guaranteed to be unique.
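
If you still want to present the original filenames, one way around the uniqueness problem is to disambiguate repeats yourself. The following is only a sketch in Python; `unique_names` is a hypothetical helper, and the `-1`, `-2` suffix scheme and the case-insensitive comparison are assumptions rather than anything the Assembler service does.

```python
import os


def unique_names(filenames):
    """Return the filenames in order, renaming repeats to stem-1.ext, stem-2.ext, ..."""
    seen = set()
    result = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        candidate = name
        counter = 1
        # Compare case-insensitively so "Resume.doc" and "resume.doc" do not collide on disk.
        while candidate.lower() in seen:
            candidate = f"{stem}-{counter}{ext}"
            counter += 1
        seen.add(candidate.lower())
        result.append(candidate)
    return result
```

Each returned name can then be paired with the unique key of the corresponding document, so the key is still used for lookup and the readable name is used only for display.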



You can ask for the result XML without actually extracting the file attachments (extract="false"), and then use the information from the result XML to get exactly the file attachments you want.



The unique name created (the attachmentKey) actually provides information as to whether the file attachment was attached to the document in general or to a specific page, such as for a comment annotation.



The attachments.xsd schema is installed in the LC ES SDK to assist with parsing.



Here's an example result XML with one document-level and one page-level file attachment:



<?xml version="1.0" encoding="UTF-8"?>
<Attachments xmlns="http://ns.adobe.com/DDX/Attachments/1.0/">

  <Attachment attachmentKey="doc.source_attach.0000.0001" name="Untitled Object">
    <File creationDate="2004-08-05T01:34:18Z" mimeType="image/jpeg" modificationDate="2000-03-30T01:46:42Z" size="14359">
      <Filename fromEncoding="ISO-8859-1" success="true">Origami.jpg</Filename>
    </File>
    <Description>This is Origami.jpg, a document-level attachment.</Description>
  </Attachment>

  <Attachment attachmentKey="doc.source_attach.0003.0001">
    <File creationDate="2004-08-05T01:34:16Z" mimeType="image/gif" modificationDate="2002-09-10T01:51:22Z" size="3939">
      <Filename fromEncoding="ISO-8859-1" success="true">dog.gif</Filename>
    </File>
    <Description>This is a dog.gif from page 3</Description>
    <Page pageNumber="3">
      <Location x="532.5" y="720.75"/>
    </Page>
  </Attachment>

</Attachments>
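
To illustrate parsing a result XML like the one above, here is a small Python sketch using only the standard library. The `list_attachments` function and the `SAMPLE` string (a trimmed copy of the example) are hypothetical; the element and attribute names are taken from the example itself, and treating a missing Page element as "document-level" is an assumption based on that example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the result XML in the example above.
NS = {"a": "http://ns.adobe.com/DDX/Attachments/1.0/"}

# Trimmed copy of the example result XML (dates and sizes omitted).
SAMPLE = """<Attachments xmlns="http://ns.adobe.com/DDX/Attachments/1.0/">
  <Attachment attachmentKey="doc.source_attach.0000.0001" name="Untitled Object">
    <File mimeType="image/jpeg">
      <Filename fromEncoding="ISO-8859-1" success="true">Origami.jpg</Filename>
    </File>
  </Attachment>
  <Attachment attachmentKey="doc.source_attach.0003.0001">
    <File mimeType="image/gif">
      <Filename fromEncoding="ISO-8859-1" success="true">dog.gif</Filename>
    </File>
    <Page pageNumber="3"><Location x="532.5" y="720.75"/></Page>
  </Attachment>
</Attachments>"""


def list_attachments(xml_text):
    """Return one dict per Attachment: key, filename, mime_type, page (None if absent)."""
    root = ET.fromstring(xml_text)
    rows = []
    for att in root.findall("a:Attachment", NS):
        file_el = att.find("a:File", NS)
        name_el = file_el.find("a:Filename", NS) if file_el is not None else None
        page_el = att.find("a:Page", NS)
        rows.append({
            "key": att.get("attachmentKey"),
            "filename": name_el.text if name_el is not None else None,
            "mime_type": file_el.get("mimeType") if file_el is not None else None,
            "page": int(page_el.get("pageNumber")) if page_el is not None else None,
        })
    return rows


for row in list_attachments(SAMPLE):
    print(row["key"], row["filename"], row["mime_type"], row["page"])
```

The key from each row is what you would use to look up the extracted document, while the filename and mimeType tell you how to name and save it.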