How can I combine 200+ XDP (XML) files into one spreadsheet or database for analysis?

Level 1

Hi all,

I've got 200+ XDP files that were created by entering data into a LiveCycle PDF form 200+ times and then exporting the data as XDP 200+ times.

Now, I want to view all the data at once (for example, in an Excel Spreadsheet or database) so I can look for trends in the data or come up with a summary (for example, 180 of 200 responses answered Question 1 'Yes'). The form does have some image fields, but I don't need to see or analyse those fields.

I'm a bit of a newbie. Is there an easy way to do this? Thanks!

- John

5 Replies

Former Community Member

XDP is an XML format, so you should be able to write a program to extract the values that you want. You could potentially write them out in a comma-delimited format and then import that into Excel for analysis.

I have not written a program like that but I know it can be done.
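As a rough illustration of that idea, here is a minimal sketch in current Python 3 using only the standard library. The sample XDP, the field names (Question1, Comments) and the helper names are made up for the example; a real export will have its own field names and usually more content around the data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; a real XDP export will contain your own field names.
SAMPLE_XDP = """<?xml version="1.0" encoding="UTF-8"?>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
    <xfa:data>
      <form1>
        <Question1>Yes</Question1>
        <Comments>Fast, friendly service</Comments>
      </form1>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>"""

FIELDS = ["Question1", "Comments"]  # hypothetical field names


def extract_fields(xdp_text, fields):
    """Return the text of the first element matching each field name."""
    root = ET.fromstring(xdp_text.encode("utf-8"))
    values = []
    for name in fields:
        # Assumes the form fields carry no namespace, as in the sample above.
        node = root.find(".//" + name)
        values.append(node.text if node is not None and node.text else "")
    return values


def to_csv_line(values):
    """Format one row as CSV; the csv module quotes values containing commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue()


row = extract_fields(SAMPLE_XDP, FIELDS)
print(to_csv_line(row), end="")
```

Letting the csv module do the quoting means a comma inside an answer does not split the value into two columns when Excel imports it.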

Paul

Level 1

I had a similar goal to gather data from a folder full of XDPs into a spreadsheet to track user behaviors, capture commonly entered values, and run reports.

If you are familiar with Python, I highly recommend creating a simple script using Python 2.6 and the ElementTree library (effbot.org), which has shipped with Python as xml.etree.ElementTree since version 2.5.

Since I cut my teeth on this project, I found it helpful to break the script's tasks down into individual parts. The material and examples for getting a folder's contents, storing values in lists, and using ElementTree are as terrific as they are scattered in location and diverse in implementation.

1. Grab an xdp in your directory ( for book in path: )

2. Parse the xml by using the elementtree iterator

3. Assign your element's values to variables during iteration

if element.tag == 'dataNode':

     variable = element.text

4. At the end, print your variables with a delimiter between each (I used a pipe "|" because commas were commonly used in our forms)

print "%s|%s|%s|%s|%s" % (currentFileName, variable1, variable2, variable3, variable4)
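The four steps above can be sketched roughly as follows. This is not the original script: it is written for current Python 3 (so print is a function) and uses the standard-library xml.etree.ElementTree. The field names in WANTED and the function names are placeholders for illustration.

```python
import glob
import os
import xml.etree.ElementTree as ET

WANTED = ["Question1", "Question2"]  # hypothetical field names


def row_for(xdp_path, wanted=WANTED):
    """Steps 2-3: iterate over one XDP and keep the text of the wanted elements."""
    found = {}
    for _event, element in ET.iterparse(xdp_path):  # default event is "end"
        tag = element.tag.split("}")[-1]  # drop any "{namespace}" prefix
        if tag in wanted and tag not in found:
            found[tag] = (element.text or "").strip()
    return [os.path.basename(xdp_path)] + [found.get(name, "") for name in wanted]


def rows_for_folder(folder):
    """Step 1: visit every .xdp file in the folder, in sorted order."""
    paths = sorted(glob.glob(os.path.join(folder, "*.xdp")))
    return [row_for(path) for path in paths]


def print_rows(folder):
    """Step 4: print one pipe-delimited line per file."""
    for row in rows_for_folder(folder):
        print("|".join(row))
```

Calling print_rows("path/to/xdps") prints one line per file. Only the first occurrence of each wanted tag is kept, so a field that repeats within a form needs separate handling.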

Once you are happy with the data printed to your console, and each row corresponds to one XDP, redirect the output to a file and import that file into Excel as a CSV-style text import (specifying your delimiter during the import process).

So if the script that does the above is called "xdpGrabber.py", run:

$ python xdpGrabber.py >> xdpSpreadSheet.txt

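and open xdpSpreadSheet.txt in Excel. Note that ">>" appends to the file, so delete it (or use ">" instead) before rerunning the script, otherwise you will get duplicate rows.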
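For a quick summary like "180 of 200 answered Yes", the pipe-delimited file can also be tallied directly. The following is only a sketch for current Python 3; the function name and the column positions are assumptions about how your rows are laid out.

```python
import csv
from collections import Counter


def tally(lines, column):
    """Count each distinct value found in one column of pipe-delimited rows."""
    # QUOTE_NONE: the printed rows are plain text, so quote marks are ordinary characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader if len(row) > column)
```

For example, with open("xdpSpreadSheet.txt", newline="") as handle: print(tally(handle, 1)) would count the values in the second column, assuming the first column holds the file name.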

There are some gotchas here: repeating nodes can be tricky, and heavily nested or scattered XML structures will add much more complexity to the script. However, if you are only looking for certain fields that always come in the same order when reading the XDPs from top to bottom, you can get them using simple "if element.tag == 'stuff'" checks in the order they appear, since ElementTree parses from top to bottom.
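One possible way to deal with a repeating node is to collect every occurrence and join the values into a single spreadsheet cell. This is a sketch in current Python 3; the sample structure and the names (Item, Description, repeated_values) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical form data with a repeating <Item> node.
SAMPLE = """<form1>
  <Name>Pat</Name>
  <Item><Description>Pens</Description></Item>
  <Item><Description>Paper</Description></Item>
</form1>"""


def repeated_values(root, tag):
    """Return the text of every element with this tag, in document order."""
    return [(node.text or "").strip() for node in root.iter(tag)]


root = ET.fromstring(SAMPLE)
# Join with "; " so the values cannot be confused with the "|" column delimiter.
cell = "; ".join(repeated_values(root, "Description"))
print(cell)
```

Whether one joined cell or one row per repeated item is better depends on the kind of analysis you want to do in Excel.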

If this is appealing to you, I suggest you run one of your sample xdps against the example script given here: http://www.xml.com/pub/a/2003/02/12/py-xml.html

I used Python 2.6 and ElementTree on a Windows XP laptop. I would routinely create spreadsheets from 3000+ XDPs, each with 50-100 KB of data, in under 15 minutes. I'm guessing the real bottleneck is streaming the output to disk. There isn't much out there that can compete with ElementTree for speed when reading XML!

If you need some assistance getting started, feel free to reply here or message me for an example script.

Good luck!

-Kasey (scriptocratch)

Level 4

Excel understands XML and so there is probably a simple way to accomplish this, but I don't know how.  I do know that Acrobat's Tracker feature was designed specifically for PDF forms data aggregation and optional export to Excel, etc., so check it out.  You'll need the 200 PDFs.

Level 1

I think you can make a connection to XML in Access, and there may be a batch process out there in Acrobat 9.0 that can handle your whole task for you with minor tweaking.

If you are curious about Python and ElementTree and want to take a dip into the deep end of XML processing, I highly recommend trying it out, especially if you are new to XML or want to do more with schemas, data bindings, and namespaces within Designer!