I had a similar goal: gathering data from a folder full of XDPs into a spreadsheet to track user behaviors, capture commonly entered values, and run reports.
If you are familiar with Python, I highly recommend creating a simple script using Python 2.6 and the ElementTree library (effbot.org).
Since this was the project I cut my teeth on, I found it helpful to break the script's tasks down into individual parts. The material and examples for getting a folder's contents, storing values in lists, and using ElementTree are as terrific as they are scattered in location and diverse in implementation.
1. Grab each XDP in your directory (for book in path:)
2. Parse the XML using the ElementTree iterator
3. Assign your element's values to variables during iteration
if element.tag == 'dataNode':
    variable = element.text
4. At the end, print your variables with a delimiter between each (I used a pipe "|" because commas were commonly used in our forms)
print "%s|%s|%s|%s|%s" % (currentFileName, variable1, variable2, variable3, variable4)
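The four steps above could be sketched roughly as follows. This is only a minimal sketch, written for Python 3 with the standard library's xml.etree.ElementTree rather than the Python 2.6 setup I used. The field names in WANTED_FIELDS and the helper names local_name, extract_row, and main are hypothetical placeholders, and it assumes the first non-empty occurrence of each field is the one you want.

```python
import glob
import os
import xml.etree.ElementTree as ET

# Hypothetical field names; replace with the element tags used in your forms.
WANTED_FIELDS = ["firstName", "lastName", "city", "state"]


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def extract_row(xdp_path, fields=WANTED_FIELDS):
    """Return one pipe-delimited row: file name, then each wanted field."""
    values = dict.fromkeys(fields, "")
    # iterparse yields each element once its closing tag has been read.
    for _event, element in ET.iterparse(xdp_path):
        name = local_name(element.tag)
        # Keep only the first non-empty value seen for each field.
        if name in values and not values[name]:
            values[name] = (element.text or "").strip()
    return "|".join([os.path.basename(xdp_path)] + [values[f] for f in fields])


def main(folder="."):
    # One output row per .xdp file in the folder.
    for xdp_path in sorted(glob.glob(os.path.join(folder, "*.xdp"))):
        print(extract_row(xdp_path))


if __name__ == "__main__":
    main()
```

Stripping the namespace prefix is a convenience so that plain names can be compared; if two namespaces in your forms reuse the same element name, compare the full tag instead.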
Once you are happy with the data printed to your console, and each row corresponds to one XDP, redirect the output to a file and import that file into Excel using a CSV-style text import (specifying your delimiter during the import process).
So if the script that does the above is called "xdpGrabber.py", run:
$ python xdpGrabber.py >> xdpSpreadSheet.txt
and open xdpSpreadSheet.txt in Excel. Note that >> appends to the file, so delete it first (or use > instead) if you re-run the script.
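As an alternative to shell redirection, the rows could be written from inside the script. The following is a sketch that assumes Python 3 and the standard csv module, which is not what I used; write_rows and its parameters are hypothetical names.

```python
import csv


def write_rows(rows, out_path, delimiter="|"):
    """Write an iterable of row sequences to out_path, one line per row."""
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        writer.writerows(rows)
```

For example, write_rows([["form1.xdp", "Ann", "Reno"]], "xdpSpreadSheet.txt") would produce one pipe-delimited line. Opening the file in "w" mode replaces any earlier output, whereas the >> redirection above appends. The csv writer also quotes any field that happens to contain the delimiter, which plain string formatting does not do.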
There are some gotchas here: repeating nodes can be tricky, and heavily nested or scattered XML structures will add much more complexity to the script. However, if you are only looking for certain fields that always come in the same order when reading the XDPs from top to bottom, you can get them using simple "if element.tag == 'stuff'" checks in the order they appear, since ElementTree parses from top to bottom.
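One possible way to handle a repeating node is to collect every occurrence and join the values into a single cell with a second separator. This is a sketch under assumptions, not what my script did: collect_repeats, the ";" joiner, and the sample XML are all made up for illustration, and it is written for Python 3.

```python
import io
import xml.etree.ElementTree as ET


def collect_repeats(source, tag, joiner=";"):
    """Join the text of every element named `tag`, in the order each one closes."""
    found = []
    for _event, element in ET.iterparse(source):
        # Compare on the tag with any '{namespace}' prefix removed.
        if element.tag.rsplit("}", 1)[-1] == tag:
            found.append((element.text or "").strip())
    return joiner.join(found)


# Made-up sample data standing in for an XDP with a repeated <item> node.
sample = io.BytesIO(b"<form><item>pen</item><item>ink</item><total>2</total></form>")
print(collect_repeats(sample, "item"))
```

Pick a joiner that differs from the column delimiter and does not appear in the form data, otherwise the joined cell cannot be split apart again later.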
If this is appealing to you, I suggest you run one of your sample XDPs against the example script given here: http://www.xml.com/pub/a/2003/02/12/py-xml.html
I used Python 2.6 and ElementTree on a Windows XP laptop. I would routinely create spreadsheets from 3000+ XDPs, each with 50-100KB of data, in under 15 minutes. I'm guessing the real bottleneck is streaming the output to disk. There isn't much out there that can compete with ElementTree for speed when reading XML!
If you need some assistance getting started, feel free to reply here or message me for an example script.
Good luck!
-Kasey (scriptocratch)