How to read a CDATA node with XPath?

Level 1

I'm using Workbench ES2 (9.0) to build a process and am trying to use XPath to read a CDATA node from an XML variable.

When I do, I get the whole serialized element back, CDATA markers included, instead of the plain text I get from a node without CDATA. How can I read just the text? Any node in the document could contain CDATA.

xml document

<properties>
  <generation>
    <inputGenDocs>
      <inputGenDoc><![CDATA[\\myserver.com\input\200100647\IFI_Attachments2_V2_Outlier & Historical Data IFI #575.TIF]]></inputGenDoc>
    </inputGenDocs>
  </generation>
</properties>

xpath

/process_data/xmlProperties/properties/generation/inputGenDocs/inputGenDoc[number(/process_data/@numDocIdx)]

actual result

<?xml version="1.0" encoding="UTF-8"?>
<inputGenDoc><![CDATA[\\myserver.com\input\200100647\IFI_Attachments2_V2_Outlier & Historical Data IFI #575.TIF]]></inputGenDoc>

expected result (if not wrapped in CDATA this is the returned value)

\\myserver.com\input\200100647\IFI_Attachments2_V2_Outlier & Historical Data IFI #575.TIF

thanks.

4 Replies

Former Community Member

Hi J_Dev_77,

Did you manage to find a solution to your problem? I'm facing the same issue.

So far I found 2 solutions:

- Escape all characters in my CDATA to obtain a "clean" encoded string (with &lt; instead of < etc.)

- Use XSLT to extract CDATA value

Regards,

Thomas

Level 1

Hi J_Dev_77,

We 'solved' it the following way:

substring-before(substring-after(serialize(xpath_to_cdata_node, false), "CDATA["), "]]")

I know it's ugly, but it gets the job done without having to use additional steps in the process.
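For anyone who wants to see what that expression does to the serialized node, here is a rough sketch in Python rather than LiveCycle. The helpers `substring_after` and `substring_before` are hypothetical names that imitate the XPath 1.0 functions of the same name, and the input string is the "actual result" from the question.

```python
def substring_after(text: str, marker: str) -> str:
    """Imitate XPath substring-after(): text following the first marker, or ''."""
    head, sep, tail = text.partition(marker)
    return tail if sep else ""


def substring_before(text: str, marker: str) -> str:
    """Imitate XPath substring-before(): text preceding the first marker, or ''."""
    head, sep, tail = text.partition(marker)
    return head if sep else ""


# The serialized node as shown under "actual result" in the question.
serialized = (
    r"<inputGenDoc><![CDATA[\\myserver.com\input\200100647"
    r"\IFI_Attachments2_V2_Outlier & Historical Data IFI #575.TIF]]></inputGenDoc>"
)

# Same nesting as the XPath: cut everything up to "CDATA[", then everything from "]]".
value = substring_before(substring_after(serialized, "CDATA["), "]]")
print(value)
```

One caveat that follows from how the two functions work: both stop at the first occurrence of the marker, so a value that itself contains "]]" would be cut short.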

Cheers

WP

Level 1

I escaped the offending characters in the XML file

"<" --> "&lt;"

">" --> "&gt;"

"&" --> "&amp;"

"\"" --> "&quot;"

"'" --> "&apos;"

This approach was a bit tricky, since the ampersand is either a character to be escaped when it stands by itself or a character NOT to be escaped when it is part of an escape sequence. Our data was extremely "open", and we couldn't discount the possibility that these escape sequences already existed before we attempted to replace other unescaped characters.
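To illustrate the ampersand problem, here is a minimal sketch in Python (not what we actually ran in LiveCycle). The function name `escape_once` and the regular expression are my own assumptions: an ampersand is escaped only when it does not already begin one of the five predefined entities or a numeric character reference.

```python
import re

# Matches "&" only when it is NOT already the start of &amp; &lt; &gt; &quot;
# &apos; or a numeric reference such as &#38; or &#x26;.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos);|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_once(text: str) -> str:
    """Escape XML special characters without double-escaping existing entities."""
    # Handle ampersands first, so the entities added below are left alone.
    text = _BARE_AMP.sub("&amp;", text)
    return (
        text.replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&apos;")
    )


print(escape_once("Outlier & Historical Data IFI #575.TIF"))
print(escape_once("already &lt; escaped & not escaped"))
```

The trade-off is that literal text which happens to look like an entity (for example a file actually named "a&amp;b") is left untouched, so this only works if you can accept that ambiguity.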

Hindsight being 20/20, it might have been easier to use XSLT to strip away the CDATA tags in the LiveCycle process. Take a look at Willem-Paul's solution: a fairly lengthy XPath statement, but potentially a lot less work than what I did.

Something like this I think:

substring-before(substring-after(serialize(/process_data/xmlProperties/properties/generation/inputGenDocs/inputGenDoc[number(/process_data/@numDocIdx)], false), "CDATA["), "]]")

Level 1

Hi guys,

Just found out there is another, easier way. A post on Stack Overflow put me on to this: simply use the XSLT component in a process to 'convert' the incoming XML to the same XML. This will automagically convert all CDATA sections to encoded text nodes, which of course can then be accessed using XPath without any problems.

I used the exact XSLT given in the answer in the XSLT component and it works like a charm.

Here's the direct link to the answer on stackoverflow: http://stackoverflow.com/a/10543572
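The same effect can be seen outside LiveCycle. The sketch below is only an analogy in Python using the standard library's ElementTree, not the XSLT from that answer: parsing the document and writing it back out turns the CDATA section into ordinary escaped text, and the node's text is the plain value. The XML is the sample from the original question.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (raw string so the backslashes stay literal).
SOURCE = r"""<properties>
  <generation>
    <inputGenDocs>
      <inputGenDoc><![CDATA[\\myserver.com\input\200100647\IFI_Attachments2_V2_Outlier & Historical Data IFI #575.TIF]]></inputGenDoc>
    </inputGenDocs>
  </generation>
</properties>"""

root = ET.fromstring(SOURCE)

# The parser exposes the CDATA content as ordinary text on the element.
doc_text = root.find("generation/inputGenDocs/inputGenDoc").text
print(doc_text)

# Re-serializing writes that text back with "&" encoded and no CDATA markers.
round_tripped = ET.tostring(root, encoding="unicode")
print(round_tripped)
```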

Cheers

WP