Wednesday, October 20, 2010

Groovy: Retrieving the Value of Multiple XML Elements

Yesterday, I ran into an interesting case at work where some code was parsing XML with Groovy's XmlSlurper to retrieve the value of an element and treat it as a String. Something along the lines of:

  def xml = "<xml><character>a</character></xml>"
  def node = new XmlSlurper().parseText(xml)
  String result = node.character
  println result

This simply prints a. The code was expected to return only the value of the first element found, but when a second element is added:

  def xml = "<xml><character>a</character><character>b</character></xml>"
  def node = new XmlSlurper().parseText(xml)
  String result = node.character
  println result

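The resulting output is ab, which broke a new test case. node.character now matches both elements, and converting that result to a String concatenates their text.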

One solution to the issue is to grab the first element with:

  String result = node.character[0]
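
If the intent is to handle every matching element rather than only the first, the values can also be read one at a time instead of relying on the concatenated text. The following is a minimal sketch, not code from the original project: the variable names count, first and values are only illustrative, and the explicit import assumes Groovy 3 or later (older versions find XmlSlurper in groovy.util without an import).

```groovy
// Assumption: Groovy 3+, where XmlSlurper lives in groovy.xml.
// On older versions, drop this import; groovy.util is imported by default.
import groovy.xml.XmlSlurper

def xml = "<xml><character>a</character><character>b</character></xml>"
def node = new XmlSlurper().parseText(xml)

// Number of <character> elements that matched
def count = node.character.size()

// Text of the first match only
def first = node.character[0].text()

// Text of each match, kept separate instead of concatenated
def values = node.character.collect { it.text() }

println count
println first
println values
```

Checking size() before indexing is one way to make the "only one element expected" assumption explicit in the calling code.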

Another interesting point is that the result of node.character[0] is a NodeChild, not a String. Because the result variable is declared as a String, the right side of the assignment is coerced to a String. If that were not the case, and we had:

  def result = node.character[0]
  println result.getClass().getName()
  println result

The output would be:

  groovy.util.slurpersupport.NodeChild
  a

Note: We need to call getClass() explicitly, because result.class is treated as a lookup for a child element named class and does not return the Class object itself.
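
The difference between the two calls can be checked directly. In this small sketch (assuming Groovy 3+ for the import; viaProperty and viaMethod are illustrative names), result.class looks for a child element named class, which does not exist here, so it yields an empty result rather than a Class.

```groovy
// Assumption: Groovy 3+; on older versions drop the import (groovy.util is a default import).
import groovy.xml.XmlSlurper

def xml = "<xml><character>a</character></xml>"
def result = new XmlSlurper().parseText(xml).character[0]

// Property access is treated as a child-element lookup, so this is not a Class
def viaProperty = result.class
println(viaProperty instanceof Class)   // false
println viaProperty.size()              // 0, no <class> child exists

// Calling the method explicitly returns the real Class object
def viaMethod = result.getClass()
println(viaMethod instanceof Class)     // true
println viaMethod.getSimpleName()       // NodeChild
```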

Alternatively, we can use the text() method of NodeChild (which also exists for NodeChildren) to ensure we get a String:

  def result = node.character[0].text()
  println result.getClass().getName()
  println result

Giving us:

  java.lang.String
  a
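
For comparison, the typed assignment used earlier and an explicit text() call end up with the same String. A short sketch, again assuming Groovy 3+ for the import and using the illustrative names coerced and explicit:

```groovy
// Assumption: Groovy 3+; on older versions drop the import (groovy.util is a default import).
import groovy.xml.XmlSlurper

def xml = "<xml><character>a</character><character>b</character></xml>"
def node = new XmlSlurper().parseText(xml)

// Declared type: Groovy converts the NodeChild to a String on assignment
String coerced = node.character[0]

// Explicit call: text() already returns a String
def explicit = node.character[0].text()

println coerced.getClass().getName()
println explicit.getClass().getName()
println coerced == explicit
```

The explicit text() call has the advantage that the result is a String even when the variable is declared with def.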

To summarize, here are all the approaches discussed so far in one script:

  def xml = "<xml><character>a</character><character>b</character></xml>"
  def node = new XmlSlurper().parseText(xml)

  println node.character
  println node.character.getClass().getName()
  println()

  println node.character.text()
  println node.character.text().getClass().getName()
  println()

  println node.character[0]
  println node.character[0].getClass().getName()
  println()

  println node.character[0].text()
  println node.character[0].text().getClass().getName()
  println()

We get:

  ab
  groovy.util.slurpersupport.NodeChildren

  ab
  java.lang.String

  a
  groovy.util.slurpersupport.NodeChild

  a
  java.lang.String

While the original gotcha might not be all that hard to resolve, hopefully this gives a little insight to those who might explore things a little bit further.
