Sunday, August 23, 2009

Groovy: Preserving Whitespace when Parsing XML

Have you ever had a case where whitespace in XML element values has meaning? I have had the pleasure of working on an application where a single space between a start tag and an end tag does. Of course, coming onto the project, I didn't know that at first, so it took me a while to realize the spaces weren't being preserved inside the application.

I was parsing some XML using Groovy's XmlSlurper, which by default doesn't retain such spaces. That is by no means a complaint on my end, because this case seems like the exception rather than the norm.

Consider this example:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlSlurper().parseText(text)
println xml

The output of this script is simply '12', which is what I imagine most people would expect. In my case, however, I want it to be '1 2'. For that, I can use the keepWhitespace property of XmlSlurper:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlSlurper(keepWhitespace:true).parseText(text)
println xml

Producing '1 2'. Hooray!
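
For readers on a newer Groovy release, here is a minimal sketch of the same check. It assumes Groovy 3 or later, where XmlSlurper is available as groovy.xml.XmlSlurper and the setting is exposed under the name keepIgnorableWhitespace. Both the package and the property name are assumptions about your Groovy version rather than part of the original script, so check the API docs for the release you run.

```groovy
// Assumption: Groovy 3+ (groovy.xml.XmlSlurper, keepIgnorableWhitespace property).
import groovy.xml.XmlSlurper

def text = "<root><a>1</a><b> </b><c>2</c></root>"

// Ask the slurper to keep text nodes that consist only of whitespace.
def slurper = new XmlSlurper()
slurper.keepIgnorableWhitespace = true
def xml = slurper.parseText(text)

// The single space inside <b> survives, both in the whole document's text
// and when the element is read on its own.
assert xml.text() == '1 2'
assert xml.b.text() == ' '
```

Checking xml.b.text() directly, rather than only printing the whole document, makes it obvious which element carries the space.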

I can do something similar with XmlParser using its trimWhitespace property:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlParser(trimWhitespace:false).parseText(text)
println xml
assert xml.b.text() == ' '

Producing:

  root[attributes={}; value=[a[attributes={}; value=[1]], b[attributes={}; value=[ ]], c[attributes={}; value=[2]]]]

And no assertion errors.
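
A caveat for newer Groovy releases, offered as a sketch rather than a definitive recipe: as far as I can tell, later versions of XmlParser also have a keepIgnorableWhitespace property, and whitespace-only text may be dropped unless it is switched on, even when trimWhitespace is false. The example below assumes Groovy 3 or later (groovy.xml.XmlParser) and sets both properties explicitly. Treat that combination as an assumption to verify against your own version.

```groovy
// Assumption: Groovy 3+ (groovy.xml.XmlParser with a keepIgnorableWhitespace property).
import groovy.xml.XmlParser

def text = "<root><a>1</a><b> </b><c>2</c></root>"

def parser = new XmlParser()
parser.trimWhitespace = false          // do not trim leading/trailing whitespace from text
parser.keepIgnorableWhitespace = true  // keep text nodes that are whitespace only
def root = parser.parseText(text)

// The whitespace-only value of <b> is preserved; ordinary values are unchanged.
assert root.b.text() == ' '
assert root.a.text() == '1'
assert root.c.text() == '2'
```

Setting both properties is harmless where only one of them is needed, which makes the script less sensitive to the exact Groovy version.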

2 comments:

  1. Hi,
    If I have something like the output in your example, "root[attributes={}; value=[a[attributes={}; value=[1]], b[attributes={}; value=[ ]], c[attributes={}; value=[2]]]]", how can I turn it back into XML? (It was XML at the beginning, but I used XmlParser to change some nodes in it.)
    Thanks!

  2. http://stackoverflow.com/questions/3926495/groovy-how-can-change-an-xmlparser-to-an-xml-format
