Saturday, October 3, 2009

XML Schema Validation with a Simple Groovy Script

I'm sure there are many ways to validate XML documents against schemas out there in Java-land. I wanted a relatively simple way to do it from the command line with a Groovy script. Since the script would need a few libraries to do the job, I also decided to run it from Maven: once the dependencies are declared in the POM, I don't have to worry about setting them up myself.

To start, I generated a simple project, selecting the basic GMaven archetype:

  $ mvn archetype:generate
...
45: internal -> gmaven-archetype-basic (Groovy basic archetype)
...
The archetype is located in the Codehaus repository at http://repository.codehaus.org.

Next, I edited the POM to include dom4j. The final version:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>prystasj</groupId>
  <artifactId>schema-validation</artifactId>
  <name>Schema Validation</name>
  <version>1.0-SNAPSHOT</version>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.groovy.maven</groupId>
        <artifactId>gmaven-plugin</artifactId>
        <version>1.0-rc-5</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <dependencies>
          <dependency>
            <groupId>org.codehaus.groovy.maven.runtime</groupId>
            <artifactId>gmaven-runtime-1.6</artifactId>
            <version>1.0-rc-5</version>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>dom4j</groupId>
      <artifactId>dom4j</artifactId>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.groovy</groupId>
      <artifactId>groovy</artifactId>
      <version>1.6.4</version>
    </dependency>
  </dependencies>
</project>

I wrote up this little script, src/main/groovy/Validator.groovy:
import org.dom4j.io.SAXReader

// Usage: Validator <schema.xsd> <document.xml>
def (schema, document) = args
def schemaStream = new File(schema).newInputStream()
def documentReader = new File(document).newReader()
SAXReader reader = new SAXReader()
setupSaxReader(reader, schemaStream)
// read() throws a DocumentException if the document fails validation
reader.read(documentReader)
println "Document valid.\n"

// Turn on XML Schema validation and hand the parser the schema stream
def setupSaxReader(reader, stream) {
    reader.setValidation(true)
    reader.setFeature("http://apache.org/xml/features/validation/schema", true)
    reader.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true)
    reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", "http://www.w3.org/2001/XMLSchema")
    reader.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", stream)
}

As you can see, the script takes two arguments: the first is the schema to validate against, and the second is an XML instance document. I found a test schema in another test project and copied it over to my project directory along with an instance document.
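
For comparison, the same two-argument check can be sketched in plain Java using only the JDK's javax.xml.validation API, with no dom4j on the classpath. This is an illustration rather than part of the project above: the class name SchemaCheck and the method firstError are made up for the example.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical class name; not part of the Maven project in this post.
public class SchemaCheck {

    // Returns null when the document is valid. Otherwise returns the message
    // of the first problem found (in the schema itself or in the document).
    static String firstError(File schemaFile, File documentFile) throws IOException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            Schema schema = factory.newSchema(schemaFile);
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(documentFile));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SchemaCheck <schema.xsd> <document.xml>");
            System.exit(2);
        }
        String error = firstError(new File(args[0]), new File(args[1]));
        System.out.println(error == null ? "Document valid." : error);
    }
}
```

Like the Groovy script, it takes the schema first and the instance document second, and it stops at the first validation error.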

I can run the script by first compiling:

  $ mvn compile
And then running it with the exec plugin. I use offline mode (-o) to speed things up, since I already have all the jars I need and there's no reason for Maven to check for them:
  $ mvn -o exec:java -Dexec.mainClass=Validator -Dexec.args="order.xsd order.xml"
If the validation succeeds, I'm given the normal BUILD SUCCESSFUL output. If not, the error is presented on the screen, for example:
  Error on line 7 of document  : cvc-complex-type.2.4.b: The content of element 'customer' is not complete.
One of '{country}' is expected. Nested exception: cvc-complex-type.2.4.b:
The content of element 'customer' is not complete.
One of '{country}' is expected.
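
The run above reports a single problem and stops. If you'd rather see every violation in one pass, one option is to register an ErrorHandler that records errors instead of throwing. The sketch below does that with the JDK's javax.xml.validation API rather than dom4j, so treat it as an assumption-laden illustration and not the script from this post; SchemaReport and allErrors are made-up names.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name; not part of the Maven project in this post.
public class SchemaReport {

    // Validates the document and returns one entry per validation error.
    // An empty list means the document is valid.
    static List<String> allErrors(File schemaFile, File documentFile)
            throws IOException, SAXException {
        final List<String> problems = new ArrayList<String>();
        Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(schemaFile)
                .newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                // Warnings are ignored in this sketch.
            }
            public void error(SAXParseException e) {
                // Record the error and let validation continue.
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException {
                // Malformed XML: parsing cannot continue, so rethrow.
                throw e;
            }
        });
        validator.validate(new StreamSource(documentFile));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        for (String problem : allErrors(new File(args[0]), new File(args[1]))) {
            System.out.println(problem);
        }
    }
}
```

Only recoverable validation errors are collected; a document that isn't well-formed still aborts with an exception.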
To finish up, since remembering the mvn command is a little rough for me, I wrapped it in a shell script:
#!/bin/bash
mvn -o exec:java -Dexec.mainClass=Validator -Dexec.args="$1 $2"
Which I can call with:
  $ ./validate.sh order.xsd order.xml

This little project has proved pretty useful, as I can edit the schema or XML instance and run it through rather quickly. I'm sure there are plenty of other methods out there.
