Sunday, August 23, 2009

Groovy: Preserving Whitespace when Parsing XML

Have you ever had a case where whitespace in XML element values has meaning? I have had the pleasure of working on an application where a single space between a start tag and an end tag does. Of course, coming onto the project, I didn't know that at first, so it took me a while to realize the spaces weren't being preserved inside the application.

I was parsing some XML using Groovy's XmlSlurper, which by default doesn't retain such spaces. That is by no means a complaint on my end, because this case seems like the exception rather than the norm.

Consider this example:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlSlurper().parseText(text)
println xml

The output of this script is simply '12', which is what I imagine most people would expect. In my case, however, I would like it to be '1 2'. To get that, I can use the keepWhitespace property of XmlSlurper:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlSlurper(keepWhitespace:true).parseText(text)
println xml

Producing '1 2'. Hooray!
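
To check a single element rather than the concatenated output, something along these lines should work. It is only a sketch, assuming the same groovy.util.XmlSlurper class and keepWhitespace property used above (newer Groovy releases may name these differently); the variable names trimmed and kept are mine:

```groovy
def text = "<root><a>1</a><b> </b><c>2</c></root>"

// Default behavior: the whitespace-only value of <b> is dropped
def trimmed = new groovy.util.XmlSlurper().parseText(text)
assert trimmed.b.text() == ''

// With keepWhitespace set, the single space survives
def kept = new groovy.util.XmlSlurper(keepWhitespace: true).parseText(text)
assert kept.b.text() == ' '
assert kept.text() == '1 2'
```

The last assertion mirrors the println output above, since printing the parsed result shows its concatenated text.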

I can do something similar with XmlParser using its trimWhitespace property:

def text = "<root><a>1</a><b> </b><c>2</c></root>"
def xml = new groovy.util.XmlParser(trimWhitespace:false).parseText(text)
println xml
assert xml.b.text().equals(' ')

Producing:

  root[attributes={}; value=[a[attributes={}; value=[1]], b[attributes={}; value=[ ]], c[attributes={}; value=[2]]]]

And no assertion errors.

Playing with Scala and Maven

Recently, I've taken up an interest in Scala and wanted to try it out with Maven. Luckily, people have already blazed this path for us by developing a Scala plugin for Maven. In this post, writing as I go, I'm going to play with what we can do with this plugin to get started.

I started by looking for a Scala project archetype to generate an instance of.

First, I added the following repositories to my Maven settings in ~/.m2/settings.xml:

<settings>
  <activeProfiles>
    <activeProfile>repos</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>repos</id>
      <repositories>
        <repository>
          <id>scala-tools.org</id>
          <name>Scala-tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>scala-tools.org</id>
          <name>Scala-tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
      </pluginRepositories>
    </profile>
  </profiles>
</settings>

I then ran:

  $ mvn archetype:generate

And was presented with a Scala project among the choices:

  35: internal -> scala-archetype-simple (A simple scala project)

Choosing that option, I answered the usual set of questions and a sample project was created:

Define value for groupId: : prystasj.scala
Define value for artifactId: : maven-test
Define value for version: 1.0-SNAPSHOT: :
Define value for package: prystasj.scala: :
Confirm properties configuration:
groupId: prystasj.scala
artifactId: maven-test
version: 1.0-SNAPSHOT
package: prystasj.scala
Y: :
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating OldArchetype: scala-archetype-simple:1.2
[INFO] ----------------------------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------

The resulting project structure:

  ./pom.xml
  ./src/test/scala/prystasj/scala/AppTest.scala
  ./src/test/scala/prystasj/scala/MySpec.scala
  ./src/main/scala/prystasj/scala/App.scala

The generated POM file included the repositories I had already defined in my settings file, so I removed them from the POM. In addition, the Maven Eclipse Plugin was present in the list of plugins used in the build section, so I also removed that (not because I don't like Eclipse, but because I want a simpler POM to play with here), leaving me with:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>prystasj.scala</groupId>
  <artifactId>maven-test</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2009</inceptionYear>
  <properties>
    <scala.version>2.7.0</scala.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
          <args>
            <arg>-target:jvm-1.5</arg>
          </args>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
</project>

Without touching anything else, I'm going to package it up. Here I go:

  $ mvn package
...
[INFO] [scala:testCompile {execution: default}]
[INFO] Compiling 2 source files to /home/prystasj/workspace/prystasj/scala/maven-test/target/test-classes
[INFO] use java command with args in file forced : false
[INFO] [surefire:test]

------------------------------------------------------
T E S T S
-------------------------------------------------------
Running prystasj.scala.MySpecTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.093 sec
Running prystasj.scala.AppTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.044 sec

Results :

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

[INFO] [jar:jar]
[INFO] Building jar: /home/prystasj/workspace/prystasj/scala/maven-test/target/maven-test-1.0-SNAPSHOT.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------

After Maven downloaded what was needed, the source compiled, and the tests ran, a JAR was built in the target directory.

I was curious about the configuration of the Scala plugin, namely:

  <configuration>
    <scalaVersion>${scala.version}</scalaVersion>
    <args>
      <arg>-target:jvm-1.5</arg>
    </args>
  </configuration>

The target argument is passed to scalac to select the JVM version the class files should be built for. Since I'm using Java 1.6, I could either update the configuration or try removing it completely. The usage examples on the plugin web page do not include it, so I opted to remove it to see what happened. After doing so, I rebuilt my project and everything ran just fine, although I should probably investigate further and compare the results more closely.

Turning to the dependencies defined in the POM, Specs is a behavior-driven development framework along the lines of others like easyb. It also pulls in several other dependencies used for testing:

  $ mvn dependency:tree
[INFO] [dependency:tree]
[INFO] prystasj.scala:maven-test:jar:1.0-SNAPSHOT
[INFO] +- org.scala-lang:scala-library:jar:2.7.0:compile
[INFO] +- junit:junit:jar:4.4:test
[INFO] \- org.specs:specs:jar:1.2.5:test
[INFO]    +- org.scalatest:scalatest:jar:0.9.1:test
[INFO]    +- org.scalacheck:scalacheck:jar:1.2:test
[INFO]    \- org.jmock:jmock:jar:2.4.0:test
[INFO]       +- org.hamcrest:hamcrest-core:jar:1.1:test
[INFO]       \- org.hamcrest:hamcrest-library:jar:1.1:test

Since I'm not going to go that route with my first test project (as I haven't had the chance to play with Specs yet), I replaced the dependency with direct dependencies on scalatest and scalacheck:

    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest</artifactId>
      <version>0.9.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalacheck</groupId>
      <artifactId>scalacheck</artifactId>
      <version>1.2</version>
      <scope>test</scope>
    </dependency>

The resulting dependency tree is as follows:

[INFO] [dependency:tree]
[INFO] prystasj.scala:maven-test:jar:1.0-SNAPSHOT
[INFO] +- org.scala-lang:scala-library:jar:2.7.0:compile
[INFO] +- junit:junit:jar:4.4:test
[INFO] +- org.scalatest:scalatest:jar:0.9.1:test
[INFO] \- org.scalacheck:scalacheck:jar:1.2:test

Since src/test/scala/prystasj/scala/MySpec.scala uses the Specs library, I also removed that test class from the project.

Turning my attention to what's left, App.scala is a simple Singleton Object:

package prystasj.scala

object App extends Application {
  println( "Hello World!" )
}

A singleton object, of course, has one and only one instance. If we also had a class named App, the singleton object would be called the companion object of that class, but here it's just a standalone object. Singleton objects can be seen as Scala's answer to static members: Scala has no static members, which arguably makes it more object-oriented.

The singleton object extends the Application trait, which removes the need to write the Hello World example with an explicit main method, a form that may be more familiar to Java programmers:

object App {
  def main(args: Array[String]) {
    println("Hello World!")
  }
}

But this entry is supposed to be about Scala & Maven, so I might be getting off track.

Back to using Maven, let's see if we can run the App class from the command line with the Exec Plugin:

$ mvn package exec:java -Dexec.mainClass="prystasj.scala.App"
[INFO] Scanning for projects...
..
[INFO] [exec:java]
Hello World!
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------

...And we see the greeting in the output.

What about site generation? The POM currently doesn't define a location to deploy the web site to, so let's place it in the directory we're working in by adding a distribution management section to the POM:

  <distributionManagement>
    <site>
      <url>file://${user.dir}/site</url>
    </site>
  </distributionManagement>

...And let's generate a web site for the project:

  $ mvn site-deploy

[INFO] Generating "ScalaDocs" report.
...
[INFO] [site:deploy]
file:///home/prystasj/workspace/scala/maven-test/site - Session: Opened
file:///home/prystasj/workspace/scala/maven-test/site - Session: Disconnecting
file:///home/prystasj/workspace/scala/maven-test/site - Session: Disconnected
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------

The newly created website includes a ScalaDocs report. Here's a small sample:

  Object Summary
    object App extends scala.Application


Ah, that's a lot of writing for one sitting (for me at least). I'll leave you with the POM I have so far for reference:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>prystasj.scala</groupId>
  <artifactId>maven-test</artifactId>
  <version>1.0-SNAPSHOT</version>
  <inceptionYear>2008</inceptionYear>
  <properties>
    <scala.version>2.7.0</scala.version>
    <scala.plugin.version>2.11</scala.plugin.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest</artifactId>
      <version>0.9.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalacheck</groupId>
      <artifactId>scalacheck</artifactId>
      <version>1.2</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>${scala.plugin.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>${scala.plugin.version}</version>
        <configuration>
          <scalaVersion>${scala.version}</scalaVersion>
        </configuration>
      </plugin>
    </plugins>
  </reporting>
  <distributionManagement>
    <site>
      <url>file://${user.dir}/site</url>
    </site>
  </distributionManagement>
</project>

Inheriting Tests with an Abstract Test Case

I was wondering what might be a good way to write unit tests for the concrete methods defined in an abstract class. After perusing the JUnit FAQ, I tried using an abstract test class.

An abstract test class can be used to help ensure that the subclasses of the abstract class behave as expected.

I decided to take a different approach than the article linked from the FAQ took. Given an abstract source class, I would create an abstract test class with concrete tests for the non-abstract methods defined in the abstract source class.

The test classes for subclasses of the source class would extend the abstract test class and so inherit the tests for the methods defined in the abstract source class. Subclasses could still override the implementation of those methods, but the inherited tests would be there to ensure the overriding methods produce the same results as the overridden ones.

Take the following example (which is contrived, and in no way an endorsement of any OO principles). An abstract class Detabber has one concrete method that replaces every tab with a space in whatever text it is given:

abstract class Detabber {
    abstract def detabEntity(entity)
    def detab(text) { text.replaceAll("\t", " ") }
}

Every implementation is charged with supplying what is to be de-tabbed. Here are two:

class FileDetabber extends Detabber {
    @Override def detabEntity(entity) {
        def result = "" // start from an empty string so += appends to text, not to null
        new File(entity).getText().eachLine { line -> result += detab(line) }
        result
    }
}

class ResourceDetabber extends Detabber {
    @Override def detabEntity(entity) {
        def text = getClass().getClassLoader().getResourceAsStream(entity).getText()
        detab(text)
    }
}

Next, I write my abstract test class for the abstract class Detabber:

abstract class DetabberTest {
    Detabber detabber
    @Test void test_detab() {
        assertEquals "string was detabbed", "a b", detabber.detab("a\tb")
    }
}

At this point, the test phase should find no tests to run:

There are no tests to run.

Results :
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

Here I add a couple extensions of the above test class, but add no new tests. Each class should inherit the defined test from the extended class:

// tests for the implementation of detabEntity(entity) omitted for better illustration
class FileDetabberTest extends DetabberTest {
    @Before void setUp() { detabber = new FileDetabber() }
}

class ResourceDetabberTest extends DetabberTest {
    @Before void setUp() { detabber = new ResourceDetabber() }
}

Now two tests should be run:

Running ResourceDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.271 sec
Running FileDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.026 sec

Results :
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

It might look like the inherited test adds no value, besides upping the number of tests I can say were run in a report. I could just define the same test once, in either FileDetabberTest or ResourceDetabberTest, and be done with it, covering the method once since it never changes.

For argument's sake, let's say a new Detabber comes along that, maybe for efficiency's sake, wants to override Detabber.detab(text):

class FasterResourceDetabber extends ResourceDetabber {
    @Override def detab(text) {
        def matcher = text =~ "\t"
        matcher.replaceAll(" ") // actually faster? I don't know
    }
}

Its test class can still inherit the test from DetabberTest (or tests, if more were present in DetabberTest) and avoid needing its own, without adding any new test code:

class FasterResourceDetabberTest extends DetabberTest {
    @Before void setUp() { detabber = new FasterResourceDetabber() }
}

Three tests are now run:

Running FasterResourceDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.35 sec
Running ResourceDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.042 sec
Running FileDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.016 sec

Results :
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

Here we verify that the new implementation of detab(text) behaves as expected: the same input produces the same output as before.
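
The flip side is that an override which changes behavior should be caught by the inherited test. The following is only a sketch of that idea, assuming JUnit 4 is on the classpath; BrokenDetabber and BrokenDetabberTest are hypothetical names invented for this illustration, and the test is run programmatically with JUnitCore rather than through the build:

```groovy
import org.junit.Before
import org.junit.Test
import org.junit.runner.JUnitCore
import static org.junit.Assert.assertEquals

abstract class Detabber {
    abstract def detabEntity(entity)
    def detab(text) { text.replaceAll("\t", " ") }
}

// Hypothetical subclass: its override removes tabs instead of replacing them with spaces
class BrokenDetabber extends Detabber {
    @Override def detabEntity(entity) { detab(entity) }
    @Override def detab(text) { text.replaceAll("\t", "") }
}

abstract class DetabberTest {
    Detabber detabber
    @Test void test_detab() {
        assertEquals "string was detabbed", "a b", detabber.detab("a\tb")
    }
}

// No new tests here; test_detab() is inherited from DetabberTest
class BrokenDetabberTest extends DetabberTest {
    @Before void setUp() { detabber = new BrokenDetabber() }
}

// Run the inherited test and inspect the outcome
def result = JUnitCore.runClasses(BrokenDetabberTest)
assert result.runCount == 1
assert result.failureCount == 1
```

The one inherited test runs against the hypothetical subclass and fails, because "a\tb" comes back as "ab" rather than "a b".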

Earlier, I omitted tests for the implementations of detabEntity(entity) in FileDetabber and ResourceDetabber, so I'll add them here, along with all the code written so far, as a summary:

Source:

abstract class Detabber {
    abstract def detabEntity(entity)

    def detab(text) {
        text.replaceAll("\t", " ")
    }
}

class FileDetabber extends Detabber {
    @Override def detabEntity(entity) {
        def result = ""
        new File(entity).getText().eachLine { line ->
            result += detab(line)
        }
        result
    }
}

class ResourceDetabber extends Detabber {
    @Override def detabEntity(entity) {
        def text = getClass().getClassLoader().getResourceAsStream(entity).getText()
        detab(text)
    }
}

class FasterResourceDetabber extends ResourceDetabber {
    @Override def detab(text) {
        def matcher = text =~ "\t"
        matcher.replaceAll(" ")
    }
}

Test Source:

import org.junit.Before
import org.junit.Test
import static org.junit.Assert.*

abstract class DetabberTest {
    Detabber detabber
    @Test void test_detab() {
        assertEquals "string was detabbed", "a b", detabber.detab("a\tb")
    }
}

class FileDetabberTest extends DetabberTest {

    @Before void setUp() {
        detabber = new FileDetabber()
    }

    @Test void test_detabbing_of_a_small_file() {
        def resource = "short-file.txt" // a one-line text file
        def filePath = getClass().getClassLoader().getResource(resource).getPath()
        def expected = "hello goodbye"
        assertEquals "file was detabbed", expected, detabber.detabEntity(filePath)
    }
}

class ResourceDetabberTest extends DetabberTest {

    @Before void setUp() {
        detabber = new ResourceDetabber()
    }

    @Test void test_detabbing_of_a_small_resource() {
        def resource = "short-file.txt"
        def expected = "hello goodbye\n"
        assertEquals "test resource was detabbed", expected, detabber.detabEntity(resource)
    }
}

class FasterResourceDetabberTest extends DetabberTest {
    @Before void setUp() {
        detabber = new FasterResourceDetabber()
    }
}

Results:

-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running FasterResourceDetabberTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.409 sec
Running ResourceDetabberTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.11 sec
Running FileDetabberTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.186 sec

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

Hopefully, this brings up some interesting discussion on the pros and cons of this and other approaches.

Thursday, August 20, 2009

Downloading an entire site using wget

To recursively download a website for offline viewing with wget, you can use something similar to:

$ wget --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains somesite.org \
    --no-parent \
    www.somesite.org

I've had an email note about this sitting in my inbox for long enough, so up here it goes so I can delete it :)