Saturday, March 10, 2012

Groovy: Regular Expressions and Multiple Lines

Regular expressions are sometimes used on text that spans multiple lines, which Groovy has support for. I had to search for how to enable multi-line searching, so I thought it would be good to post here.

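To allow the expression to span multiple lines we can add (?ms) to the beginning of the expression. The s flag (DOTALL) lets . match line terminators; the m flag (MULTILINE) only changes how ^ and $ behave.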

Take the following example, where we want to grab the entire book element from a single-line XML document:

    def xml = "<library><book><title>Effective Java</title><author>Bloch</author></book></library>"
    def matcher = xml =~ /<book>.*<\/book>/
    matcher.size() > 0 ? matcher[0] : "NOTHING"

This gives us an entire book:

    <book><title>Effective Java</title><author>Bloch</author></book>

If the XML were to span multiple lines, however, we end up with NOTHING, because . does not match line terminators by default:

    def xml = """<library>
    <book>
    <title>Effective Java</title>
    <author>Bloch</author>
    </book>
    </library>"""
    def matcher = xml =~ /<book>.*<\/book>/
    matcher.size() > 0 ? matcher[0] : "NOTHING"

Now if we add the (?ms), we get the entire book entry again:

    def xml = """<library>
    <book>
    <title>Effective Java</title>
    <author>Bloch</author>
    </book>
    </library>"""
    def matcher = xml =~ /(?ms)<book>.*<\/book>/
    matcher.size() > 0 ? matcher[0] : "NOTHING"

The above results in:

    <book>
    <title>Effective Java</title>
    <author>Bloch</author>
    </book>
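Groovy patterns are compiled by java.util.regex, so the effect of each flag can be checked in plain Java. The sketch below uses illustrative class and method names that are not part of the original post. It suggests that the s flag alone is enough for this example, while m on its own still yields NOTHING.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookRegexDemo {

    // Returns the first match of the regex in the text, or "NOTHING" when there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : "NOTHING";
    }

    public static void main(String[] args) {
        String xml = "<library>\n<book>\n<title>Effective Java</title>\n"
                + "<author>Bloch</author>\n</book>\n</library>";

        // No flags: '.' stops at line terminators, so nothing matches.
        System.out.println(firstMatch("<book>.*</book>", xml));

        // DOTALL only: '.' also matches line terminators, so the whole element matches.
        System.out.println(firstMatch("(?s)<book>.*</book>", xml));

        // MULTILINE only: affects '^' and '$', not '.', so nothing matches.
        System.out.println(firstMatch("(?m)<book>.*</book>", xml));
    }
}
```

Note that .* is greedy, so with several book elements in the document the match would run from the first opening tag to the last closing tag. Using .*? instead stops at the first closing tag.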

If you have any other tips or another way to accomplish the same thing, please feel free to leave a comment.

Injecting a Hostname with Spring

Sometimes our applications need to know the name of the host or server they are running on. Usually the hostname can be obtained using plain old Java. In this post, however, I'll demonstrate a way to inject it using Spring. This may sound like a case of over-engineering, but the main advantage is that our code no longer needs to know how to obtain the hostname or catch any exceptions related to retrieving it.

As a use case, I'll recall a project I was working on that was tasked with consuming a service that required a client identifier be passed along in the request. The requirement stated that the ID be unique for every request and contain the hostname of the caller for tracking purposes inside the service. The form of the ID would be:

    <hostname>:<request_number>

Yes, this could be considered a case of an internal requirement of the called service leaking into the implementation of the caller, but it was a requirement we could not work around. To start, it made sense to delegate the creation of the identifier to a separate object. To fulfill the requirement that the ID be unique, we could use a UUID. The ugly part is deriving the hostname. Here is a first pass at the new class:

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.util.UUID;

    public class RequestIdentifier {

        // Fallback used when the local hostname cannot be resolved
        private static final String DEFAULT_HOST_NAME = "myHost";

        public String requestId() {
            return new StringBuilder()
                    .append(hostname())
                    .append(':')
                    .append(uuid())
                    .toString();
        }

        private String hostname() {
            try {
                return InetAddress.getLocalHost().getHostName();
            } catch (UnknownHostException uhe) {
                return DEFAULT_HOST_NAME;
            }
        }

        private String uuid() {
            return UUID.randomUUID().toString();
        }
    }

Here we're forced to catch an UnknownHostException. As we still want the request to be made in the unlikely event that the exception is thrown, we default to a hostname (which would probably indicate the calling application) so that processing can proceed. We could improve on this implementation by caching the result of the first hostname lookup, but the potential for the exception to be thrown at least once would still exist.

Having to catch and deal with the exception places a burden on the class and adds complexity. Additionally, to fully cover the class during unit testing, we will have to find a way to simulate the exception, which could be difficult since the call InetAddress.getLocalHost() is static.

It might be better if the class were simply given its hostname, which simplifies the code greatly and removes the need for the default:

    import java.util.UUID;

    public class RequestIdentifier {

        private String hostname; // setter omitted for brevity

        public String requestId() {
            return new StringBuilder()
                    .append(hostname)
                    .append(':')
                    .append(uuid())
                    .toString();
        }

        private String uuid() {
            return UUID.randomUUID().toString();
        }
    }
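With the hostname supplied from outside, the class can be exercised without touching the network or simulating an exception. Here is a minimal sketch of such a check. It assumes a conventional setHostname setter (the one omitted above), and the wrapper class RequestIdentifierDemo and the host name "test-host" are made up for illustration.

```java
import java.util.UUID;

class RequestIdentifierDemo {

    // Same shape as the simplified class, with the assumed setter written out.
    static class RequestIdentifier {
        private String hostname;

        public void setHostname(String hostname) {
            this.hostname = hostname;
        }

        public String requestId() {
            return hostname + ':' + UUID.randomUUID();
        }
    }

    public static void main(String[] args) {
        RequestIdentifier identifier = new RequestIdentifier();
        identifier.setHostname("test-host"); // hypothetical value, no lookup involved

        String first = identifier.requestId();
        String second = identifier.requestId();

        // Each ID carries the injected hostname and differs from the previous one.
        System.out.println(first.startsWith("test-host:"));
        System.out.println(!first.equals(second));
    }
}
```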

To make this work in practice, we can have Spring inject the hostname into the class:

    <beans xmlns="http://www.springframework.org/schema/beans"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.springframework.org/schema/beans
               http://www.springframework.org/schema/beans/spring-beans.xsd">

        <bean id="requestIdentifier" class="org.prystasj.service.validation.RequestIdentifier">
            <property name="hostname" ref="hostname"/>
        </bean>

        <!-- Calls getHostName() on the localhost bean to produce the hostname String -->
        <bean id="hostname" factory-bean="localhost" factory-method="getHostName"/>

        <!-- Created through the static factory method InetAddress.getLocalHost() -->
        <bean id="localhost" class="java.net.InetAddress" factory-method="getLocalHost"/>
    </beans>

Now if there is an issue retrieving the hostname, we will know when the application starts up, before any processing is requested, and we have a cleaner, more usable class for creating the request identifier.

Monday, March 5, 2012

Subversion: Setting the MIME Type Property

As I'm tired of having to look this up every now and then when I need it, here's an example for how to add the MIME type for a file in Subversion:

    $ svn propset svn:mime-type "text/html" src/site/resources/design.html

With the property set, a browser will render the HTML file as HTML when you follow a link to it in the repository, instead of displaying it as plain text.