Saturday, July 23, 2011

Groovy: Reducing Duplication with Closures

I've been using Groovy as my primary programming language for a few years now. One of the features that is commonly discussed and I've read about a lot are closures. While I've appreciated what I've read, I've had a difficult time recognizing good places to use them. Like I lot of things, if you "dont use it, you lose it".

One situation where I have been able to find uses for closures recently was using them instead of methods to reduce duplication that I can hopefully demonstrate using the following example scenario. We have a TextAnalyzer class whose sole responsibility is to analyze a string and report the words separated by commas and semicolons and the counts for said delimiting characters.

The TextAnalyzer.analyze() method below travels through a given string, aggregating words and counting commas and semicolons and returns an instance of TextAnalysis that contains the results.

For example, given the string:

    groovy,is,a,language;enjoyable

The result is:

    TextAnalysis(words:[groovy, is, a, language, enjoyable], commas:3, semicolons:1)

Here's the first version of the code. Of note here to me is the duplication that exists when either a comma or semicolon is found. In both clauses, the current word is pushed onto the words list and is reset in preparation for the next word.

    def text = "groovy,is,a,language;enjoyable"

new TextAnalyzer().analyze(text)

@groovy.transform.ToString(includeNames=true)
class TextAnalysis {
List words
int commas
int semicolons
}

class TextAnalyzer {

def analyze(text) {
def words = []
def word = ''
int commas = 0
int semicolons = 0

text.each { c ->
if (c == ',') {
commas++
words << word
word = ''
}
else if (c == ';') {
semicolons++
words << word
word = ''
}
else {
word += c
}
}

if (word) words << word

new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
}
}

One option to remove the duplication is to add another method that would handle the push and reset of the current word, named add below:

    class TextAnalyzer {

private add(words, word) {
words << word
''
}

def analyze(text) {
def words = []
def word = ''
int commas = 0
int semicolons = 0

text.each { c ->
if (c == ',') {
commas++
word = add(words, word)
}
else if (c == ';') {
semicolons++
word = add(words, word)
}
else {
word += c
}
}

if (word) words << word

new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
}

}

This works and produces the same result, however a couple of things bother me about it. First, the method is doing two things, it's adding the current word to the list, and return a value to reset the word. The user is required to know that it needs to assign the return value to the current word.

We could also used a boolean to indicate a word was complete that could of been evaluated at the end of each character evaluation, but that would add a bit more complexity to the analyze method through additional branching.

Another approach that sits better with me is to use a closure that's internal to the analyze method. Let's call it completeWord:

class TextAnalyzer {

def analyze(text) {
def words = []
def word = ''
int commas = 0
int semicolons = 0

def completeWord = { ->
words << word
word = ''
}

text.each { c ->
if (c == ',') {
commas++
completeWord()
}
else if (c == ';') {
semicolons++
completeWord()
}
else {
word += c
}
}

if (word) words << word

new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
}

}

There are a few reasons I like this better besides the reduced duplication. Since the closure is defined within the method, it has access to the variables holding the current word and the list of words, so there's no need to pass the variables around. Also, since there are no other uses for either the previous method or the new closure, the code reads better having the logic to complete a word inside the method itself.

Of course, a really concise version of the analyze method probably would not iterate over the characters in the string at all (I was thinking I could of simply started with this version, but found the iteration helped illustrate things a bit more):

    class TextAnalyzer {
def analyze(text) {
new TextAnalysis(
words: text.split(',|;'),
commas: text.count(','),
semicolons: text.count(';')
)
}
}

The above though still has some duplication in the call to String.count(). We remove it using a closure that counts:

    class TextAnalyzer {
def analyze(text) {

def count = { c -> text.count(c) }

new TextAnalysis(
words: text.split(',|;'),
commas: count(','),
semicolons: count(';')
)
}
}

While we have not seen any huge wins here, in a more complex scenario, defining closures within methods could help make our code more readable and easier to maintain, which is always a win.

Tuesday, July 5, 2011

Spock: Varying Results from a Mocked Method

In the Groovy world, I'm a big fan of the Spock specification framework and have written about it before. One little challenge I ran into was to setup an interaction for a mock to return a different result depending on how many times its been called. To illustrate what I'm talking about, I'll set things up with the following little scenario.

Let's say we have a class, RecordProcessor whose job is to read records from a source until there are no more records to be read. A collaborator of the class, RecordReader, is responsible for the actual record reading, and is the class whose behavior we want to mock.

The method to be called on the RecordReader takes as a parameter, the number of records to read and reports whether or not there are more records left in the source to be read. For simplicities sake, let's assume the responsibility of the RecordProcessor is to simply invoke the RecordReader until it reports that there are no more records to read.

class RecordProcessor {

RecordReader recordReader
int threshold

def process() {
boolean isProcessingComplete = true
boolean recordsRemaining = true

while(recordsRemaining) {
recordsRemaining = recordReader.readRecords(threshold)
}

// would do more processing evaluation here...
processingComplete
}
}

We want our unit test to determine whether or not the class under test can detect when there are no more records to read and report that processing was completed. Here is the base of the Spock specification where the mocking setup is left to a method:

class RecordProcessorSpec extends Specification {

def threshold = 5
RecordProcessor recordProcessor = new RecordProcessor(threshold: threshold)

def "can process all the records accessible by a record reader"() {
given:
recordProcessorHasMockRecordReader()
when:
isProcessingComplete = recordProcessor.process()
then:
isProcessingComplete
}
}

The wiki page describing mocking and interactions can be found My first pass at setting up the mock looked like the below, where the first two calls to readRecords method would return true and the third would return false (for more on how mocking is done with Spock, see Interactions).

    def recordProcessorHasMockRecordReader() {
def mock = Mock(RecordReader)
2 * mock.readRecods(threshold) >> true
1 * mock.readRecods(threshold) >> false
recordProcessor.recordReader = mock
}

This didn't do the trick however as 3 invocations were detected for the first definition of how the method should behave:

  Too many invocations for:

2 * mock.readFrom(threshold) >> true (3 invocations)

I found one answer in using a closure to define the return value of the method invocation that had a bound variable to tracking the number of reads that were asked for:

    def recordProcessorHasMockRecordReader() {
def mock = Mock(RecordReader)
def readsAttempted = 0
3 * mock.readRecods(threshold) >> {
readsAttempted++
readsAttepted < 3 ? true : false
}
recordProcessor.recordReader = mock
}

With each invocation of the readRecords method by the RecordProcessor, the readsAttempted variable defined outside of the closure is incremented. With the first two calls, the RecordProcessor is told there are more records to process, and with the third call, it is told that all the records have been read.

Are there any other approaches anyone has found to approach a similar problem? Thanks.