Saturday, July 23, 2011

Groovy: Reducing Duplication with Closures

I've been using Groovy as my primary programming language for a few years now. One of its most commonly discussed features, and one I've read a lot about, is closures. While I've appreciated what I've read, I've had a difficult time recognizing good places to use them. Like a lot of things, if you don't use it, you lose it.

One situation where I have recently found a use for closures is as a replacement for methods to reduce duplication, which I can hopefully demonstrate with the following example scenario. We have a TextAnalyzer class whose sole responsibility is to analyze a string, reporting the words separated by commas and semicolons along with the counts of those delimiting characters.

The TextAnalyzer.analyze() method below walks through a given string, collecting words and counting commas and semicolons, and returns an instance of TextAnalysis that contains the results.

For example, given the string:

    groovy,is,a,language;enjoyable

The result is:

    TextAnalysis(words:[groovy, is, a, language, enjoyable], commas:3, semicolons:1)

Here's the first version of the code. What stands out to me is the duplication when either a comma or a semicolon is found. In both branches, the current word is pushed onto the words list and then reset in preparation for the next word.

    def text = "groovy,is,a,language;enjoyable"

    new TextAnalyzer().analyze(text)

    @groovy.transform.ToString(includeNames=true)
    class TextAnalysis {
        List words
        int commas
        int semicolons
    }

    class TextAnalyzer {

        def analyze(text) {
            def words = []
            def word = ''
            int commas = 0
            int semicolons = 0

            text.each { c ->
                if (c == ',') {
                    commas++
                    words << word
                    word = ''
                }
                else if (c == ';') {
                    semicolons++
                    words << word
                    word = ''
                }
                else {
                    word += c
                }
            }

            if (word) words << word

            new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
        }
    }

One option to remove the duplication is to add another method that would handle the push and reset of the current word, named add below:

    class TextAnalyzer {

        private add(words, word) {
            words << word
            ''
        }

        def analyze(text) {
            def words = []
            def word = ''
            int commas = 0
            int semicolons = 0

            text.each { c ->
                if (c == ',') {
                    commas++
                    word = add(words, word)
                }
                else if (c == ';') {
                    semicolons++
                    word = add(words, word)
                }
                else {
                    word += c
                }
            }

            if (word) words << word

            new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
        }

    }

This works and produces the same result, but a couple of things bother me about it. First, the method is doing two things: it adds the current word to the list, and it returns a value used to reset the word. Second, the caller is required to know that it needs to assign the return value to the current word.

We could also have used a boolean to indicate that a word was complete, which could have been evaluated at the end of each character evaluation, but that would add a bit more complexity to the analyze method through additional branching.
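For comparison, here is a rough sketch of what that boolean approach might look like. The function name analyzeWithFlag and the flag name wordComplete are made up for illustration, and the result is returned as a plain map rather than a TextAnalysis to keep the snippet self-contained:

```groovy
// Sketch only: analyzeWithFlag and wordComplete are hypothetical names.
def analyzeWithFlag(String text) {
    def words = []
    def word = ''
    int commas = 0
    int semicolons = 0

    text.each { c ->
        boolean wordComplete = false

        if (c == ',') {
            commas++
            wordComplete = true
        }
        else if (c == ';') {
            semicolons++
            wordComplete = true
        }
        else {
            word += c
        }

        // The extra branch this approach adds to every iteration
        if (wordComplete) {
            words << word
            word = ''
        }
    }

    if (word) words << word

    // A map stands in for TextAnalysis here
    [words: words, commas: commas, semicolons: semicolons]
}

def flagged = analyzeWithFlag('groovy,is,a,language;enjoyable')
assert flagged.words == ['groovy', 'is', 'a', 'language', 'enjoyable']
assert flagged.commas == 3
assert flagged.semicolons == 1
```

The duplication is gone, but the loop body now carries a flag and a fourth branch, which is the added complexity mentioned above.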

Another approach that sits better with me is to use a closure that's internal to the analyze method. Let's call it completeWord:

    class TextAnalyzer {

        def analyze(text) {
            def words = []
            def word = ''
            int commas = 0
            int semicolons = 0

            def completeWord = { ->
                words << word
                word = ''
            }

            text.each { c ->
                if (c == ',') {
                    commas++
                    completeWord()
                }
                else if (c == ';') {
                    semicolons++
                    completeWord()
                }
                else {
                    word += c
                }
            }

            if (word) words << word

            new TextAnalysis(words:words, commas:commas, semicolons:semicolons)
        }

    }

There are a few reasons I like this better besides the reduced duplication. Since the closure is defined within the method, it has access to the variables holding the current word and the list of words, so there's no need to pass the variables around. Also, since there are no other uses for either the previous method or the new closure, the code reads better having the logic to complete a word inside the method itself.
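The variable access described here can be seen in isolation in a smaller sketch. The names joinWithCounter and append are hypothetical and unrelated to TextAnalyzer; the point is only that a closure declared inside a method can both read and reassign that method's local variables:

```groovy
// Sketch only: joinWithCounter and append are illustrative names.
def joinWithCounter(List items) {
    def count = 0
    def joined = ''

    // The closure sees 'joined' and 'count' without receiving them as parameters
    def append = { item ->
        joined += item   // reassigns the enclosing local String
        count++          // updates the enclosing local counter
    }

    items.each { append(it) }

    [joined: joined, count: count]
}

def joinedResult = joinWithCounter(['a', 'b', 'c'])
assert joinedResult.joined == 'abc'
assert joinedResult.count == 3
```

A private method could not do this without taking both values as arguments and handing the new ones back, which is the awkwardness the add method ran into.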

Of course, a really concise version of the analyze method probably would not iterate over the characters in the string at all (I could have simply started with this version, but I found the iteration helped illustrate things a bit more):

    class TextAnalyzer {
        def analyze(text) {
            new TextAnalysis(
                words: text.split(',|;'),
                commas: text.count(','),
                semicolons: text.count(';')
            )
        }
    }

The above, though, still has some duplication in the calls to String.count(). We can remove it using a closure that counts:

    class TextAnalyzer {
        def analyze(text) {

            def count = { c -> text.count(c) }

            new TextAnalysis(
                words: text.split(',|;'),
                commas: count(','),
                semicolons: count(';')
            )
        }
    }
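As a side note, and not something the versions above use, Groovy's method pointer operator (.&) should give a similar result without writing the closure body by hand. This is a small sketch under that assumption, run as a plain script rather than inside TextAnalyzer:

```groovy
// Sketch only: obtains a closure-like reference to count() bound to this particular string.
def text = 'groovy,is,a,language;enjoyable'

def count = text.&count

assert count(',') == 3
assert count(';') == 1
```

Whether this reads better than the explicit closure is a matter of taste; the explicit closure makes it more obvious what is being counted.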

While we have not seen any huge wins here, in a more complex scenario, defining closures within methods could help make our code more readable and easier to maintain, which is always a win.
