[onjava] The Hidden Gems of Jakarta Commons

2005. 2. 12. 14:35

The Hidden Gems of Jakarta Commons, Part 1

by Timothy M. O'Brien

12/22/2004

If you are not familiar with the Jakarta Commons (http://jakarta.apache.org/commons), you have
likely reinvented a few wheels. Before you write any more generic
frameworks or utilities, grok the Commons. It will save you serious
time. Too many people write a StringUtils class that
duplicates methods available in the StringUtils class of Commons Lang
(http://jakarta.apache.org/commons/lang), or developers unknowingly
recreate the utilities in Commons Collections
(http://jakarta.apache.org/commons/collections) even though
commons-collections.jar is
already available in the classpath. Seriously, take a break. Check
out the Commons Collections API and then go back to your task; I
promise you'll find something simple that will save you a week over
the next year. If people just took some time to look at Jakarta
Commons, we would have much less code duplication--we'd start making
good on the real promise of reuse. I've seen it happen; somebody digs
into Commons BeanUtils or Commons Collections and invariably they have
an "Oh, if I had only known about this, I wouldn't have written 10,000
lines of code" moment. There are still parts of Jakarta Commons that
remain a mystery to most; for instance, many have yet to hear of
Commons CLI (http://jakarta.apache.org/commons/cli) or Commons
Configuration (http://jakarta.apache.org/commons/configuration), and
most have yet to notice the valuable
functors package in Commons Collections. In this series,
I emphasize some of the less-appreciated tools and utilities in the
Jakarta Commons.

In this first part of the series, I explore XML rule set definitions in
the Commons
Digester, functors available in Commons Collections, and an
interesting application of Commons JXPath
(http://jakarta.apache.org/commons/jxpath): querying a List of
objects. Jakarta Commons
contains utilities that aim to help you solve problems at the lowest
level of programming: iterating over collections, parsing XML, and
selecting objects from a List. I would encourage you to
spend some time focusing on these small utilities, as learning about
the Jakarta Commons will save you a substantial amount of time. It
isn't simply about using Commons Digester to parse XML or using
CollectionUtils to filter a collection with a
Predicate. You will start to see benefits once you
realize how to combine the power of these utilities and how to relate
Commons projects to your own applications; once this happens, you will
come to see commons-lang.jar,
commons-beanutils.jar, and
commons-digester.jar as just as indispensable to any system as
the JVM itself.

1. XML-Based Rule Sets for Commons Digester

Commons Digester 1.6 provides one of the easiest ways to turn XML into
objects. Digester has already been introduced on the O'Reilly Network
in two articles: "Learning and Using Jakarta Digester," by Philipp K.
Janert, and "Using the Jakarta Commons, Part 2"
(http://www.oreillynet.com/pub/a/onjava/2003/07/09/commons.html), by
Vikram Goyal. Both articles demonstrate the use of
XML rule sets, but this idea of defining rule sets in XML has not
caught on. Most sightings of the Digester appear to define rule sets
programmatically, in compiled code. You should avoid hard-coding
Digester rule sets in compiled Java code when you have the opportunity
to store such mapping information in an external file or a classpath
resource. Externalizing a Digester rule set makes it easier to adapt to an
evolving XML document structure or an evolving object model.

To demonstrate the difference between defining rule sets in XML and
defining rule sets in compiled code, consider a system to parse XML to
a Person bean with three properties--id,
name, and age, as defined in the following class:

package org.test;

public class Person {
  // Fields are private; Digester sets them through the setters below.
  private String id;
  private String name;
  private int age;

  public Person() {}

  public String getId() { return id; }
  public void setId(String id) {
    this.id = id;
  }

  public String getName() { return name; }
  public void setName(String name) {
    this.name = name;
  }

  public int getAge() { return age; }
  public void setAge(int age) {
    this.age = age;
  }
}

Assume that your application needs to parse an XML file containing
multiple person elements. The following XML file,
data.xml, contains three person elements
that you would like to parse into Person objects:

<people>
  <person id="1">
    <name>Tom Higgins</name>
    <age>25</age>
  </person>
  <person id="2">
    <name>Barney Smith</name>
    <age>75</age>
  </person>
  <person id="3">
    <name>Susan Shields</name>
    <age>53</age>
  </person>
</people>

You expect the structure and content of this XML file to change over
the next few months, and you would prefer not to hard-code the
structure of the XML document in compiled Java code. To do this, you
need to define Digester rules in an XML file that is loaded as a
resource from the classpath. The following XML document,
person-rules.xml, maps the person element to
the Person bean:

<digester-rules>
  <pattern value="people/person">
    <object-create-rule classname="org.test.Person"/>
    <set-next-rule methodname="add"
                      paramtype="java.lang.Object"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="name"/>
    <bean-property-setter-rule pattern="age"/>
  </pattern>
</digester-rules>

All this does is instruct the Digester to create a new instance of
Person every time it encounters a person
element, call add() to add this Person to an
ArrayList, set any bean properties that match attributes
on the person element, and set the name and
age properties from the sub-elements name
and age. You've seen the Person class, the
XML document to be parsed, and the Digester rule definitions in XML
form. Now you need to create an instance of Digester with
the rules defined in person-rules.xml. The following
code creates a Digester by passing the URL
of the person-rules.xml resource to the
DigesterLoader. Since the person-rules.xml
file is a classpath resource in the same package as the class parsing
the XML, the URL is obtained with a call to
getClass().getResource(). The
DigesterLoader then parses the rule definitions and adds
these rules to the newly created Digester:

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.xmlrules.DigesterLoader;

// Configure Digester from XML ruleset
URL rules = getClass().getResource("person-rules.xml");
Digester digester =
    DigesterLoader.createDigester(rules);

// Push empty List onto Digester's Stack
List people = new ArrayList();
digester.push( people );

// Parse the XML document
InputStream input = new FileInputStream( "data.xml" );
digester.parse( input );

Once the Digester has parsed the XML in
data.xml, three Person objects should be in
the people ArrayList.

The alternative to defining Digester rules in XML is to add them using
the convenience methods on a Digester instance. Most
articles and examples start with this method, adding rules using the
addObjectCreate() and
addBeanPropertySetter() methods on Digester.
The following code adds the same rules that were defined in
person-rules.xml:

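digester.addObjectCreate("people/person",
                         Person.class);
digester.addSetNext("people/person",
                    "add",
                    "java.lang.Object");
// Map attributes such as id onto matching bean properties
digester.addSetProperties("people/person");
// Set name and age from the body text of the sub-elements
digester.addBeanPropertySetter("people/person/name");
digester.addBeanPropertySetter("people/person/age");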

If you have ever found yourself working at an organization with
2500-line classes to parse a huge XML document with SAX, or a whole
collection of classes to work with DOM or JDOM, you understand that
XML parsing is more complex than it needs to be, in the majority of
cases. If you are building a highly efficient system with strict
speed and memory requirements, you need the speed of a SAX parser. If
you need the complexity of the DOM Level 3, use a parser like Apache
Xerces (http://xml.apache.org/#xerces). But if you
are simply trying to parse a few XML documents into objects, take a
look at Commons Digester, and define your rule set in an XML file.

Any time you can move this type of configuration outside of compiled
code, you should. I would encourage you to define your digester rules
in an XML file loaded either from the file system or the classpath.
Doing so will make it easier to adapt your program to changes in the
XML document and changes in your object model. For more information
on defining Digester rules in an XML file, see Section 6.2 of the
Jakarta Commons Cookbook (http://www.oreilly.com/catalog/jakartackbk),
"Turning
XML Documents into Objects."

2. Functors in Commons Collections

Functors are an interesting part of Commons Collections 3.1 for two
reasons: they haven't received the attention they warrant, and they
have the potential to change the way you approach programming.
Functor is just a fancy name for an object that
encapsulates a function--a "functional object." And
while they are certainly not the same thing, if you have ever used
function pointers in C or C++, you'll understand the power of functors.
A functor is an object--a Predicate, a
Closure, or a
Transformer. Predicates evaluate objects and
return a boolean, Transformers evaluate objects and
return new objects, and Closures accept objects and
execute code. Functors can be combined into composite functors that
model loops, logical expressions, and control structures, and functors
can also be used to filter and operate upon items in a collection.

Explaining functors in an article as short as this may be impossible,
so to "jump start" your introduction to functors, I will solve the same problem both with and without functors.
In this example, Student objects from an
ArrayList are sorted into two List instances
if they meet certain criteria; students with straight-A grades are
added to an honorRollStudents list, and students with Ds
and Fs are added to a problemStudents list. After the
students are separated, the system will iterate through each list,
giving the honor-roll students an award and scheduling a meeting with
parents of problem students. The following code implements this
process without the use of functors:

List allStudents = getAllStudents();

// Create 2 ArrayLists to hold honorRoll students
// and problem students
List honorRollStudents = new ArrayList();
List problemStudents = new ArrayList();

// Iterate through all students.  Put the
// honorRoll students in one List and the
// problem students in another.
Iterator allStudentsIter = allStudents.iterator();
while( allStudentsIter.hasNext() ) {
  Student s = (Student) allStudentsIter.next();

  if( s.getGrade().equals( "A" ) ) {
    honorRollStudents.add( s );
  } else if( s.getGrade().equals( "B" ) &&
             s.getAttendance() == PERFECT) {
    honorRollStudents.add( s );
  } else if( s.getGrade().equals( "D" ) ||
             s.getGrade().equals( "F" ) ) {
    problemStudents.add( s );
  } else if( s.getStatus() == SUSPENDED ) {
    problemStudents.add( s );
  }
}

// For all honorRoll students, add an award and
// save to the Database.
Iterator honorRollIter =
    honorRollStudents.iterator();
while( honorRollIter.hasNext() ) {
  Student s = (Student) honorRollIter.next();

  // Add an award to student record
  s.addAward( "honor roll", 2005 );
  Database.saveStudent( s );
}

// For all problem students, add a note and
// save to the database.
Iterator problemIter = problemStudents.iterator();
while( problemIter.hasNext() ) {
  Student s = (Student) problemIter.next();

  // Flag student for special attention
  s.addNote( "talk to student", 2005 );
  s.addNote( "meeting with parents", 2005 );
  Database.saveStudent( s );
}

The previous example is very procedural; the only way to figure out
what happens to a Student object is to step through each
line of code. The first half of this example is decision logic that
applies tests to each Student object and classifies
students based on performance and attendance. The second half of this
example operates on the Student objects and saves the result to the
database. A 50-line method body like the previous example is how most
systems begin--manageable procedural complexity. But problems start
to appear when the requirements start to shift. As soon as that
decision logic changes, you will need to start adding more clauses to
the logical expressions in the first half of the previous example.
For example, what happens to your logical expression if a student is
classified as a problem if he has a B and perfect attendance, but
attended detention more than five times? Or what happens to the
second half, when a student can be on the honor roll only if they were
not a problem last year? When exceptions and requirement changes
start to affect procedural code, manageable complexity turns into
unmaintainable spaghetti code.

Step back from the previous example and consider what that code was
doing. It was looking at every object in a List,
applying a criterion, and, if that criterion was satisfied, acting upon
an object. A critical improvement that could be made to the previous
example is the decoupling of the criteria from the code that acts upon
an object. The following two code excerpts solve the previous problem
in a very different way. First, the criteria for the honor roll and
problem students are modeled by two Predicate objects,
and the code that acts upon honor roll and problem students is
modeled by two Closure objects. These four objects are
defined below:

import org.apache.commons.collections.Closure;
import org.apache.commons.collections.Predicate;

// Anonymous Predicate that decides if a student
// has made the honor roll.
Predicate isHonorRoll = new Predicate() {
  public boolean evaluate(Object object) {
    Student s = (Student) object;

    return( ( s.getGrade().equals( "A" ) ) ||
            ( s.getGrade().equals( "B" ) &&
              s.getAttendance() == PERFECT ) );
  }
};

// Anonymous Predicate that decides if a student
// has a problem.
Predicate isProblem = new Predicate() {
  public boolean evaluate(Object object) {
    Student s = (Student) object;

    return ( ( s.getGrade().equals( "D" ) ||
               s.getGrade().equals( "F" ) ) ||
             s.getStatus() == SUSPENDED );
  }
};

// Anonymous Closure that adds a student to the
// honor roll
Closure addToHonorRoll = new Closure() {
  public void execute(Object object) {
    Student s = (Student) object;

    // Add an award to student record
    s.addAward( "honor roll", 2005 );
    Database.saveStudent( s );
  }
};

// Anonymous Closure flags a student for attention
Closure flagForAttention = new Closure() {
  public void execute(Object object) {
    Student s = (Student) object;

    // Flag student for special attention
    s.addNote( "talk to student", 2005 );
    s.addNote( "meeting with parents", 2005 );
    Database.saveStudent( s );
  }
};

The four anonymous implementations of Predicate and
Closure are separated from the system as a whole.
flagForAttention has no knowledge of what the criteria
are for a problem student, and the isProblem Predicate
only knows how to identify a problem student. What is needed is a way
to marry the right Predicate with the right
Closure, and this is shown in the following example.

import java.util.HashMap;
import java.util.Map;

import org.apache.commons.collections.ClosureUtils;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.collections.functors.NOPClosure;

// Keys are Predicates, values are the Closures to run on a match
Map predicateMap = new HashMap();

predicateMap.put( isHonorRoll, addToHonorRoll );
predicateMap.put( isProblem, flagForAttention );
// The null key marks the default Closure
predicateMap.put( null, ClosureUtils.nopClosure() );

Closure processStudents =
    ClosureUtils.switchClosure( predicateMap );

CollectionUtils.forAllDo( allStudents, processStudents );

In the previous code, the predicateMap matches
Predicates to Closures; if a
Student satisfies the Predicate in the key,
it will be passed to the Closure in the value. By
supplying a NOPClosure value and a null key,
we will pass Student objects that satisfy neither
Predicate to a "do nothing" or "no operation"
NOPClosure created by a call to
ClosureUtils. A SwitchClosure,
processStudents, is created from the
predicateMap, and the processStudents
Closure is applied to every Student object
in allStudents using
CollectionUtils.forAllDo(). This is a very different
approach; notice that you are not iterating through any lists.
Instead, you set rules and consequences and
CollectionUtils and SwitchClosure take care
of the execution.
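
To make the dispatch less mysterious, here is a minimal sketch of what a switch-style closure does, written against the JDK alone. The SketchPredicate and SketchClosure interfaces and the SwitchDispatchSketch class are hypothetical stand-ins invented for this illustration, not the Commons Collections implementation, and plain grade strings stand in for Student objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Commons Collections Predicate and
// Closure interfaces, declared here so the sketch needs only the JDK.
interface SketchPredicate { boolean evaluate(Object object); }
interface SketchClosure { void execute(Object object); }

class SwitchDispatchSketch {

  // Returns a closure that runs the action paired with the first
  // predicate accepting the object, or fallback if none accepts it.
  static SketchClosure switchClosure(
      Map<SketchPredicate, SketchClosure> cases, SketchClosure fallback) {
    return object -> {
      for (Map.Entry<SketchPredicate, SketchClosure> entry : cases.entrySet()) {
        if (entry.getKey().evaluate(object)) {
          entry.getValue().execute(object);
          return;
        }
      }
      fallback.execute(object);
    };
  }

  // Classifies letter grades and returns a log of the action taken for each.
  static List<String> classify(List<String> grades) {
    List<String> log = new ArrayList<>();

    // LinkedHashMap keeps insertion order, so rules are tried in order.
    Map<SketchPredicate, SketchClosure> cases = new LinkedHashMap<>();
    cases.put(g -> "A".equals(g), g -> log.add(g + ": honor roll"));
    cases.put(g -> "D".equals(g) || "F".equals(g),
              g -> log.add(g + ": needs attention"));

    SketchClosure process =
        switchClosure(cases, g -> log.add(g + ": no action"));

    // Plays the role that CollectionUtils.forAllDo() plays in the article.
    for (String grade : grades) {
      process.execute(grade);
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(classify(Arrays.asList("A", "C", "F")));
  }
}
```

Running main prints [A: honor roll, C: no action, F: needs attention]. The sketch uses a LinkedHashMap so that the rules are tried in a predictable order, which matters whenever one object could satisfy more than one predicate.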

When you separate criteria using Predicates and actions
using Closures, your code is less procedural and much
easier to test. The isHonorRoll Predicate can be unit
tested in isolation from the addToHonorRoll Closure, and
both can be tested by supplying a mock instance of the
Student class. The second example also demonstrates
CollectionUtils.forAllDo(), which applies a
Closure to every element in a Collection.
You may have noticed that using functors did not reduce the line count; in
fact, the use of functors increased the line count. But the real benefit
from functors is the modularity and encapsulation of criteria and
actions. If your method length tends towards hundreds of lines,
consider a less procedural, more object-oriented approach: use a
functor.
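
As a rough illustration of testing a predicate in isolation, the sketch below checks the honor-roll criteria without touching any Closure or database. HonorPredicate, StubStudent, and HonorRollPredicateCheck are hypothetical names made up for this example; the article does not list its Student class, so a two-field stub is assumed, and a local interface stands in for the Commons Collections Predicate so that only the JDK is required.

```java
// Hypothetical stand-in for org.apache.commons.collections.Predicate.
interface HonorPredicate { boolean evaluate(Object object); }

// Hypothetical minimal student; the article does not list its Student class.
class StubStudent {
  final String grade;
  final boolean perfectAttendance;

  StubStudent(String grade, boolean perfectAttendance) {
    this.grade = grade;
    this.perfectAttendance = perfectAttendance;
  }
}

class HonorRollPredicateCheck {

  // Same criteria as isHonorRoll: an A, or a B with perfect attendance.
  static final HonorPredicate IS_HONOR_ROLL = new HonorPredicate() {
    public boolean evaluate(Object object) {
      StubStudent s = (StubStudent) object;
      return "A".equals(s.grade)
          || ("B".equals(s.grade) && s.perfectAttendance);
    }
  };

  // Throws if the predicate's answer differs from the expected one.
  static void expect(boolean expected, StubStudent student) {
    if (IS_HONOR_ROLL.evaluate(student) != expected) {
      throw new AssertionError("unexpected result for grade " + student.grade);
    }
  }

  public static void main(String[] args) {
    expect(true,  new StubStudent("A", false));
    expect(true,  new StubStudent("B", true));
    expect(false, new StubStudent("B", false));
    expect(false, new StubStudent("F", true));
    System.out.println("honor roll predicate behaves as expected");
  }
}
```

Because the predicate has no side effects, each case is a single call with a hand-built object; in a real project the same checks would normally live in a unit-test framework rather than a main method.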

Chapter 4, "Functors," in the Jakarta Commons
Cookbook introduces functors available in Commons Collections, and
Chapter 5, "Collections," shows you how to use functors with the Java Collections
API. All of the functors--Closure,
Predicate, and Transformer--can be combined
into composite functors that can be used to model any kind of logic.
switch, while, and for
structures can be modeled with SwitchClosure,
WhileClosure, and ForClosure. Compound
logical expressions can be constructed from multiple
Predicates using OrPredicate,
AndPredicate, AllPredicate, and
NonePredicate, among others. Commons BeanUtils also
contains functor implementations that are used to apply functors to
bean properties--BeanPredicate,
BeanComparator, and
BeanPropertyValueChangeClosure. Functors are a different
way of thinking about low-level application architecture, and they
could very well change your approach to coding.
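
The idea behind composite predicates can be sketched in a few lines of plain Java. Everything named here (Rule, Pupil, CompositeRuleSketch, and the or, and, and not helpers) is hypothetical and exists only for this illustration; the real OrPredicate, AndPredicate, and NotPredicate classes live in Commons Collections and are not reproduced here.

```java
// Hypothetical stand-in for the Commons Collections Predicate interface.
interface Rule { boolean evaluate(Object object); }

// Hypothetical minimal record of the two facts the rules look at.
class Pupil {
  final String grade;
  final boolean suspended;

  Pupil(String grade, boolean suspended) {
    this.grade = grade;
    this.suspended = suspended;
  }
}

class CompositeRuleSketch {

  // True when either rule accepts the object (the idea behind OrPredicate).
  static Rule or(Rule left, Rule right) {
    return object -> left.evaluate(object) || right.evaluate(object);
  }

  // True when both rules accept the object (the idea behind AndPredicate).
  static Rule and(Rule left, Rule right) {
    return object -> left.evaluate(object) && right.evaluate(object);
  }

  // Inverts a rule (the idea behind NotPredicate).
  static Rule not(Rule rule) {
    return object -> !rule.evaluate(object);
  }

  // Leaf rule that tests a single fact: the letter grade.
  static Rule gradeIs(String grade) {
    return object -> grade.equals(((Pupil) object).grade);
  }

  // Leaf rule that tests a single fact: the suspension flag.
  static final Rule SUSPENDED = object -> ((Pupil) object).suspended;

  // (D or F) or suspended: the same test as the isProblem predicate.
  static final Rule IS_PROBLEM =
      or(or(gradeIs("D"), gradeIs("F")), SUSPENDED);

  // B and not suspended: a new rule assembled from existing pieces.
  static final Rule SOLID_B = and(gradeIs("B"), not(SUSPENDED));

  public static void main(String[] args) {
    System.out.println(IS_PROBLEM.evaluate(new Pupil("F", false)));
    System.out.println(IS_PROBLEM.evaluate(new Pupil("A", true)));
    System.out.println(SOLID_B.evaluate(new Pupil("B", true)));
  }
}
```

The point of the design is that a new requirement becomes a new combination of small rules rather than another clause in a growing if statement.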

3. Using XPath Syntax to Query Objects and Collections

Commons JXPath
is a surprising (non-standard) use of an XML standard. XPath has been
around for some time as a way to select a node or node set in an XSL
style sheet. If you've worked with XML, you are probably familiar with
the syntax /foo/bar that selects the bar
sub-elements of the foo document element. Jakarta Commons
JXPath adds an interesting twist: you can use JXPath to select objects
from beans and collections, among other object types such as servlet
contexts and DOM Document objects. Consider a
List of Person objects. Each
Person object has a bean property of the type
Job, and each Job object has a
salary property of the type int.
Person objects also have a country property,
which is a two-letter country code. Using JXPath, it is easy to
select all Person objects with a US country
and a Job that pays more than one million
dollars. Here is some code to set up a List of beans to
filter with JXPath:

// Person's constructor sets firstName and country
Person person1 = new Person( "Tim", "US" );
Person person2 = new Person( "John", "US" );
Person person3 = new Person( "Al",  "US" );
Person person4 = new Person( "Tony", "GB" );

// Job's constructor sets name and salary
person1.setJob( new Job( "Developer", 40000 ) );
person2.setJob( new Job( "Senator", 150000 ) );
person3.setJob( new Job( "Comedian", 3400302 ) );
person4.setJob( new Job( "Minister", 2000000 ) );

Person[] personArr =
  new Person[] { person1, person2,
                 person3, person4 };

List people = Arrays.asList( personArr );

The people List contains four
Person beans: Tim, John, Al, and Tony. Tim is a
developer who makes $40,000, John is a Senator who makes $150,000, Al
is a comedian who walks home with $3.4 million, and Tony is a prime
minister who makes 2 million euros. Our task is simple: iterate over
this List and print the name of every Person
who is a U.S. citizen making over one million dollars. Assume that
people is a List of
Person objects, and take a look at the solution without
the benefit of JXPath:

Iterator peopleIter = people.iterator();
while( peopleIter.hasNext() ) {
  Person person = (Person) peopleIter.next();

  // Test the country first, then the nested job and its salary
  if( person.getCountry() != null &&
      person.getCountry().equals( "US" ) &&
      person.getJob() != null &&
      person.getJob().getSalary() > 1000000 ) {
    System.out.println( person.getFirstName() + " " +
                        person.getLastName() );
  }
}

The previous example is heavy, and somewhat error-prone. To find the
matching Person objects, you first need to iterate over
each Person and test the country property of
each. If the country property is not null
and it has the correct value, then you must test the job
property to find out if it is non-null and has a
salary property greater than 1000000. The line count of
the previous example can be dramatically reduced with Java 1.5's
for syntax, but, even with Java 1.5, you still need to
perform two comparisons at two different levels.
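
For comparison, here is a sketch of the same filter written with the Java 1.5 enhanced for loop and generics. SketchPerson and SketchJob are hypothetical minimal beans assumed for this example (the article does not list its Person or Job classes), and the method collects first names into a list instead of printing them so the result is easy to inspect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal beans; the article does not list Person or Job.
class SketchJob {
  private final String name;
  private final int salary;

  SketchJob(String name, int salary) {
    this.name = name;
    this.salary = salary;
  }

  String getName() { return name; }
  int getSalary() { return salary; }
}

class SketchPerson {
  private final String firstName;
  private final String country;
  private SketchJob job;

  SketchPerson(String firstName, String country) {
    this.firstName = firstName;
    this.country = country;
  }

  String getFirstName() { return firstName; }
  String getCountry() { return country; }
  SketchJob getJob() { return job; }
  void setJob(SketchJob job) { this.job = job; }
}

class ForEachFilterSketch {

  // Returns the first names of U.S. people whose job pays over one million.
  static List<String> richUsNames(List<SketchPerson> people) {
    List<String> names = new ArrayList<String>();
    for (SketchPerson person : people) {
      // The two levels of comparison remain: the person, then the job.
      if ("US".equals(person.getCountry())
          && person.getJob() != null
          && person.getJob().getSalary() > 1000000) {
        names.add(person.getFirstName());
      }
    }
    return names;
  }

  // Builds the same four people used in the article's setup code.
  static List<SketchPerson> samplePeople() {
    SketchPerson tim = new SketchPerson("Tim", "US");
    SketchPerson john = new SketchPerson("John", "US");
    SketchPerson al = new SketchPerson("Al", "US");
    SketchPerson tony = new SketchPerson("Tony", "GB");
    tim.setJob(new SketchJob("Developer", 40000));
    john.setJob(new SketchJob("Senator", 150000));
    al.setJob(new SketchJob("Comedian", 3400302));
    tony.setJob(new SketchJob("Minister", 2000000));
    return Arrays.asList(tim, john, al, tony);
  }

  public static void main(String[] args) {
    System.out.println(richUsNames(samplePeople()));
  }
}
```

With the sample data, main prints [Al]. The iterator and the cast disappear, but the nested null check and the two-level comparison are still written by hand.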

What if you had to write a number of these queries against a set of
Person objects stored in memory? What if your
application had to display all of the Person objects in
England named Tony? Or, what if you had to print the name
of every Job with a salary less than 20,000? If you were
storing these objects in a relational database, you could solve this
by writing a SQL query, but if you are dealing with objects in memory,
you don't have this luxury. While XPath was primarily meant for XML,
you could use it to write "queries" against a collection of objects,
treating objects as elements and bean properties as sub-elements.
Yes, this is a strange application of XPath, but take a look at how
the following example performs three different queries against
people, a List of Person
objects.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.jxpath.JXPathContext;

public List queryCollection(String xpath,
                            Collection col) {
    List results = new ArrayList();

    JXPathContext context =
        JXPathContext.newContext( col );

    Iterator matching =
        context.iterate( xpath );

    // Copy every matching object into the result List
    while( matching.hasNext() ) {
        results.add( matching.next() );
    }
    return results;
}

String query1 =
   ".[@country = 'US']/job[@salary > 1000000]/..";
String query2 =
   ".[@country = 'GB' and @name = 'Tony']";
String query3 =
   "./job/name";

List richUsPeople =
    queryCollection( query1, people );
List britishTony =
    queryCollection( query2, people );
List jobNames =
    queryCollection( query3, people );

The method queryCollection() takes an XPath expression
and applies it to a Collection. XPath expressions are
evaluated against a JXPathContext, which is created by
calling JXPathContext.newContext() and passing in the
Collection to be queried. Calling
context.iterate() then applies the XPath expression to
each item in the Collection, returning an
Iterator with every matching "node" (or in this
case, "object"). The first query performed by the previous
example, query1, is the same query from the original example
implemented without JXPath. query2 selects all
Person objects with a country property of
GB and a name property of Tony,
and query3 selects a List of String
objects, the name property of all of the Job
objects.

When I first saw Commons JXPath, it struck me as a bad idea. Why apply
XPath expressions to objects? Something about it didn't feel right.
But this unexpected use of XPath as a query language for a collection
of beans has come in handy for me more than a few times in the past few
years. If you find yourself looping through lists to find matching
elements, consider using JXPath. For more information, see Chapter 12,
"Searching and Filtering," of Jakarta
Commons Cookbook, which discusses Commons JXPath and Jakarta Lucene
paired with Commons Digester.

And There's More

Stay tuned to this exploration of the far reaches of the Jakarta
Commons. In the next part of this series, I'll introduce some related
tools and utilities: set operations in Commons Collections, using
Predicate objects with collections, configuring an application with
Commons Configuration (http://jakarta.apache.org/commons/configuration),
and using Commons Betwixt (http://jakarta.apache.org/commons/betwixt)
to read and write XML. There is much to be gained from the Jakarta
Commons that cannot be conveyed in a few thousand words, and I would
encourage you to take a look at the Jakarta Commons Cookbook
(http://www.oreilly.com/catalog/jakartackbk). Many of
these utilities may, at first glance, seem somewhat trivial, but the
power of Jakarta Commons lies in how these tools can be combined with
each other and integrated into your own systems.