The Hidden Gems of Jakarta Commons, Part 1
by Timothy M. O'Brien
12/22/2004
If you are not familiar with the
Jakarta Commons (http://jakarta.apache.org/commons), you have
likely reinvented a few wheels. Before you write any more generic
frameworks or utilities, grok the Commons. It will save you serious
time. Too many people write a StringUtils
class that
duplicates methods available in the StringUtils class of
Commons Lang (http://jakarta.apache.org/commons/lang),
or developers unknowingly recreate the
utilities in
Commons Collections (http://jakarta.apache.org/commons/collections)
even though commons-collections.jar is
already available in the classpath. Seriously, take a break. Check
out the Commons Collections API and then go back to your task; I
promise you'll find something simple that will save you a week over
the next year. If people just took some time to look at Jakarta
Commons, we would have much less code duplication--we'd start making
good on the real promise of reuse. I've seen it happen; somebody digs
into Commons BeanUtils
or Commons Collections and invariably they have
an "Oh, if I had only known about this, I wouldn't have written 10,000
lines of code" moment. There are still parts of Jakarta Commons that
remain a mystery to most; for instance, many have yet to hear of
Commons CLI (http://jakarta.apache.org/commons/cli) or
Commons Configuration (http://jakarta.apache.org/commons/configuration),
and most have yet to notice the valuable
functors
package in Commons Collections. In this series,
I emphasize some of the less-appreciated tools and utilities in the
Jakarta Commons.
In this first part of the series, I explore XML rule set definitions in
the Commons
Digester, functors available in Commons Collections, and an
interesting application,
Commons JXPath (http://jakarta.apache.org/commons/jxpath), to
query a List
of objects. Jakarta Commons
contains utilities that aim to help you solve problems at the lowest
level of programming: iterating over collections, parsing XML, and
selecting objects from a List
. I would encourage you to
spend some time focusing on these small utilities, as learning about
the Jakarta Commons will save you a substantial amount of time. It
isn't simply about using Commons Digester to parse XML or using
CollectionUtils
to filter a collection with a
Predicate
. You will start to see benefits once you
realize how to combine the power of these utilities and how to relate
Commons projects to your own applications; once this happens, you will
come to see commons-lang.jar,
commons-beanutils.jar, and
commons-digester.jar as just as indispensable to any system as
the JVM itself.
Related Reading: Jakarta Commons Cookbook
If you are interested in learning more about the Jakarta Commons,
check out the Jakarta Commons
Cookbook. This book is full of recipes that will get you hooked
on the Commons, and tells you how to use Jakarta Commons in concert
with other small open source components such as
Velocity (http://jakarta.apache.org/velocity),
FreeMarker (http://www.freemarker.org),
Lucene (http://jakarta.apache.org/lucene), and
Jakarta Slide (http://jakarta.apache.org/slide). In this
book, I introduce a wide array of tools from Jakarta Commons from
using simple utilities in Commons Lang to combining Commons Digester,
Commons Collections, and Jakarta Lucene to search the works of William
Shakespeare. I hope this series and the
Jakarta Commons Cookbook (http://www.oreilly.com//catalog/jakartackbk) provide you
with some interesting solutions for low-level programming problems.
1. XML-Based Rule Sets for Commons Digester
Commons Digester
1.6 provides one of the easiest ways to turn XML into objects.
Digester has already been introduced on the O'Reilly network in two
articles: "Learning
and Using Jakarta Digester," by Philipp K. Janert, and
"Using the Jakarta Commons, Part 2"
(http://www.oreillynet.com/pub/a/onjava/2003/07/09/commons.html),
by Vikram Goyal. Both articles demonstrate the use of
XML rule sets, but this idea of defining rule sets in XML has not
caught on. Most sightings of the Digester appear to define rule sets
programmatically, in compiled code. You should avoid hard-coding
Digester rule sets in compiled Java code when you have the opportunity
to store such mapping information in an external file or a classpath
resource. Externalizing a Digester rule set makes it easier to adapt to an
evolving XML document structure or an evolving object model.
To demonstrate the difference between defining rule sets in XML and
defining rule sets in compiled code, consider a system to parse XML to
a Person
bean with three properties--id
,
name
, and age
, as defined in the following class:
package org.test;

public class Person {
    // Fields are private; Digester reaches them through the setters
    private String id;
    private String name;
    private int age;

    public Person() {}

    public String getId() { return id; }
    public void setId(String id) {
        this.id = id;
    }

    public String getName() { return name; }
    public void setName(String name) {
        this.name = name;
    }

    public int getAge() { return age; }
    public void setAge(int age) {
        this.age = age;
    }
}
Assume that your application needs to parse an XML file containing
multiple person
elements. The following XML file,
data.xml, contains three person
elements
that you would like to parse into Person
objects:
<people>
  <person id="1">
    <name>Tom Higgins</name>
    <age>25</age>
  </person>
  <person id="2">
    <name>Barney Smith</name>
    <age>75</age>
  </person>
  <person id="3">
    <name>Susan Shields</name>
    <age>53</age>
  </person>
</people>
You expect the structure and content of this XML file to change over
the next few months, and you would prefer not to hard-code the
structure of the XML document in compiled Java code. To do this, you
need to define Digester rules in an XML file that is loaded as a
resource from the classpath. The following XML document,
person-rules.xml, maps the person
element to
the Person
bean:
<digester-rules>
  <pattern value="people/person">
    <object-create-rule classname="org.test.Person"/>
    <set-next-rule methodname="add"
                   paramtype="java.lang.Object"/>
    <set-properties-rule/>
    <bean-property-setter-rule pattern="name"/>
    <bean-property-setter-rule pattern="age"/>
  </pattern>
</digester-rules>
All this does is instruct the Digester to create a new instance of
Person
every time it encounters a person
element, call add()
to add this Person
to an
ArrayList
, set any bean properties that match attributes
on the person
element, and set the name
and
age
properties from the sub-elements name
and age
. You've seen the Person
class, the
XML document to be parsed, and the Digester rule definitions in XML
form. Now you need to create an instance of Digester
with
the rules defined in person-rules.xml. The following
code creates a Digester
by passing the URL
of the person-rules.xml resource to the
DigesterLoader
. Since the person-rules.xml
file is a classpath resource in the same package as the class parsing
the XML, the URL is obtained with a call to
getClass().getResource()
. The
DigesterLoader
then parses the rule definitions and adds
these rules to the newly created Digester
:
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.digester.Digester;
import org.apache.commons.digester.xmlrules.DigesterLoader;

// Configure Digester from the XML rule set, which sits in the
// same package as this class
URL rules = getClass().getResource("person-rules.xml");
Digester digester =
    DigesterLoader.createDigester(rules);

// Push empty List onto Digester's Stack
List people = new ArrayList();
digester.push( people );

// Parse the XML document
InputStream input = new FileInputStream( "data.xml" );
digester.parse( input );
Once the Digester
has parsed the XML in
data.xml, three Person
objects should be in
the people
ArrayList
.
The alternative to defining Digester rules in XML is to add them using
the convenience methods on a Digester
instance. Most
articles and examples start with this method, adding rules using the
addObjectCreate()
and
addBeanPropertySetter()
methods on Digester
.
The following code adds the same rules that were defined in
person-rules.xml:
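digester.addObjectCreate("people/person",
                         Person.class);
digester.addSetNext("people/person",
                    "add",
                    "java.lang.Object");
// Map attributes of person (such as id) to bean properties
digester.addSetProperties("people/person");
// The pattern must match the sub-element that holds the value
digester.addBeanPropertySetter("people/person/name");
digester.addBeanPropertySetter("people/person/age");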
If you have ever found yourself working at an organization with
2500-line classes to parse a huge XML document with SAX, or a whole
collection of classes to work with DOM or JDOM, you understand that
XML parsing is more complex than it needs to be, in the majority of
cases. If you are building a highly efficient system with strict
speed and memory requirements, you need the speed of a SAX parser. If
you need the complexity of the DOM Level 3, use a parser like
Apache Xerces (http://xml.apache.org/#xerces). But if you
are simply trying to parse a few XML documents into objects, take a
look at Commons Digester, and define your rule set in an XML file.
Any time you can move this type of configuration outside of compiled
code, you should. I would encourage you to define your digester rules
in an XML file loaded either from the file system or the classpath.
Doing so will make it easier to adapt your program to changes in the
XML document and changes in your object model. For more information
on defining Digester rules in an XML file, see Section 6.2 of the
Jakarta Commons Cookbook (http://www.oreilly.com/catalog/jakartackbk), "Turning
XML Documents into Objects."
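To make the trade-off concrete, here is a rough sketch of what the five rules in person-rules.xml stand in for when the same document is parsed with a hand-written SAX handler from the JDK. It is not Digester code: SaxPerson and PersonSaxHandler are hypothetical names invented for this illustration (SaxPerson plays the role of the Person bean), and the sketch uses generics for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the article's org.test.Person bean.
class SaxPerson {
    String id;
    String name;
    int age;
}

// Hand-written counterpart of the rules in person-rules.xml.
class PersonSaxHandler extends DefaultHandler {
    final List<SaxPerson> people = new ArrayList<SaxPerson>();
    private SaxPerson current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        text.setLength(0);
        if ("person".equals(qName)) {
            current = new SaxPerson();          // like object-create-rule
            current.id = atts.getValue("id");   // like set-properties-rule
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        if ("name".equals(qName)) {             // like bean-property-setter-rule
            current.name = text.toString().trim();
        } else if ("age".equals(qName)) {
            current.age = Integer.parseInt(text.toString().trim());
        } else if ("person".equals(qName)) {
            people.add(current);                // like set-next-rule
            current = null;
        }
    }

    // Parses an XML string and returns the people it contains.
    static List<SaxPerson> parse(String xml) {
        PersonSaxHandler handler = new PersonSaxHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.people;
    }
}
```

Every change to the document structure means editing and recompiling a handler like this one, which is the cost an external rule file is meant to avoid.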
2. Functors in Commons Collections
Functors are an interesting part of Commons Collections 3.1 for two
reasons: they haven't received the attention they warrant, and they
have the potential to change the way you approach programming.
Functor is just a fancy name for an object that
encapsulates a function--a "functional object." And
while they are certainly not the same thing, if you have ever used
function pointers in C or C++, you'll understand the power of functors.
A functor is an object--a Predicate
, a
Closure
, or a
Transformer
. Predicate
s evaluate objects and
return a boolean
, Transformer
s evaluate objects and
return new objects, and Closure
s accept objects and
execute code. Functors can be combined into composite functors that
model loops, logical expressions, and control structures, and functors
can also be used to filter and operate upon items in a collection.
Explaining functors in an article as short as this may be impossible,
so to "jump start" your introduction to functors, I will solve the same problem both with and without functors.
In this example, Student
objects from an
ArrayList
are sorted into two List
instances
if they meet certain criteria; students with straight-A grades are
added to an honorRollStudents
list, and students with Ds
and Fs are added to a problemStudents
list. After the
students are separated, the system will iterate through each list,
giving the honor-roll students an award and scheduling a meeting with
parents of problem students. The following code implements this
process without the use of functors:
List allStudents = getAllStudents();

// Create 2 ArrayLists to hold honorRoll students
// and problem students
List honorRollStudents = new ArrayList();
List problemStudents = new ArrayList();

// Iterate through all students. Put the
// honorRoll students in one List and the
// problem students in another.
Iterator allStudentsIter = allStudents.iterator();
while( allStudentsIter.hasNext() ) {
    Student s = (Student) allStudentsIter.next();
    if( s.getGrade().equals( "A" ) ) {
        honorRollStudents.add( s );
    } else if( s.getGrade().equals( "B" ) &&
               s.getAttendance() == PERFECT) {
        honorRollStudents.add( s );
    } else if( s.getGrade().equals( "D" ) ||
               s.getGrade().equals( "F" ) ) {
        problemStudents.add( s );
    } else if( s.getStatus() == SUSPENDED ) {
        problemStudents.add( s );
    }
}

// For all honorRoll students, add an award and
// save to the Database.
Iterator honorRollIter =
    honorRollStudents.iterator();
while( honorRollIter.hasNext() ) {
    Student s = (Student) honorRollIter.next();
    // Add an award to student record
    s.addAward( "honor roll", 2005 );
    Database.saveStudent( s );
}

// For all problem students, add a note and
// save to the database.
Iterator problemIter = problemStudents.iterator();
while( problemIter.hasNext() ) {
    Student s = (Student) problemIter.next();
    // Flag student for special attention
    s.addNote( "talk to student", 2005 );
    s.addNote( "meeting with parents", 2005 );
    Database.saveStudent( s );
}
The previous example is very procedural; the only way to figure out
what happens to a Student
object is to step through each
line of code. The first half of this example is decision logic that
applies tests to each Student
object and classifies
students based on performance and attendance. The second half of this
example operates on the Student
objects and saves the result to the
database. A 50-line method body like the previous example is how most
systems begin--manageable procedural complexity. But problems start
to appear when the requirements start to shift. As soon as that
decision logic changes, you will need to start adding more clauses to
the logical expressions in the first half of the previous example.
For example, what happens to your logical expression if a student is
classified as a problem if he has a B and perfect attendance, but
attended detention more than five times? Or what happens to the
second half, when a student can be on the honor roll only if they were
not a problem last year? When exceptions and requirement changes
start to affect procedural code, manageable complexity turns into
unmaintainable spaghetti code.
Step back from the previous example and consider what that code was
doing. It was looking at every object in a List
,
applying a criterion, and, if that criterion was satisfied, acting upon
an object. A critical improvement that could be made to the previous
example is the decoupling of the criteria from the code that acts upon
an object. The following two code excerpts solve the previous problem
in a very different way. First, the criteria for the honor roll and
problem students are modeled by two Predicate
objects,
and the code that acts upon honor roll and problem students is
modeled by two Closure
objects. These four objects are
defined below:
import org.apache.commons.collections.Closure;
import org.apache.commons.collections.Predicate;

// Anonymous Predicate that decides if a student
// has made the honor roll.
Predicate isHonorRoll = new Predicate() {
    public boolean evaluate(Object object) {
        Student s = (Student) object;
        return( ( s.getGrade().equals( "A" ) ) ||
                ( s.getGrade().equals( "B" ) &&
                  s.getAttendance() == PERFECT ) );
    }
};

// Anonymous Predicate that decides if a student
// has a problem.
Predicate isProblem = new Predicate() {
    public boolean evaluate(Object object) {
        Student s = (Student) object;
        return ( ( s.getGrade().equals( "D" ) ||
                   s.getGrade().equals( "F" ) ) ||
                 s.getStatus() == SUSPENDED );
    }
};

// Anonymous Closure that adds a student to the
// honor roll
Closure addToHonorRoll = new Closure() {
    public void execute(Object object) {
        Student s = (Student) object;
        // Add an award to student record
        s.addAward( "honor roll", 2005 );
        Database.saveStudent( s );
    }
};

// Anonymous Closure flags a student for attention
Closure flagForAttention = new Closure() {
    public void execute(Object object) {
        Student s = (Student) object;
        // Flag student for special attention
        s.addNote( "talk to student", 2005 );
        s.addNote( "meeting with parents", 2005 );
        Database.saveStudent( s );
    }
};
The four anonymous implementations of Predicate
and
Closure
are separated from the system as a whole.
flagForAttention
has no knowledge of what the criteria
are for a problem student, and the isProblem Predicate
only knows how to identify a problem student. What is needed is a way
to marry the right Predicate
with the right
Closure
, and this is shown in the following example.
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.collections.Closure;
import org.apache.commons.collections.ClosureUtils;
import org.apache.commons.collections.CollectionUtils;

// Pair each Predicate with the Closure to run when it matches
Map predicateMap = new HashMap();
predicateMap.put( isHonorRoll, addToHonorRoll );
predicateMap.put( isProblem, flagForAttention );
// The null key marks the default: do nothing
predicateMap.put( null, ClosureUtils.nopClosure() );

Closure processStudents =
    ClosureUtils.switchClosure( predicateMap );
CollectionUtils.forAllDo( allStudents, processStudents );
In the previous code, the predicateMap
matches
Predicate
s to Closure
s; if a
Student
satisfies the Predicate
in the key,
it will be passed to the Closure
in the value. By
supplying a NOPClosure
value and a null
key,
we will pass Student
objects that satisfy neither
Predicate
to a "do nothing" or "no operation"
NOPClosure
created by a call to
ClosureUtils
. A SwitchClosure
,
processStudents
, is created from the
predicateMap
, and the processStudents
Closure
is applied to every Student
object
in the allStudents list
using
CollectionUtils.forAllDo()
. This is a very different
approach; notice that you are not iterating through any lists.
Instead, you set rules and consequences and
CollectionUtils
and SwitchClosure
take care
of the execution.
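If the dispatch feels opaque, the following minimal sketch shows the idea behind a switch closure using only the JDK. It is not the Commons Collections implementation: SketchPredicate, SketchClosure, and SwitchSketch are hypothetical stand-ins written for this illustration, the default action is passed as a constructor argument rather than under a null key, and a LinkedHashMap is assumed so that the predicates are tried in insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins with the same shape as Predicate and Closure.
interface SketchPredicate { boolean evaluate(Object object); }
interface SketchClosure { void execute(Object object); }

// Runs the closure paired with the first predicate that matches,
// or the default closure when no predicate matches.
class SwitchSketch implements SketchClosure {
    private final Map<SketchPredicate, SketchClosure> cases;
    private final SketchClosure defaultClosure;

    SwitchSketch(Map<SketchPredicate, SketchClosure> cases,
                 SketchClosure defaultClosure) {
        this.cases = new LinkedHashMap<SketchPredicate, SketchClosure>(cases);
        this.defaultClosure = defaultClosure;
    }

    public void execute(Object object) {
        for (Map.Entry<SketchPredicate, SketchClosure> entry : cases.entrySet()) {
            if (entry.getKey().evaluate(object)) {
                entry.getValue().execute(object);
                return;
            }
        }
        defaultClosure.execute(object);
    }

    // Routes three grades through the switch and records what happened.
    static List<String> demo() {
        final List<String> log = new ArrayList<String>();
        Map<SketchPredicate, SketchClosure> cases =
            new LinkedHashMap<SketchPredicate, SketchClosure>();
        cases.put(
            new SketchPredicate() {
                public boolean evaluate(Object o) { return "A".equals(o); }
            },
            new SketchClosure() {
                public void execute(Object o) { log.add("award:" + o); }
            });
        cases.put(
            new SketchPredicate() {
                public boolean evaluate(Object o) { return "D".equals(o) || "F".equals(o); }
            },
            new SketchClosure() {
                public void execute(Object o) { log.add("flag:" + o); }
            });
        SketchClosure doNothing = new SketchClosure() {
            public void execute(Object o) { }
        };
        SketchClosure process = new SwitchSketch(cases, doNothing);
        // This loop plays the role of CollectionUtils.forAllDo()
        for (Object grade : new Object[] { "A", "C", "F" }) {
            process.execute(grade);
        }
        return log;
    }
}
```

Because only the first matching predicate fires, the order of the cases matters whenever two predicates can both accept the same object.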
When you separate criteria using Predicate
s and actions
using Closure
s, your code is less procedural and much
easier to test. The isHonorRoll Predicate
can be unit
tested in isolation from the addToHonorRoll Closure
, and
both can be tested by supplying a mock instance of the
Student
class. The second example also demonstrates
CollectionUtils.forAllDo()
, which applies a
Closure
to every element in a Collection
.
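As a rough illustration of testing a criterion on its own, the sketch below checks an honor-roll rule against stub objects without touching a database or any action code. StubStudent, StudentCheck, and HonorRollCheck are hypothetical names made up for this example; StudentCheck merely mirrors the shape of the Predicate interface, and attendance is simplified to a boolean.

```java
// Hypothetical stub that supplies only what the rule needs.
class StubStudent {
    private final String grade;
    private final boolean perfectAttendance;

    StubStudent(String grade, boolean perfectAttendance) {
        this.grade = grade;
        this.perfectAttendance = perfectAttendance;
    }

    String getGrade() { return grade; }
    boolean hasPerfectAttendance() { return perfectAttendance; }
}

// Hypothetical stand-in with the same shape as Predicate.
interface StudentCheck { boolean evaluate(Object object); }

class HonorRollCheck {
    // Same rule as isHonorRoll: an A, or a B with perfect attendance.
    static final StudentCheck IS_HONOR_ROLL = new StudentCheck() {
        public boolean evaluate(Object object) {
            StubStudent s = (StubStudent) object;
            return "A".equals(s.getGrade())
                || ("B".equals(s.getGrade()) && s.hasPerfectAttendance());
        }
    };

    // A tiny self-check that exercises the rule in isolation.
    static boolean selfTest() {
        return IS_HONOR_ROLL.evaluate(new StubStudent("A", false))
            && IS_HONOR_ROLL.evaluate(new StubStudent("B", true))
            && !IS_HONOR_ROLL.evaluate(new StubStudent("B", false))
            && !IS_HONOR_ROLL.evaluate(new StubStudent("F", true));
    }
}
```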
You may have noticed that using functors did not reduce the line count; in
fact, the use of functors increased the line count. But the real benefit
from functors is the modularity and encapsulation of criteria and
actions. If your method length tends towards hundreds of lines,
consider a less procedural, more object-oriented approach--use a
functor.
Chapter 4, "Functors," in the Jakarta Commons
Cookbook introduces functors available in Commons Collections, and
Chapter 5, "Collections," shows you how to use functors with the Java Collections
API. All of the functors--Closure
,
Predicate
, and Transformer
--can be combined
into composite functors that can be used to model any kind of logic.
switch
, while
, and for
structures can be modeled with SwitchClosure
,
WhileClosure
, and ForClosure
. Compound
logical expressions can be constructed from multiple
Predicate
s using OrPredicate
,
AndPredicate
, AllPredicate
, and
NonePredicate
, among others. Commons BeanUtils
also
contains functor implementations that are used to apply functors to
bean properties--BeanPredicate
,
BeanComparator
, and
BeanPropertyValueChangeClosure
. Functors are a different
way of thinking about low-level application architecture, and they
could very well change your approach to coding.
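To show what composing predicates means in practice, here is a small JDK-only sketch of and/or composition. It only illustrates the idea behind classes such as AndPredicate and OrPredicate; GradeRule and GradeRules are hypothetical names invented here, not part of Commons Collections.

```java
// Hypothetical stand-in with the same shape as Predicate.
interface GradeRule { boolean evaluate(Object object); }

class GradeRules {
    // Leaf rule: matches one specific grade.
    static GradeRule gradeIs(final String grade) {
        return new GradeRule() {
            public boolean evaluate(Object o) { return grade.equals(o); }
        };
    }

    // Composite rule: true when either rule accepts the object.
    static GradeRule or(final GradeRule a, final GradeRule b) {
        return new GradeRule() {
            public boolean evaluate(Object o) {
                return a.evaluate(o) || b.evaluate(o);
            }
        };
    }

    // Composite rule: true only when both rules accept the object.
    static GradeRule and(final GradeRule a, final GradeRule b) {
        return new GradeRule() {
            public boolean evaluate(Object o) {
                return a.evaluate(o) && b.evaluate(o);
            }
        };
    }
}
```

A rule for failing grades then becomes GradeRules.or( GradeRules.gradeIs( "D" ), GradeRules.gradeIs( "F" ) ), and that composite can itself be passed to and() or or() to build larger expressions.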
3. Using XPath Syntax to Query Objects and Collections
Commons JXPath
is a surprising (non-standard) use of an XML standard. XPath has been
around for some time as a way to select a node or node set in an XSL
style sheet. If you've worked with XML, you are probably familiar with
the syntax /foo/bar
that selects the bar
sub-elements of the foo
document element. Jakarta Commons
JXPath adds an interesting twist: you can use JXPath to select objects
from beans and collections, among other object types such as servlet
contexts and DOM Document
objects. Consider a
List
of Person
objects. Each
Person
object has a bean property of the type
Job
, and each Job
object has a
salary
property of the type int
.
Person
objects also have a country
property,
which is a two-letter country code. Using JXPath, it is easy to
select all Person
objects with a US
country
and a Job
that pays more than one million
dollars. Here is some code to set up a List
of beans to
filter with JXPath:
// Person's constructor sets firstName and country
Person person1 = new Person( "Tim", "US" );
Person person2 = new Person( "John", "US" );
Person person3 = new Person( "Al", "US" );
Person person4 = new Person( "Tony", "GB" );

// Job's constructor sets name and salary
person1.setJob( new Job( "Developer", 40000 ) );
person2.setJob( new Job( "Senator", 150000 ) );
person3.setJob( new Job( "Comedian", 3400302 ) );
person4.setJob( new Job( "Minister", 2000000 ) );

Person[] personArr =
    new Person[] { person1, person2,
                   person3, person4 };
List people = Arrays.asList( personArr );
The people
List
contains four
Person
beans: Tim, John, Al, and Tony. Tim is a
developer who makes $40,000, John is a Senator who makes $150,000, Al
is a comedian who walks home with $3.4 million, and Tony is a prime
minister who makes 2 million euros. Our task is simple: iterate over
this List
and print the name of every Person
who is a U.S. citizen making over one million dollars. Assume that
people
is an ArrayList
of
Person
objects, and take a look at the solution without
the benefit of JXPath:
Iterator peopleIter = people.iterator();
while( peopleIter.hasNext() ) {
    Person person = (Person) peopleIter.next();
    // Test the country first, then the nested job and salary
    if( person.getCountry() != null &&
        person.getCountry().equals( "US" ) &&
        person.getJob() != null &&
        person.getJob().getSalary() > 1000000 ) {
        print( person.getFirstName() + " " +
               person.getLastName() );
    }
}
The previous example is heavy, and somewhat error-prone. To find the
matching Person
objects, you first need to iterate over
each Person
and test the country
property of
each. If the country
property is not null
and it has the correct value, then you must test the job
property to find out if it is non-null
and has a
salary
property greater than 1000000. The line count of
the previous example can be dramatically reduced with Java 1.5's
for
syntax, but, even with Java 1.5, you still need to
perform two comparisons at two different levels.
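For comparison, here is a sketch of the same filter written with the Java 1.5 for syntax and generics. SketchPerson, SketchJob, and RichUsFilter are hypothetical classes defined only for this illustration; they carry just the properties the filter needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal versions of the Job and Person beans.
class SketchJob {
    private final int salary;
    SketchJob(int salary) { this.salary = salary; }
    int getSalary() { return salary; }
}

class SketchPerson {
    private final String firstName;
    private final String country;
    private final SketchJob job;

    SketchPerson(String firstName, String country, SketchJob job) {
        this.firstName = firstName;
        this.country = country;
        this.job = job;
    }

    String getFirstName() { return firstName; }
    String getCountry() { return country; }
    SketchJob getJob() { return job; }
}

class RichUsFilter {
    // Collects the first names of US people earning over one million.
    static List<String> richUsFirstNames(List<SketchPerson> people) {
        List<String> names = new ArrayList<String>();
        for (SketchPerson person : people) {
            // Still two checks at two levels: the person, then the job
            if ("US".equals(person.getCountry())
                    && person.getJob() != null
                    && person.getJob().getSalary() > 1000000) {
                names.add(person.getFirstName());
            }
        }
        return names;
    }
}
```

The loop is shorter, but the query itself is still spelled out in compiled code, one condition per level of the object graph.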
What if you had to write a number of these queries against a set of
Person
objects stored in memory? What if your
application had to display all of the Person
objects in
England named Tony
? Or, what if you had to print the name
of every Job
with a salary less than 20,000? If you were
storing these objects in a relational database, you could solve this
by writing a SQL query, but if you are dealing with objects in memory,
you don't have this luxury. While XPath was primarily meant for XML,
you could use it to write "queries" against a collection of objects,
treating objects as elements and bean properties as sub-elements.
Yes, this is a strange application of XPath, but take a look at how
the following example performs three different queries against
people
, an ArrayList
of Person
objects.
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.jxpath.JXPathContext;

// Returns every object in col selected by the XPath expression
public List queryCollection(String xpath,
                            Collection col) {
    List results = new ArrayList();
    JXPathContext context =
        JXPathContext.newContext( col );
    Iterator matching =
        context.iterate( xpath );
    while( matching.hasNext() ) {
        results.add( matching.next() );
    }
    return results;
}

String query1 =
    ".[@country = 'US']/job[@salary > 1000000]/..";
String query2 =
    ".[@country = 'GB' and @name = 'Tony']";
String query3 =
    "./job/name";

List richUsPeople =
    queryCollection( query1, people );
List britishTony =
    queryCollection( query2, people );
List jobNames =
    queryCollection( query3, people );
The method queryCollection()
takes an XPath expression
and applies it to a Collection
. XPath expressions are
evaluated against a JXPathContext
, which is created by
calling JXPathContext.newContext()
and passing in the
Collection
to be queried. Calling
context.iterate()
then applies the XPath expression to
each item in the Collection
, returning an
Iterator
with every matching "node" (or in this
case, "object"). The first query performed by the previous
example, query1
, is the same query from the original example
implemented without JXPath. query2
selects all
Person
objects with a country
property of
GB
and a name
property of Tony
,
and query3
selects a List
of String
objects, the name
property of all of the Job
objects.
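If the shape of query1 is unfamiliar, it may help to see the same predicate-then-parent pattern evaluated as ordinary XPath over an XML document with the JDK's javax.xml.xpath API. This is not JXPath, and the XML below is a hypothetical rendering of the people list invented for this sketch; it only illustrates how the bracketed tests and the trailing ".." step read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathShapeSketch {
    // Hypothetical XML rendering of the four Person beans.
    static final String PEOPLE_XML =
        "<people>"
        + "<person firstName='Tim' country='US'><job name='Developer' salary='40000'/></person>"
        + "<person firstName='John' country='US'><job name='Senator' salary='150000'/></person>"
        + "<person firstName='Al' country='US'><job name='Comedian' salary='3400302'/></person>"
        + "<person firstName='Tony' country='GB'><job name='Minister' salary='2000000'/></person>"
        + "</people>";

    // Evaluates an XPath that selects person elements and
    // returns the firstName attribute of each match.
    static List<String> firstNames(String xpath) {
        List<String> names = new ArrayList<String>();
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath,
                          new InputSource(new StringReader(PEOPLE_XML)),
                          XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("firstName"));
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
        return names;
    }
}
```

Calling XPathShapeSketch.firstNames( "/people/person[@country = 'US']/job[@salary > 1000000]/.." ) narrows to US people, steps down to jobs paying over one million, and then steps back up to the owning person, which is the same reading that query1 applies to beans.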
When I first saw Commons JXPath, it struck me as a bad idea. Why apply
XPath expressions to objects? Something about it didn't feel right.
But this unexpected use of XPath as a query language for a collection
of beans has come in handy for me more than a few times in the past few
years. If you find yourself looping through lists to find matching
elements, consider using JXPath. For more information, see Chapter 12,
"Searching and Filtering," of Jakarta
Commons Cookbook, which discusses Commons JXPath and Jakarta Lucene
paired with Commons Digester.
And There's More
Stay tuned to this exploration of the far reaches of the Jakarta
Commons. In the next part of this series, I'll introduce some related
tools and utilities: set operations in Commons Collections, using
Predicate
objects with collections, configuring an application with
Commons Configuration (http://jakarta.apache.org/commons/configuration),
and using Commons Betwixt (http://jakarta.apache.org/commons/betwixt)
to read and write XML. There is much to be gained from the Jakarta
Commons that cannot be conveyed in a few thousand words, and I would
encourage you to take a look at the
Jakarta Commons Cookbook (http://www.oreilly.com/catalog/jakartackbk). Many of
these utilities may, at first glance, seem somewhat trivial, but the
power of Jakarta Commons lies in how these tools can be combined with
each other and integrated into your own systems.
Timothy M. O'Brien
is a professional singer/programmer living and working in the Chicago area.