This blog is now closed!

This blog has been replaced by my personal blog over at my website. Please update your bookmarks and feeds accordingly.

October 28, 2007

Solving Hibernate Criteria's "Distinct Root Entity" limitation once and for all - including pagination

I have to start with a rant: The Hibernate Criteria API, which I learned to at least partially like over the last months working on Maptales server code, has a few limitations which repeatedly make me wonder whether Hibernate really has been the right choice for a DB abstraction layer from time to time. It's not so much the limitations themselves, but rather the total lack of documentation of these limitations in the official Hibernate docs and examples and the generally rather arrogant tone on the Hibernate mailing Lists and FAQs — if you get an answer at all, that is — that annoy me and I guess a lot of other Hibernate users. The examples provided — and therefore copied blindly into nearly every Tutorial or Book about Hibernate out there — always avoid to show these limitations because of the triviality of the scenarios they cover. The purpose of a proper documentation would — for me — be to document especially such limitations and border cases where an API fails and provide solutions for these cases.

The problem is even amplified by the fact that googling for such limitiations rarely brings up a solution, since, as mentioned above, the hibernate documentation and inherited tutorials all do a good job in concealing the problems, and few coders out there seem to really use Hibernate in more sophisticated ways and publish their experiences. Which makes me wonder, because for these trivial cases too often good ol' SQL would have been perfectly sufficient — and advantageous to those lamer wannabee DB coders who would actually be forced to learn something about databases instead of replicating stupid Hibernate tutorial examples.

</rant>

OK now the specific limitation I want to talk about is: Hibernate does not return distinct results for a query with outer join fetching enabled for a collection. At least that's what they call it in the Hibernate advanced problems FAQ, failing to discuss this problem in deep and to give a general solution. Quote:

One day Hibernate might be smart enough to know that if you call setFirstResult() or setMaxResults() it should not use a join, but a second SQL SELECT. Try it, your version of Hibernate might already be smart enough. If not, write two queries, one for limiting stuff, the other for eager fetching.
And here they are again cutting of a large part of the problem. First, I checked an my version of hibernate is not "smart" enough, and it does not look like it will get any smarter soon. Second, the approach proposed there did not work for me when querying on properties of objects inside a collection - the collection has to always be JOINed for the query to work, therefore always producing results useless for pagination after transforming to distinct root entities. Generally, my problem was harder in a number of ways than the ususal examples:
  • I am nearly always using criterias on collection members, therefore I cannot revert to SELECT fetching.
  • I always have to use pagination (setFirstResult() and setMaxResults()), because in a web application it is not nice to return thousands of results on one page.
  • I am using PostGreSQL which adds a few oddities above it all (see below)
After digging through a lot of useless to semi-enlightening blog posts (thanks to Rick Hightower for providing two thirds of the solution) I could finally come up with the ultimate solution that works for me now:

First you have to accept that you have to live with 2 queries instead of one. You simply cannot do pagination and restrictions on collection members in hibernate in a single criteria query. Once you have accepted that fact, you can go ahead and construct your base query.

public List queryCats(Date queryDate) throws Exception {
Criteria criteria = getSession().createCriteria(Cat.class);
criteria.setFirstResult(offset).setMaxResults(num);
criteria.addOrder(Order.asc("age"));
if (queryDate != null) {
Criteria subCrit = criteria.createCriteria("kittens", "kitten");
subCrit.add(Expression.ge("birthDate", queryDate));
}

What you need to do then is to project the result to a collection of ids of the objects that match the query, eliminating duplicates. Here also the mentioned speciality of Postgres (at a second look it seems to be a general limitation) comes in: you have to include fields that are used for ordering in the projection clause. The nice thing is we can do the projection on demand and only if collection memebers are actually queried — this comes in handy for programatically constructed queries which can use the simple, single query way if collection properties are not used in the query:

  if (queryDate != null) {
// project result to distinct ids,
// including columns used for sorting
criteria.setProjection(Projections.distinct(
Projections.projectionList()
.add(Projections.id())
.add(Projections.property("birthDate"))
));
List list = criteria.list();

// unfortunately we have to copy the ids out of the
// resulting Object[] List
List idlist = new ArrayList<long>();

for (Iterator iditer = list.iterator(); iditer.hasNext();) {
Object[] record = (Object[]) iditer.next();
idlist.add((Long)record[0]);
}

// another Hibernate stupidity: empty Lists cause
// Expression.in to throw an error
if (idlist.size() > 0) {
criteria = getSession().createCriteria(Cat.class);
criteria.add(Expression.in("id", idlist));
}
else {
return new ArrayList();
}
}

You can then transparently use the id-query in the complex case or issue the main query in the simple case:

  return criteria.list();
}

The presented approach has worked for me in a number of cases, and I am very happy now!