This blog is now closed!

This blog has been replaced by my personal blog over at my website. Please update your bookmarks and feeds accordingly.

October 28, 2007

Solving Hibernate Criteria's "Distinct Root Entity" limitation once and for all - including pagination

I have to start with a rant: The Hibernate Criteria API, which I learned to at least partially like over the last months working on Maptales server code, has a few limitations which repeatedly make me wonder whether Hibernate really has been the right choice for a DB abstraction layer from time to time. It's not so much the limitations themselves, but rather the total lack of documentation of these limitations in the official Hibernate docs and examples and the generally rather arrogant tone on the Hibernate mailing Lists and FAQs — if you get an answer at all, that is — that annoy me and I guess a lot of other Hibernate users. The examples provided — and therefore copied blindly into nearly every Tutorial or Book about Hibernate out there — always avoid to show these limitations because of the triviality of the scenarios they cover. The purpose of a proper documentation would — for me — be to document especially such limitations and border cases where an API fails and provide solutions for these cases.

The problem is even amplified by the fact that googling for such limitiations rarely brings up a solution, since, as mentioned above, the hibernate documentation and inherited tutorials all do a good job in concealing the problems, and few coders out there seem to really use Hibernate in more sophisticated ways and publish their experiences. Which makes me wonder, because for these trivial cases too often good ol' SQL would have been perfectly sufficient — and advantageous to those lamer wannabee DB coders who would actually be forced to learn something about databases instead of replicating stupid Hibernate tutorial examples.

</rant>

OK now the specific limitation I want to talk about is: Hibernate does not return distinct results for a query with outer join fetching enabled for a collection. At least that's what they call it in the Hibernate advanced problems FAQ, failing to discuss this problem in deep and to give a general solution. Quote:

One day Hibernate might be smart enough to know that if you call setFirstResult() or setMaxResults() it should not use a join, but a second SQL SELECT. Try it, your version of Hibernate might already be smart enough. If not, write two queries, one for limiting stuff, the other for eager fetching.
And here they are again cutting of a large part of the problem. First, I checked an my version of hibernate is not "smart" enough, and it does not look like it will get any smarter soon. Second, the approach proposed there did not work for me when querying on properties of objects inside a collection - the collection has to always be JOINed for the query to work, therefore always producing results useless for pagination after transforming to distinct root entities. Generally, my problem was harder in a number of ways than the ususal examples:
  • I am nearly always using criterias on collection members, therefore I cannot revert to SELECT fetching.
  • I always have to use pagination (setFirstResult() and setMaxResults()), because in a web application it is not nice to return thousands of results on one page.
  • I am using PostGreSQL which adds a few oddities above it all (see below)
After digging through a lot of useless to semi-enlightening blog posts (thanks to Rick Hightower for providing two thirds of the solution) I could finally come up with the ultimate solution that works for me now:

First you have to accept that you have to live with 2 queries instead of one. You simply cannot do pagination and restrictions on collection members in hibernate in a single criteria query. Once you have accepted that fact, you can go ahead and construct your base query.

public List queryCats(Date queryDate) throws Exception {
Criteria criteria = getSession().createCriteria(Cat.class);
criteria.setFirstResult(offset).setMaxResults(num);
criteria.addOrder(Order.asc("age"));
if (queryDate != null) {
Criteria subCrit = criteria.createCriteria("kittens", "kitten");
subCrit.add(Expression.ge("birthDate", queryDate));
}

What you need to do then is to project the result to a collection of ids of the objects that match the query, eliminating duplicates. Here also the mentioned speciality of Postgres (at a second look it seems to be a general limitation) comes in: you have to include fields that are used for ordering in the projection clause. The nice thing is we can do the projection on demand and only if collection memebers are actually queried — this comes in handy for programatically constructed queries which can use the simple, single query way if collection properties are not used in the query:

  if (queryDate != null) {
// project result to distinct ids,
// including columns used for sorting
criteria.setProjection(Projections.distinct(
Projections.projectionList()
.add(Projections.id())
.add(Projections.property("birthDate"))
));
List list = criteria.list();

// unfortunately we have to copy the ids out of the
// resulting Object[] List
List idlist = new ArrayList<long>();

for (Iterator iditer = list.iterator(); iditer.hasNext();) {
Object[] record = (Object[]) iditer.next();
idlist.add((Long)record[0]);
}

// another Hibernate stupidity: empty Lists cause
// Expression.in to throw an error
if (idlist.size() > 0) {
criteria = getSession().createCriteria(Cat.class);
criteria.add(Expression.in("id", idlist));
}
else {
return new ArrayList();
}
}

You can then transparently use the id-query in the complex case or issue the main query in the simple case:

  return criteria.list();
}

The presented approach has worked for me in a number of cases, and I am very happy now!

29 comments:

Peter Centgraf said...

I ran into the same problem and implemented a very similar solution a few weeks ago (perhaps at the same time you were writing this post). I can confirm that this works fairly well even with large numbers of joined tables. There is a fundamental mismatch between the DB's view of what "max results" means and what the Java entity worldview expects. I don't know of any way to avoid the two-stage approach, in Hibernate or otherwise.

Anonymous said...

I think your complaint about hibernate is a bit unfair, as you are simply seeing the limitations of SQL, which Hibernate uses. I don't think a second SQL query is a good solution if you want pagination.

To be honest, I have not looked closely at your solution (I prefer HQL anyway), but I think there might be another solution to your problem: use a subselect instead of a join. That is, if your result depends on some property of a collection in your root object that you can not join, formulate it like that:

where exists (select rootObject.subCollection as subcollection where subcollection has whateverproperty)

I never could get subselects to work with criteria (something with "DetachedCriteria"?), hence my move to HQL...

Flo said...

Hi anonymous,

You're absolutely right, subselects would be the correct solution, but I also failed in forcing subselect fetching in my use case - so I guess it is broken or at least not correctly documented in the Criteria API, and this should be another thing to rant about... ;)

Bjoern said...

I don't think there is much that can be done about the fetching (in your example, there will still be subsequent selects to fetch for the collections, if I am not mistaken - you can't have both fetching and nice pagination). But with the subselect method, you can avoid the second query (converting ids into objects), and I think the database has a fair chance of optimizing the query with a subselect.

There is also another issue with your proposed solution: at least on some databases, there is a limit to the amount of id's you can put into a "where blabla in (...) " criterion. If you always use pagination with sufficiently few items on the page, it might not be an issue, but I wanted to warn you about it anyway ;-)

Flo said...

Bjorn,
Thanks for your comment - as I said above, I tried subselect fetching but I could not get it working in this case - just seemed to have no effect. When I come by a similar problem next time, I will give it another shot though.

Thanks for the remark about the possible limitation of the "in" criterion, in my case page size is pretty limited but it may help other readers.

Anonymous said...

How cab you specify descending order in the projection sort?

Anonymous said...

Excuse my typo (can*). I realise now that the sort is done with the criteria.addOrder and that you just need to add the sort clauses in the projection. But what you are missing is the same sort in the (in) query. Once you add it there, this code is perfect for pagination. Thanks!

Anonymous said...

Did you mean to create the second Criteria based on Story.class or should that be Cats.class. I'm not sure where Story.class fits here.

Thanks

Flo said...

Thanks for pointing out that error, anonymous. Of course it should be Cat.class - this is now fixed.

Anonymous said...

What about applying a result transformer, for example

mycriteria.setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY);

Flo said...

Hi Anonymous!

This is what this article is about, that DISTINCT_ROOT_ENTITY works opnly as long as you retrieve all the result items, not if you use setFirstResult() and/or setMaxResults() - in this case DISTINCT_ROOT_ENTITY is applied after the result limitation, which means that you may end up with far less result items than you need for the result limiting to work correctly.

Anonymous said...

I didn't understand this comment: "But what you are missing is the same sort in the (in) query. Once you add it there, this code is perfect for pagination."

Where should I add the sort criteria?

Thank you!

Marc said...

This is unbelievably inefficient. I just can't believe this super trivial functionality is not in Hibernate. I see this issue going back years. It's a big issue, it's trivial to solve. Why is it still around?

Flo said...

Hi marc, it indeed is ;) When doing my research, I was stunned by how long this issue was being discussed already and I generally had the impression that the criteria API was kind of "dead" and most people were using HQL anyways. Which is a shame, because IMHO the API would be the much more powerful concept but I guess it doesn't appeal to newbies and/or people coming from SQL as much. I do not use Hibernate any more, so I cannot tell about the current situation...

Anonymous said...

Thanks for this tip. It helped me figure out a weird bug.

If you're not using Hibernate anymore, what are you using for ORM, if anything?

prathi said...

The solution presented here is a bit complicated as the user have to deal with copying the ids (additional memory usage), firing a new query and dealing with empty lists.

A better solution is to use subquery for collection type or use batch-size attribute (for both collection type or single-point association). You can find a very good article on this at - http://www.javalobby.org/java/forums/t33002.html

Query query = session.createQuery(queryString);
query.setFirstResult(0);
query.setMaxResults(2);
List Pet l = query.list();

Vilo said...

You should write somehow like this:

public List queryCats(Date queryDate) throws Exception {
Criteria criteria = getSession().createCriteria(Cat.class, "c");
criteria.setFirstResult(offset).setMaxResults(num);
criteria.addOrder(Order.asc("age"));
if (queryDate != null) {
DetachedCriteria dc = DetachedCriteria.forClass(Kitten.class, "k")
.add(Restrictions.eqProperty("k.id", "c.kittens"))
.add(Restrictions.ge("birthDate", queryDate))
.setProjection(Projections.id());
criteria.add(Subqueries.exists(dc));
}
}

curmudgeon99@gmail.com said...

Your comments made me want to rise from my desk and applaud, from the crappy documentation that avoids the tough issues and the arrogant tone of anyone involved from Hibernate, especially Gavin and Christian. Bravo to you!

Ali said...

I think it works:
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

justin said...

I know this is old, but I'm very interested in knowing how this works...

When you get a maxResult(100) limited set of cats, you're not getting a full 100 distinct cats, which makes sense because of how hibernate works and the limitation that you explain...

the confusion I have is: why does adding a projection all of a sudden magically make it return a full 100 distinct set, wouldn't you expect that the exact same problem would exist?

I did actually run similar code and what you did works... but I can't figure out why

Roadrunner said...

Hi!

This post helped me a lot and I was able to solve this problem with just 1 SQL statement (using subquery). What I did:

1) Create the criteria used for applying conditions:

Criteria searchCriteria = getSession().createCriteria(Cat.class);

2) Add all conditions to searchCriteria:

if (queryDate != null) {
searchCriteria.createAlias("kittens", "k");
searchCriteria.add(Restrictions.ge("birthDate", queryDate));
}

3) Set projection of searchCriteria to the distinct clause of Your choice (mine was ID only):

searchCriteria.setProjection(Projections.distinct(Projections.id()));

4) Create another criteria to be used for select:

DetachedCriteria selectCriteria = DetachedCriteria.forClass(Cat.class);

5) Add subquery restriction:

selectCriteria.add(Subqueries.propertyIn("id", searchCriteria));

6) Add any sort conditions to selectCriteria:

selectCriteria.addOrder(Order.asc("age"));

7) Apply pagination to selectCriteria:

selectCriteria.setFirstResult(offset).setMaxResults(num);

8) Run the selectCriteria for distinct and paginated result set:

List list = selectCriteria.list();

That's it!
Thanks for this post!

Anonymous said...

@Roadrunner

I liked your solution but just an FYI for anyone who will be using it: the Criteria should be a DetachedCriteria and the DetachedCriteria should be a Criteria for Roadrunner's solution to work. This is mainly because there is no setFirstResult or setMaxResults for a DetachedCriteria.

Markos said...

Thanks for this article.

Anonymous said...

@Flo & @Vilo
Thanks a lot. You guys helped me a lot in saving my time on this issue.

Dustin Wilhelmi said...

Thanks, this post helped enormously in a project I am working on.

Thomas Werzmirzowsky said...

I have a very similar problem but i think that your solutation will not work under all circumstances. If you filter and order by a collection property the order from the "get the ids"-criteria might be different from the "fetching"-criteria. So you would have to order the result manually according to the result of the "get the ids"-criteria. Is this true or do i get anything wrong?

Anonymous said...

Love it. Five years later and this just helped me. And the was just as relevant today. Couldn't agree more. Thx.

Anonymous said...

Great post, I don't understand why hibernate haven't added support for distinct select:s.

They could e.g. expand the c.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) to include a second argument, namely the property to use as the distinct column.

Anyway, roadrunners solution seems to work for now.

Anonymous said...

I have 3 tables that I need to join: TableA, TableB and TableC. TableA is totally standalone, but TableB and TableC are related via one-to-many mapping (yay!)
Here's my working join using Criteria API.
DetachedCriteria tableBnCJoin= DetachedCriteria.forClass(TableB.class);
tableBnCJoin.setProjection(Projections.property("tableBEmployeeId"));

Criteria tableACriteria = getSession().createCriteria(TableA.class);
criteria.add(org.hibernate.criterion.Property.forName("tableAEmployeeId").in(tableBnCJoin));
criteria.setProjection(Projections.projectionList()
.add(Projections.distinct(Projections.property("tableAEmployeeId")))
.add(Projections.property("tableAEmployeeName")));

That all works well, but now I need to return TableCEmployeeAddress from TableC.
How do I add the column TableCEmployeeAddress returned? Do I add the projection to tabBnCJoin criteria or tableACriteria?