MySQL – Control which row is returned by a group by

It’s called selecting the group-wise maximum of a column. Here are several different approaches for mysql.

Here’s how I would do it:

SELECT *
FROM (SELECT id, max(version_id) as version_id FROM table GROUP BY id) t1
INNER JOIN table t2 on t2.id=t1.id and t1.version_id=t2.version_id

This will be relatively efficient, though mysql will create a temporary table in memory for the subquery. I assume you already have an index on (id, version_id) for this table.

It’s a deficiency in SQL that you more or less have to use a subquery for this type of problem (semi-joins are another example).

Subqueries are not well optimized in mysql but uncorrelated subqueries aren’t so bad as long as they aren’t so enormous that they will get written to disk rather than memory. Given that in this query only has two ints the subquery could be millions of rows long before that happened but the select * subquery in your first query could suffer from this problem much sooner.

Leave a Comment