MySQL GROUP BY behavior

MySQL chooses a row arbitrarily. In practice, commonly used MySQL storage engines return the values from the first row in the group, with respect to the physical storage.

create table foo (id serial primary key, category varchar(10));

insert into foo (category) values 
  ('foo'), ('foo'), ('foo'), ('bar'), ('bar'), ('bar');

select * from foo group by category;

+----+----------+
| id | category |
+----+----------+
|  4 | bar      |
|  1 | foo      |
+----+----------+

Other folks are correct that MySQL allows you to run this query even though it has arbitrary and potentially misleading results. The SQL standard, and most other RDBMS vendors, disallow this kind of ambiguous GROUP BY query. This is called the Single-Value Rule: all columns in the select-list must be explicitly part of the GROUP BY criteria, or else inside an aggregate function, e.g. COUNT(), MAX(), etc.

MySQL supports a SQL mode ONLY_FULL_GROUP_BY that makes MySQL return an error if you try to run a query that violates SQL standard semantics.

AFAIK, SQLite is the only other RDBMS that allows ambiguous columns in a grouped query. SQLite returns values from the last row in the group:

select * from foo group by category;

6|bar
3|foo

We can imagine queries that would not be ambiguous, yet still violate the SQL standard semantics.

SELECT foo.*, parent_of_foo.* 
FROM foo JOIN parent_of_foo 
  ON (foo.parent_id = parent_of_foo.parent_id) 
GROUP BY foo_id;

There’s no logical way this could produce ambiguous results. Each row in foo gets its own group, if we GROUP BY the primary key of foo. So any column from foo can have only one value in the group. Even joining to another table referenced by a foreign key in foo can have only one value per group, if the groups are defined by the primary key of foo.

MySQL and SQLite trust you to design logically unambiguous queries. Formally, every column in the select-list must be a functional dependency of the columns in the GROUP BY criteria. If you don’t adhere to this, it’s your fault. 🙂

Standard SQL is more strict and disallows some queries that could be unambiguous–probably because it would be too complex for the RDBMS to be sure in general.

Leave a Comment