sql group by versus distinct

GROUP BY maps groups of rows to one row, per distinct value in specific columns, which don’t even necessarily have to be in the select-list.

SELECT b, c, d FROM table1 GROUP BY a;

This query is legal SQL (correction: only in MySQL; actually it’s not standard SQL and not supported by other brands). MySQL accepts it, and it trusts that you know what you’re doing, selecting b, c, and d in an unambiguous way because they’re functional dependencies of a.

However, Microsoft SQL Server and other brands don’t allow this query, because it can’t determine the functional dependencies easily. edit: Instead, standard SQL requires you to follow the Single-Value Rule, i.e. every column in the select-list must either be named in the GROUP BY clause or else be an argument to a set function.

Whereas DISTINCT always looks at all columns in the select-list, and only those columns. It’s a common misconception that DISTINCT allows you to specify the columns:

SELECT DISTINCT(a), b, c FROM table1;

Despite the parentheses making DISTINCT look like function call, it is not. It’s a query option and a distinct value in any of the three fields of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but this won’t affect the result.

Leave a Comment