DISTINCT
is often applied to repair queries that are rotten from the inside, and that’s often expensive and / or incorrect. Don’t multiply rows to begin with, then you don’t have to fold unwanted duplicates at the end.
Joining to multiple n-tables (“has many”) multiplies rows in the result set. That’s efectively a CROSS JOIN
or Cartesian product by proxy. See:
There are various ways to avoid this mistake.
Aggregate first, join later
Technically, the query works as long as you join to one table with multiple rows at a time before you aggregate:
SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM (
SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
FROM employees e
JOIN address ad ON ad.employeeid = e.id
GROUP BY e.id -- PK covers whole row
) e
JOIN workingdays wd ON wd.employeeid = e.id
GROUP BY e.id, e.name, e.age;
It’s best to include the primary key id
and GROUP BY
it, because name
and age
are not necessarily unique. Else you might merge employees by mistake.
But better aggregate in a subquery before the join, that’s superior without selective WHERE
conditions on employees
:
SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM employees e
JOIN (
SELECT employeeid, array_agg(ad.street) AS streets
FROM address
GROUP BY 1
) ad ON ad.employeeid = e.id
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY e.id, ad.streets;
Or aggregate both:
SELECT name, age, ad.streets, wd.days
FROM employees e
JOIN (
SELECT employeeid, array_agg(ad.street) AS streets
FROM address
GROUP BY 1
) ad ON ad.employeeid = e.id
JOIN (
SELECT employeeid, array_agg(wd.day) AS days
FROM workingdays
GROUP BY 1
) wd ON wd.employeeid = e.id;
The last one is typically faster if you retrieve all or most of the rows in the base tables.
Note that using JOIN
and not LEFT JOIN
removes employees from the result that have no row in address
or none in workingdays
. That may or may not be intended. Switch to LEFT JOIN
to retain all employees in the result.
Correlated subqueries / JOIN LATERAL
For selective filters on employees
, consider correlated subqueries instead:
SELECT name, age
, (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
, (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM employees e
WHERE e.namer="peter"; -- very selective
Or LATERAL
joins in Postgres 9.3 or later:
SELECT e.name, e.age, a.streets, w.days
FROM employees e
LEFT JOIN LATERAL (
SELECT array_agg(street) AS streets
FROM address
WHERE employeeid = e.id
GROUP BY 1
) a ON true
LEFT JOIN LATERAL (
SELECT array_agg(day) AS days
FROM workingdays
WHERE employeeid = e.id
GROUP BY 1
) w ON true
WHERE e.name="peter"; -- very selective
The last two queries retain all qualifying employees in the result.