Multiple array_agg() calls in a single query

DISTINCT is often applied to repair queries that are rotten from the inside, and that’s often expensive and / or incorrect. Don’t multiply rows to begin with, then you don’t have to fold unwanted duplicates at the end.

Joining to multiple n-tables (“has many”) multiplies rows in the result set. That’s efectively a CROSS JOIN or Cartesian product by proxy. See:

There are various ways to avoid this mistake.

Aggregate first, join later

Technically, the query works as long as you join to one table with multiple rows at a time before you aggregate:

SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM  (
   SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
   FROM   employees e 
   JOIN   address  ad ON ad.employeeid = e.id
   GROUP  BY e.id  -- PK covers whole row
   ) e
JOIN   workingdays wd ON wd.employeeid = e.id
GROUP  BY e.id, e.name, e.age;

It’s best to include the primary key id and GROUP BY it, because name and age are not necessarily unique. Else you might merge employees by mistake.

But better aggregate in a subquery before the join, that’s superior without selective WHERE conditions on employees:

SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM   employees e 
JOIN  (
   SELECT employeeid, array_agg(ad.street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN   workingdays wd ON e.id = wd.employeeid
GROUP  BY e.id, ad.streets;

Or aggregate both:

SELECT name, age, ad.streets, wd.days
FROM   employees e 
JOIN  (
   SELECT employeeid, array_agg(ad.street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN  (
   SELECT employeeid, array_agg(wd.day) AS days
   FROM   workingdays
   GROUP  BY 1
   ) wd ON wd.employeeid = e.id;

The last one is typically faster if you retrieve all or most of the rows in the base tables.

Note that using JOIN and not LEFT JOIN removes employees from the result that have no row in address or none in workingdays. That may or may not be intended. Switch to LEFT JOIN to retain all employees in the result.

Correlated subqueries / JOIN LATERAL

For selective filters on employees, consider correlated subqueries instead:

SELECT name, age
    , (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
    , (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM   employees e
WHERE  e.namer="peter";  -- very selective

Or LATERAL joins in Postgres 9.3 or later:

SELECT e.name, e.age, a.streets, w.days
FROM   employees e
LEFT   JOIN LATERAL (
   SELECT array_agg(street) AS streets
   FROM   address
   WHERE  employeeid = e.id
   GROUP  BY 1
   ) a ON true
LEFT   JOIN LATERAL (
   SELECT array_agg(day) AS days
   FROM   workingdays
   WHERE  employeeid = e.id
   GROUP  BY 1
   ) w ON true
WHERE  e.name="peter";  -- very selective

The last two queries retain all qualifying employees in the result.

Leave a Comment