Optimize groupwise maximum query

Assuming relatively few rows in options for many rows in records.

Typically, you would have a look-up table options that is referenced from records.option_id, ideally with a foreign key constraint. If you don’t, I suggest to create one to enforce referential integrity:

CREATE TABLE options (
  option_id int  PRIMARY KEY
, option    text UNIQUE NOT NULL
);

INSERT INTO options
SELECT DISTINCT option_id, 'option' || option_id -- dummy option names
FROM   records;

Then there is no need to emulate a loose index scan any more and this becomes very simple and fast. Correlated subqueries can use a plain index on (option_id, id).

SELECT option_id, (SELECT max(id)
                   FROM   records
                   WHERE  option_id = o.option_id) AS max_id
FROM   options o
ORDER  BY 1;

This includes options with no match in table records. You get NULL for max_id and you can easily remove such rows in an outer SELECT if needed.

Or (same result):

SELECT option_id, (SELECT id
                   FROM   records
                   WHERE  option_id = o.option_id
                   ORDER  BY id DESC NULLS LAST
                   LIMIT  1) AS max_id
FROM   options o
ORDER  BY 1;

May be slightly faster. The subquery uses the sort order DESC NULLS LAST – same as the aggregate function max() which ignores NULL values. Sorting just DESC would have NULL first:

Why do NULL values come first when ordering DESC in a PostgreSQL query?

The perfect index for this:

CREATE INDEX on records (option_id, id DESC NULLS LAST);

Index sort order doesn’t matter much while columns are defined NOT NULL.

There can still be a sequential scan on the small table options, that’s just the fastest way to fetch all rows. The ORDER BY may bring in an index (only) scan to fetch pre-sorted rows.
The big table records is only accessed via (bitmap) index scan or, if possible, index-only scan.

db<>fiddle here – showing two index-only scans for the simple case
_{Old sqlfiddle}

Or use LATERAL joins for a similar effect in Postgres 9.3+:

Optimize GROUP BY query to retrieve latest row per user

More Related Contents:

Leave a Comment Cancel reply