How does PostgreSQL perform ORDER BY with a b-tree index on the field?

For a simple query like this Postgres will use an index scan and retrieve readily sorted tuples from the index in order. Due to its MVCC model Postgres had to always visit the “heap” (data pages) additionally to verify entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:

PostgreSQL indexes do not contain visibility information. That is, it
is not directly possible to ascertain if any given tuple is visible to
the current transaction, which is why it has taken so long for index-only
scans to be implemented.

Which finally happened in version 9.2: index-only scans. The manual:

If the index stores the original indexed data values (and not some
lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data not just the TID of
the heap tuple. This will only avoid I/O if the visibility map shows
that the TID is on an all-visible page; else the heap tuple must be
visited anyway to check MVCC visibility. But that is no concern of the
access method’s.

The visibility map decides whether index-only scans are possible. Only an option if all involved column values are included in the index. Else, the heap has to be visited (additionally) in any case. The sort step is still not needed.

That’s why we sometimes append otherwise useless columns to indexes now. Like the data column in your example:

CREATE INDEX ON bsort (a, data);  -- btree is the default index type

It makes the index bigger (depends) and a bit more expensive to maintain and use for other purposes. So only append the data column if you get index-only scans out of it. The order of columns in the index is important:

Since Postgres 11, there are also “covering indexes” with the INCLUDE keyword. Like:

CREATE INDEX ON bsort (a) INCLUDE (data);

See:

The benefit of an index-only scan, per documentation:

If it’s known that all tuples on the page are visible, the heap fetch
can be skipped. This is most noticeable on large data sets where the
visibility map can prevent disk accesses. The visibility map is vastly
smaller than the heap, so it can easily be cached even when the heap
is very large.

The visibility map is maintained by VACUUM which happens automatically if you have autovacuum running (the default setting in modern Postgres). Details:

But there is some delay between write operations to the table and the next VACUUM run. The gist of it:

  • Read-only tables stay ready for index-only scans once vacuumed.
  • Data pages that have been modified lose their “all-visible” flag in the visibility map until the next VACUUM (and all older tansactions being finished), so it depends on the ratio between write operations and VACUUM frequency.

Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method “index scan” is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again:

As the number of heap fetches (or “visits”) that are projected to be
needed by the planner goes up, the planner will eventually conclude
that an index-only scan isn’t desirable, as it isn’t the cheapest
possible plan according to its cost model. The value of index-only
scans lies wholly in their potential to allow us to elide heap access
(if only partially) and minimise I/O.

Leave a Comment