PG::Error: SELECT DISTINCT, ORDER BY expressions must appear in select list

I know this is a rather old question, but I just went through a small example in my head which helped me understand why Postgres has this seemingly odd restriction on SELECT DISTINCT / ORDER BY columns.

Imagine you have the following data in your Rsvp table:

 event_id |        start_time
----------+------------------------
    0     | Mar 17, 2013  12:00:00
    1     |  Jan 1, 1970  00:00:00
    1     | Aug 21, 2013  16:30:00
    2     |  Jun 9, 2012  08:45:00

Now you want to grab a list of distinct event_ids, ordered by their respective start_times. But where should 1 go? Should it come first, because the one tuple starts on Jan 1, 1970, or should it go last because of the Aug 21, 2013?

As the database system can’t make that decision for you and the syntax of the query can’t depend on the actual data it might be operating on (assuming event_id is unique), we are restricted to ordering only by columns from the SELECT clause.

As for the actual question – an alternative to Matthew’s answer is using an aggregate function like MIN or MAX for the sorting:

  SELECT event_id
    FROM Rsvp
GROUP BY event_id
ORDER BY MIN(start_time)

The explicit grouping and aggregation on start_time permit the database to come up with a unambiguous ordering of the result tuples. Note however, that readability is definitely an issue in this case 😉

Leave a Comment