Humanized or natural number sorting of mixed word-and-number strings

Building on your test data, but this works with arbitrary data. This works with any number of elements in the string.

Register a composite type made up of one text and one integer value once per database. I call it ai:

CREATE TYPE ai AS (a text, i int);

The trick is to form an array of ai from each value in the column.

regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers. Plus one irrelevant dangling row with two empty strings '{"",""}' Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).

NULL values sort first – or you have to special case them – or use the whole shebang in a STRICT function like @Craig proposes.

Postgres 9.4 or later

SELECT data
FROM   alnum
ORDER  BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
                FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
        , data;

db<>fiddle here

Postgres 9.1 (original answer)

Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.

SELECT data
FROM  (
    SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
    FROM   alnum
    ) x
GROUP  BY ctid, data   -- ctid as stand-in for a missing pk
ORDER  BY regexp_replace (left(data, 1), '[0-9]', '0')
        , array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
        , data         -- for special case of trailing 0

Add regexp_replace (left(data, 1), '[1-9]', '0') as first ORDER BY item to take care of leading digits and empty strings.

If special characters like {}()"', can occur, you’d have to escape those accordingly.
@Craig’s suggestion to use a ROW expression takes care of that.


BTW, this won’t execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:

Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is
not yet implemented.

This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1

Leave a Comment