Aggregating (x,y) coordinate point clouds in PostgreSQL

Use the often overlooked built-in function width_bucket() in combination with your aggregation:

If your coordinates run from, say, 0 to 2000 and you want to consolidate everything within squares of 5 to single points, I would lay out a grid of 10 (5*2) like this:

SELECT device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10 AS pos_x
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10 AS pos_y
     , count(*) AS ct -- or any other aggregate
FROM   tbl
GROUP  BY 1,2,3
ORDER  BY 1,2,3;

To minimize the error you could GROUP BY the grid as demonstrated, but save actual average coordinates:

SELECT device_id
     , avg(pos_x)::int AS pos_x   -- save actual averages to minimize error
     , avg(pos_y)::int AS pos_y   -- cast if you need to
     , count(*)        AS ct      -- or any other aggregate
FROM   tbl
GROUP  BY
       device_id
     , width_bucket(pos_x, 0, 2000, 2000/10) * 10  -- aggregate by grid
     , width_bucket(pos_y, 0, 2000, 2000/10) * 10
ORDER  BY 1,2,3;

sqlfiddle demonstrating both alongside.

Well, this particular case could be simpler:

...
GROUP  BY
       device_id
     , (pos_x / 10) * 10          -- truncates last digit of an integer
     , (pos_y / 10) * 10
...

But that’s just because the demo grid size of 10 conveniently matches the decimal system. Try the same with a grid size of 17 or something …


Expand to timestamps

You can expand this approach to cover date and timestamp values by converting them to unix epoch (number of seconds since ‘1970-1-1’) with extract().

SELECT extract(epoch FROM '2012-10-01 21:06:38+02'::timestamptz);

When you are done, convert the result back to timestamp with time zone:

SELECT timestamptz 'epoch' + 1349118398 * interval '1s';

Or simply to_timestamp():

SELECT to_timestamp(1349118398);

Leave a Comment