Postgres window function and group by exception

You are not, in fact, using aggregate functions. You are using window functions. That’s why PostgreSQL demands sp.payout and s.buyin to be included in the GROUP BY clause.

By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.

You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:

SELECT p.name
     , e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin  )) OVER w AS "Profit/Loss" 
FROM   player            p
JOIN   result            r ON r.player_id     = p.player_id  
JOIN   game              g ON g.game_id       = r.game_id 
JOIN   event             e ON e.event_id      = g.event_id 
JOIN   structure         s ON s.structure_id  = g.structure_id 
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position     = r.position
WHERE  p.player_id = 17 
GROUP  BY e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER  BY e.date, e.event_id;

In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function, the inner sum() is an aggregate function.

Assuming p.player_id and e.event_id are PRIMARY KEY in their respective tables.

I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) Also included event_id in the result to distinguish multiple events per day.

While the query restricts to a single player (WHERE p.player_id = 17), we don’t need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins would multiply rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.

I also removed e.date from the GROUP BY clause. The primary key e.event_id covers all columns of the input row since PostgreSQL 9.1.

If you change the query to return multiple players at once, adapt:

...
WHERE  p.player_id < 17  -- example - multiple players
GROUP  BY p.name, p.player_id, e.date, e.event_id  -- e.date and p.name redundant
WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;

Unless p.name is defined unique (?), group and order by player_id additionally to get correct results in a deterministic sort order.

I only kept e.date and p.name in GROUP BY to have identical sort order in all clauses, hoping for a performance benefit. Else, you can remove the columns there. (Similar for just e.date in the first query.)

Leave a Comment