aggregate-functions - w3toppers.com

Count cumulative total in Postgresql

With larger datasets, window functions are the most efficient way to perform these kinds of queries: the table will be scanned only once, instead of once for each date, as a self-join would do. It also looks a lot simpler. 🙂 PostgreSQL 8.4 and up have support for window functions. This is what it …
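As a rough illustration of the running-total idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later is assumed, since that is when it gained window functions). The `signups` table, its columns, and the rows are hypothetical and not taken from the original answer.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (day TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [("2024-01-01", 3), ("2024-01-02", 5), ("2024-01-03", 2)],
)

# sum() OVER (ORDER BY day) accumulates from the first row up to the
# current one, so the table is read once rather than once per date.
running = conn.execute(
    "SELECT day, n, sum(n) OVER (ORDER BY day) AS running_total "
    "FROM signups ORDER BY day"
).fetchall()

for row in running:
    print(row)
# ('2024-01-01', 3, 3)
# ('2024-01-02', 5, 8)
# ('2024-01-03', 2, 10)
```

The same `sum(...) OVER (ORDER BY ...)` expression should carry over to PostgreSQL 8.4 and later, although the table definition would differ.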

SQL: difference between PARTITION BY and GROUP BY

They’re used in different places. GROUP BY modifies the entire query, like:

    select customerId, count(*) as orderCount
    from Orders
    group by customerId

But PARTITION BY just works on a window function, like ROW_NUMBER():

    select row_number() over (partition by customerId order by orderId) as OrderNumberForThisCustomer
    from Orders

GROUP BY normally reduces the number of rows returned …
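To see the difference in row counts, the two queries above can be run against a tiny table. This is only a sketch: it uses Python's sqlite3 module (SQLite 3.25 or later assumed) instead of whatever engine the original question targeted, and the three sample rows are invented. The table and column names come from the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (customerId INTEGER, orderId INTEGER)")
# Invented sample rows: customer 1 has two orders, customer 2 has one.
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)])

# GROUP BY collapses the rows: one output row per customer.
grouped = conn.execute(
    "SELECT customerId, count(*) AS orderCount "
    "FROM Orders GROUP BY customerId ORDER BY customerId"
).fetchall()

# PARTITION BY keeps every input row and numbers them within each customer.
numbered = conn.execute(
    "SELECT customerId, orderId, "
    "row_number() OVER (PARTITION BY customerId ORDER BY orderId) "
    "AS OrderNumberForThisCustomer "
    "FROM Orders ORDER BY customerId, orderId"
).fetchall()

print(grouped)   # [(1, 2), (2, 1)]
print(numbered)  # [(1, 10, 1), (1, 11, 2), (2, 12, 1)]
```

Three rows go in; the grouped query returns two, while the windowed query returns all three.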

Postgres window function and group by exception

You are not, in fact, using aggregate functions. You are using window functions. That’s why PostgreSQL demands that sp.payout and s.buyin be included in the GROUP BY clause. By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows. You can combine …
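The effect of appending OVER to sum() can be sketched as follows. The `payouts` table and its rows are hypothetical (the schema from the original question is not shown in the excerpt), and SQLite 3.25 or later stands in for PostgreSQL here. Note that SQLite is more permissive about GROUP BY than PostgreSQL, so this sketch shows only the row-preserving behavior, not the error from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, loosely modeled on the payout column in the excerpt.
conn.execute("CREATE TABLE payouts (player TEXT, payout INTEGER)")
conn.executemany(
    "INSERT INTO payouts VALUES (?, ?)",
    [("ann", 10), ("ann", 5), ("bob", 7)],
)

# Plain aggregate: sum() with GROUP BY returns one row per player.
aggregated = conn.execute(
    "SELECT player, sum(payout) FROM payouts GROUP BY player ORDER BY player"
).fetchall()

# Same function with OVER: every input row survives, and each row
# carries the total of its partition.
windowed = conn.execute(
    "SELECT player, payout, sum(payout) OVER (PARTITION BY player) "
    "FROM payouts ORDER BY player, payout"
).fetchall()

print(aggregated)  # [('ann', 15), ('bob', 7)]
print(windowed)    # [('ann', 5, 15), ('ann', 10, 15), ('bob', 7, 7)]
```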

Custom aggregate function (concat) in SQL Server

You cannot write custom aggregates outside of the CLR. The only types of function you can write in pure T-SQL are scalar and table-valued functions. Compare the pages for CREATE AGGREGATE, which lists only CLR-style options, with CREATE FUNCTION, which shows T-SQL and CLR options.

Multiple array_agg() calls in a single query

DISTINCT is often applied to repair queries that are rotten from the inside, and that’s often expensive and/or incorrect. Don’t multiply rows to begin with, then you don’t have to fold unwanted duplicates at the end. Joining to multiple n-tables (“has many”) multiplies rows in the result set. That’s effectively a CROSS JOIN …
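The row multiplication described above is easy to reproduce. In this sketch the `author`, `book`, and `award` tables are hypothetical, SQLite stands in for PostgreSQL, and count() is used in place of array_agg() (which SQLite lacks) so that the result is easy to check. Aggregating each child table in its own subquery before joining is one possible way to avoid the multiplication; the truncated excerpt does not show which approach the original answer settles on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one author with 2 books and 3 awards.
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY);
CREATE TABLE book  (author_id INTEGER, title TEXT);
CREATE TABLE award (author_id INTEGER, name TEXT);
INSERT INTO author VALUES (1);
INSERT INTO book  VALUES (1, 'b1'), (1, 'b2');
INSERT INTO award VALUES (1, 'a1'), (1, 'a2'), (1, 'a3');
""")

# Joining both "has many" tables at once yields 2 x 3 = 6 rows per author,
# so both counts are inflated.
joined = conn.execute("""
SELECT count(b.title), count(w.name)
FROM author a
JOIN book  b ON b.author_id = a.id
JOIN award w ON w.author_id = a.id
GROUP BY a.id
""").fetchone()

# Aggregating each child table first keeps one row per author on each side,
# so the join no longer multiplies anything.
separate = conn.execute("""
SELECT b.n, w.n
FROM author a
JOIN (SELECT author_id, count(*) AS n FROM book  GROUP BY author_id) b
  ON b.author_id = a.id
JOIN (SELECT author_id, count(*) AS n FROM award GROUP BY author_id) w
  ON w.author_id = a.id
""").fetchone()

print(joined)    # (6, 6)
print(separate)  # (2, 3)
```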

Aggregate a single column in query with many columns

Simple query

This can be much simpler with PostgreSQL 9.1 or later. As explained in this closely related answer:

PGError: ERROR: aggregates not allowed in WHERE clause on a AR query of an object and its has_many objects

It is enough to GROUP BY the primary key of a table. Since: foo1 is a primary …

MySQL dynamic cross tab

The number and names of columns must be fixed at the time you prepare the query. That’s just the way SQL works. So you have two choices of how to solve this. Both choices involve writing application code: (1) Query the distinct values of way and then write code to use these to construct the …
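One reading of choice (1) is a two-step process: first fetch the distinct values, then generate a query with one column per value. The sketch below follows that reading using Python's sqlite3 module rather than MySQL so that it runs without a server. Only the column name `way` comes from the excerpt; the `trips` table, the `hour` and `n` columns, the `quote_ident` helper, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the column name "way" appears in the excerpt.
conn.execute("CREATE TABLE trips (hour INTEGER, way TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, "bus", 4), (1, "car", 2), (2, "bus", 1)],
)

# Step 1: find out which output columns are needed.
ways = [r[0] for r in conn.execute("SELECT DISTINCT way FROM trips ORDER BY way")]


def quote_ident(name):
    """Quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


# Step 2: build one sum(CASE ...) column per distinct value. The values are
# compared through bound parameters; only the quoted alias is interpolated.
cols = ", ".join(
    f"sum(CASE WHEN way = ? THEN n ELSE 0 END) AS {quote_ident(w)}" for w in ways
)
sql = f"SELECT hour, {cols} FROM trips GROUP BY hour ORDER BY hour"

rows = conn.execute(sql, ways).fetchall()
print(ways)  # ['bus', 'car']
print(rows)  # [(1, 4, 2), (2, 1, 0)]
```

The column list is fixed by the time the second query is prepared, which is the constraint the excerpt describes; it is simply fixed by application code rather than by hand.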

How to find mean of grouped Vector columns in Spark SQL?

Spark >= 2.4

You can use Summarizer:

    import org.apache.spark.ml.stat.Summarizer

    val dfNew = df.as[(Int, org.apache.spark.mllib.linalg.Vector)]
      .map { case (group, v) => (group, v.asML) }
      .toDF("group", "features")

    dfNew
      .groupBy($"group")
      .agg(Summarizer.mean($"features").alias("means"))
      .show(false)

    +-----+--------------------------------------------------------------------+
    |group|means                                                               |
    +-----+--------------------------------------------------------------------+
    |1    |[8.740630742016827E12,2.6124956666260462E14,3.268714653521495E14]   |
    |6    |[2.1153266920139112E15,2.07232483974322592E17,6.2715161747245427E17]|
    |3    |[6.3781865566442836E13,8.359124419656149E15,1.865567821598214E14]   |
    |5    |[4.270201403521642E13,6.561211706745676E13,8.395448246737938E15]    |
    |9    |[3.577032684241448E16,2.5432362841314468E16,2.3744826986293008E17]  |
    |4    |[2.339253775419023E14,8.517531902022505E13,3.055115780965264E15]    |
    |8    |[8.029924756674456E15,7.284873600992855E17,3.08621303029924E15]     |
    |7    |[3.2275104122699105E15,7.5472363442090208E16,7.022556624056291E14] …