Python pandas groupby aggregate on multiple columns, then pivot

df.groupby('Category').agg({'Item': 'size', 'shop1': ['sum', 'mean', 'std'], 'shop2': ['sum', 'mean', 'std'], 'shop3': ['sum', 'mean', 'std']})

Or, if you want it across all shops:

df1 = df.set_index(['Item', 'Category']).stack().reset_index().rename(columns={'level_2': 'Shops', 0: 'costs'})
df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
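Here is a minimal runnable sketch of both aggregations on a made-up frame. The column names Item, Category and shop1 to shop3 come from the snippet, but the data is invented. For the long form it uses melt instead of the stack/reset_index/rename chain. The reshape is equivalent here, and melt names the new columns directly instead of relying on the default level_2 and 0 labels.

```python
import pandas as pd

# Made-up sample: four items in two categories, with a cost per shop.
df = pd.DataFrame({
    "Item": ["a", "b", "c", "d"],
    "Category": ["x", "x", "y", "y"],
    "shop1": [1, 3, 5, 7],
    "shop2": [2, 4, 6, 8],
    "shop3": [10, 20, 30, 40],
})

stats = ["sum", "mean", "std"]

# Per-shop statistics; the result has MultiIndex columns such as ("shop1", "std").
per_shop = df.groupby("Category").agg(
    {"Item": "size", "shop1": stats, "shop2": stats, "shop3": stats}
)

# Long form: one row per (Item, Category, shop). melt names the new columns
# directly instead of relying on the default "level_2" / 0 names from stack().
long_df = df.melt(id_vars=["Item", "Category"], var_name="Shops", value_name="costs")

# Statistics across all shops; here ("Item", "size") counts item-shop rows, not items.
overall = long_df.groupby("Category").agg({"Item": "size", "costs": stats})

print(per_shop)
print(overall)
```

After the reshape, ('Item', 'size') counts item-shop rows (6 per category here), not distinct items. Use 'nunique' for Item if you want the number of distinct items.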

How to pivot on multiple columns in Spark SQL?

Here's a non-UDF way that uses a single pivot, so only one column scan is needed to identify all the unique dates:

from pyspark.sql import functions as F
dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'), F.first('units').alias('unit'))

Here's the result (apologies for the non-matching ordering and naming):

+---+-------+------+-------+------+-------+------+-------+------+
| id|1_price|1_unit|2_price|2_unit|3_price|3_unit|4_price|4_unit|
+---+-------+------+-------+------+-------+------+-------+------+
|100|     23|    10|     45|    11|     67|    12|     78|    13|
|101|     23|    10|     45|    13|     67|    14|     78|    15|
…
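To make the column naming concrete, here is a plain-Python model of what groupBy('id').pivot('day').agg(F.first(...).alias(...), ...) produces. It is not Spark code and not how Spark executes the query. The function name pivot_first is hypothetical, and the long-form input is reconstructed from the two result rows shown above.

```python
def pivot_first(rows, key, pivot, aggs):
    """Plain-Python model of groupBy(key).pivot(pivot).agg(F.first(col).alias(name), ...).

    rows: list of dicts; aggs: dict mapping alias -> source column.
    Returns (columns, table) with columns named "<pivot value>_<alias>".
    """
    # Spark needs the distinct pivot values up front (the extra scan) and sorts them.
    pivot_values = sorted({r[pivot] for r in rows})
    cells = {}
    for r in rows:
        for alias, col in aggs.items():
            # first(): keep the first value seen for this (key, pivot value) cell.
            cells.setdefault((r[key], f"{r[pivot]}_{alias}"), r[col])
    columns = [key] + [f"{v}_{a}" for v in pivot_values for a in aggs]
    keys = sorted({r[key] for r in rows})
    # Missing (key, pivot value) combinations come out as None, like Spark's null.
    table = [[k] + [cells.get((k, c)) for c in columns[1:]] for k in keys]
    return columns, table


# Long-form input reconstructed from the result table shown above.
shown = {
    100: [(1, 23, 10), (2, 45, 11), (3, 67, 12), (4, 78, 13)],
    101: [(1, 23, 10), (2, 45, 13), (3, 67, 14), (4, 78, 15)],
}
rows = [
    {"id": i, "day": d, "price": p, "units": u}
    for i, days in shown.items()
    for d, p, u in days
]

columns, table = pivot_first(rows, "id", "day", {"price": "price", "unit": "units"})
print(columns)
for row in table:
    print(row)
```

On the Spark side, first() returns whichever row Spark happens to see first, so the result is only well defined when each (id, day) pair has a single row. If the day values are known in advance, passing them explicitly, as in pivot('day', [1, 2, 3, 4]), lets Spark skip the distinct-value scan.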

Crosstab with a large or undefined number of categories

create table vote (Photo integer, Voter text, Decision text);
insert into vote values
(1, 'Alex', 'Cat'), (1, 'Bob', 'Dog'), (1, 'Carol', 'Cat'), (1, 'Dave', 'Cat'), (1, 'Ed', 'Cat'),
(2, 'Alex', 'Cat'), (2, 'Bob', 'Dog'), (2, 'Carol', 'Cat'), (2, 'Dave', 'Cat'), (2, 'Ed', 'Dog'),
(3, 'Alex', 'Horse'), (3, 'Bob', 'Horse'), (3, 'Carol', 'Dog'), (3, 'Dave', 'Horse'), …
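The excerpt stops at the setup, so what follows is not the answer's method. It sketches one common approach for an open-ended set of categories, using Python's built-in sqlite3: read the distinct Decision values first, then generate one sum(case ...) column per value (dynamic conditional aggregation). Only the vote rows visible above are loaded, and the helper names are hypothetical.

```python
import sqlite3

# Only the rows visible in the excerpt; the original insert list is truncated.
VOTES = [
    (1, "Alex", "Cat"), (1, "Bob", "Dog"), (1, "Carol", "Cat"), (1, "Dave", "Cat"), (1, "Ed", "Cat"),
    (2, "Alex", "Cat"), (2, "Bob", "Dog"), (2, "Carol", "Cat"), (2, "Dave", "Cat"), (2, "Ed", "Dog"),
    (3, "Alex", "Horse"), (3, "Bob", "Horse"), (3, "Carol", "Dog"), (3, "Dave", "Horse"),
]


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vote_crosstab(conn):
    """One row per Photo, one count column per Decision found in the data."""
    decisions = [
        d for (d,) in conn.execute(
            "select distinct Decision from vote where Decision is not null order by Decision"
        )
    ]
    if not decisions:
        return ["Photo"], []
    # Values are compared through bound parameters; in the SQL text they
    # appear only as quoted column aliases.
    cols = ", ".join(
        f"sum(case when Decision = ? then 1 else 0 end) as {quote_ident(d)}" for d in decisions
    )
    cur = conn.execute(f"select Photo, {cols} from vote group by Photo order by Photo", decisions)
    return [c[0] for c in cur.description], cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table vote (Photo integer, Voter text, Decision text)")
conn.executemany("insert into vote values (?, ?, ?)", VOTES)
header, rows = vote_crosstab(conn)
print(header)
for row in rows:
    print(row)
```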

Flattening of a 1 row table into a key-value pair table

A version with no dynamic SQL involved. If you have column names that are invalid as element names in XML, this will fail.

select T2.N.value('local-name(.)', 'nvarchar(128)') as [Key],
       T2.N.value('text()[1]', 'nvarchar(max)') as Value
from (select * from TableA for xml path(''), type) as T1(X)
cross apply T1.X.nodes('/*') as T2(N)

A working sample: declare …
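For comparison, here is the same key/value flattening done from application code. It is sketched with Python's sqlite3 rather than SQL Server's FOR XML. The column names come from the DB-API cursor.description metadata, so names that would be invalid as XML elements cause no trouble. The table contents and the function name are hypothetical.

```python
import sqlite3


def flatten_single_row(conn, table):
    """Return (column name, value as text) pairs for a table holding exactly one row.

    `table` is interpolated into the SQL, so pass only trusted names.
    """
    cur = conn.execute(f'select * from "{table}"')
    rows = cur.fetchall()
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row in {table}, got {len(rows)}")
    names = [d[0] for d in cur.description]
    # Cast every value to text, much like the nvarchar(max) Value column above;
    # NULLs stay None.
    return [(name, None if value is None else str(value)) for name, value in zip(names, rows[0])]


conn = sqlite3.connect(":memory:")
# Hypothetical one-row table standing in for TableA.
conn.execute("create table TableA (ColA integer, ColB text, ColC real)")
conn.execute("insert into TableA values (1, 'x', 2.5)")
for key, value in flatten_single_row(conn, "TableA"):
    print(key, value)
```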

SQL SERVER PIVOT table with joins and dynamic columns

Since you want to transform data from rows into columns, you will want to use the PIVOT operator. If you have a limited number of known values, you can hard-code the query:

select plan_id, [2012, November], [2012, December], [2013, January], [2013, February]
from (
  SELECT b.plan_id, (Convert(varchar(4), b.run_year) + ', ' + DateName(month, CAST('1900-' + …
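When the columns are dynamic, the hard-coded list above has to be generated from the data. In T-SQL this is usually done by concatenating the quoted names into a query string and running it with sp_executesql. The sketch below shows only the string-building step, in Python. The source query, the plan_runs table and the period and amount columns are hypothetical, and calendar.month_name assumes the default (English) locale.

```python
import calendar


def quotename(name):
    """Bracket-quote an identifier like T-SQL QUOTENAME(name) does by default."""
    return "[" + name.replace("]", "]]") + "]"


def month_label(year, month):
    # Same "2012, November" shape as Convert(varchar(4), run_year) + ', ' + DateName(month, ...)
    return f"{year}, {calendar.month_name[month]}"


def build_pivot_sql(year_months, source_sql, value_col, agg="sum"):
    """Assemble a PIVOT query whose column list comes from the (year, month) pairs found in the data."""
    labels = [month_label(y, m) for y, m in sorted(set(year_months))]
    cols = ", ".join(quotename(label) for label in labels)
    # plan_id and the pivoted "period" column are hard-coded, hypothetical names.
    return (
        f"select plan_id, {cols}\n"
        f"from ({source_sql}) as src\n"
        f"pivot ({agg}({value_col}) for period in ({cols})) as p"
    )


# Hypothetical inputs: (run_year, run_month) pairs and a source query exposing
# plan_id, a "period" label column and a value column named "amount".
pairs = [(2013, 1), (2012, 11), (2013, 2), (2012, 12), (2012, 11)]
source = "select plan_id, period, amount from plan_runs"
print(build_pivot_sql(pairs, source, "amount"))
```

Sorting the (year, month) pairs before labelling keeps the columns in calendar order; sorting the labels as strings would put "2012, December" before "2012, November".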

Join two tables (with a 1-M relationship) where the second table needs to be ‘flattened’ into one row

select s.id, s.name,
       max(case when e.course_id = 55 then complete else null end) as c55,
       max(case when e.course_id = 66 then complete else null end) as c66,
       max(case when e.course_id = 77 then complete else null end) as c77
from student as s
left join enrollment as e on s.id = e.student_id
group by s.id, s.name

@Chris. …
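Here is a runnable check of this conditional-aggregation pattern, using Python's sqlite3. The rows are invented. Placing complete in the enrollment table is an assumption, since the snippet leaves that column unqualified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; complete is assumed to live in enrollment.
conn.executescript("""
    create table student (id integer primary key, name text);
    create table enrollment (student_id integer, course_id integer, complete integer);
    insert into student values (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    insert into enrollment values (1, 55, 1), (1, 66, 0), (2, 77, 1);
""")

# One output row per student; each max(case ...) picks that course's value
# out of the joined rows and stays NULL when there is no enrollment for it.
rows = conn.execute("""
    select s.id, s.name,
           max(case when e.course_id = 55 then e.complete else null end) as c55,
           max(case when e.course_id = 66 then e.complete else null end) as c66,
           max(case when e.course_id = 77 then e.complete else null end) as c77
    from student as s
    left join enrollment as e on s.id = e.student_id
    group by s.id, s.name
    order by s.id
""").fetchall()

for row in rows:
    print(row)
```

The left join keeps students with no enrollments (Cy here), and they come out with NULL in every course column.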