Get first and last values in a groupby

Option 1

    def first_last(df):
        return df.iloc[[0, -1]]

    df.groupby(level=0, group_keys=False).apply(first_last)

Option 2 – only works if index is unique

    idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
    df.loc[idx]

Option 3 – per notes below, this only makes sense when there are no NAs. I also abused the agg function. The code below works, but is far uglier.

    df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    …
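As a minimal sketch of Option 1, the snippet below builds a small frame and takes the first and last row of each group by position. The column name `value`, the index name `key`, and the sample data are made up for illustration; they are not from the original answer.

```python
import pandas as pd

# Hypothetical sample data: the index holds the group key.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40, 50]},
    index=pd.Index(["a", "a", "a", "b", "b"], name="key"),
)


def first_last(group):
    # Positional selection of the first and last row of one group.
    # A group with a single row yields that row twice.
    return group.iloc[[0, -1]]


# group_keys=False keeps the original index instead of adding a group level.
result = df.groupby(level=0, group_keys=False).apply(first_last)
print(result)
```

Because the selection is positional, this does not depend on the index being unique, which is the limitation noted for Option 2.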

Conversion failed when converting the varchar value 'simple, ' to data type int

To avoid this error, you could use CASE + ISNUMERIC to handle values that cannot be converted to int. Change

    CONVERT(INT, CONVERT(VARCHAR(12), a.value))

to

    CONVERT(INT, CASE WHEN ISNUMERIC(CONVERT(VARCHAR(12), a.value)) = 1 THEN CONVERT(VARCHAR(12), a.value) ELSE 0 END)

Basically this says: if the value cannot be converted to int, assign a value of 0 (in my …

Pandas aggregate count distinct

How about either of:

    >>> df
             date  duration user_id
    0  2013-04-01        30    0001
    1  2013-04-01        15    0001
    2  2013-04-01        20    0002
    3  2013-04-02        15    0002
    4  2013-04-02        30    0002
    >>> df.groupby("date").agg({"duration": np.sum, "user_id": pd.Series.nunique})
                duration  user_id
    date
    2013-04-01        65        2
    2013-04-02        45        1
    >>> df.groupby("date").agg({"duration": np.sum, "user_id": lambda x: x.nunique()})
                duration  user_id
    date
    2013-04-01        65 …
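The same aggregation can be sketched as a self-contained script. This version uses the string aggregator names `"sum"` and `"nunique"` with named aggregation rather than `np.sum` and `pd.Series.nunique`; the output column names `total_duration` and `distinct_users` are chosen here for illustration and are not from the original answer.

```python
import pandas as pd

# Same data as the frame shown above; user_id is kept as a string
# so the leading zeros survive.
df = pd.DataFrame(
    {
        "date": ["2013-04-01", "2013-04-01", "2013-04-01", "2013-04-02", "2013-04-02"],
        "duration": [30, 15, 20, 15, 30],
        "user_id": ["0001", "0001", "0002", "0002", "0002"],
    }
)

# Sum the durations and count distinct users per date.
summary = df.groupby("date").agg(
    total_duration=("duration", "sum"),
    distinct_users=("user_id", "nunique"),
)
print(summary)
```

Note that `nunique` ignores missing values by default, so a group containing only NaN user IDs would report 0 distinct users.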

SQL query to group by day

If you’re using SQL Server,

    dateadd(DAY, 0, datediff(day, 0, created))

returns the day on which the row was created, with the time set to midnight. For example, if the sale was created on '2009-11-02 06:12:55.000', the expression returns '2009-11-02 00:00:00.000'.

    select sum(amount) as total, dateadd(DAY, 0, datediff(day, 0, created)) as created
    from sales
    group by dateadd(DAY, 0, datediff(day, 0, created))
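To see why this expression truncates to midnight, here is a rough Python sketch of the same arithmetic, not SQL Server code. It assumes that SQL Server reads the integer 0 as the base date 1900-01-01 and that `datediff(day, ...)` counts whole day boundaries. The helper name `truncate_to_day` and the sample sales rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: the integer 0 corresponds to 1900-01-01 00:00:00 in SQL Server.
BASE = datetime(1900, 1, 1)


def truncate_to_day(created):
    # Mirrors datediff(day, 0, created): day boundaries between base and created.
    whole_days = (created.date() - BASE.date()).days
    # Mirrors dateadd(DAY, 0, n): n is read as the date n days after the base.
    return BASE + timedelta(days=whole_days)


# Hypothetical sales rows: (created, amount).
sales = [
    (datetime(2009, 11, 2, 6, 12, 55), 10.0),
    (datetime(2009, 11, 2, 18, 0, 0), 5.0),
    (datetime(2009, 11, 3, 9, 30, 0), 7.5),
]

# Equivalent of: select sum(amount) ... group by <truncated day>.
totals = defaultdict(float)
for created, amount in sales:
    totals[truncate_to_day(created)] += amount

for day, total in sorted(totals.items()):
    print(day, total)
```

Both sales on 2009-11-02 collapse onto the same midnight timestamp, which is what lets the `group by` produce one row per day.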

How to create Pandas groupby plot with subplots

Here’s an automated layout with lots of groups (of random fake data); playing around with grouped.get_group(key) will show you how to do more elegant plots.

    import pandas as pd
    from numpy.random import randint
    import matplotlib.pyplot as plt

    df = pd.DataFrame(randint(0, 10, (200, 6)), columns=list('abcdef'))
    grouped = df.groupby('a')
    rowlength = grouped.ngroups // 2  # fix up if odd number of groups
    …
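Since the excerpt is cut off, the following is one possible way to lay out a subplot per group, not the original answer's continuation. It assumes a two-column grid, a scatter of columns `b` against `c` in each panel, and a non-interactive backend so it runs without a display; all of these are choices made for this sketch.

```python
import math

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random fake data, as in the excerpt above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(200, 6)), columns=list("abcdef"))
grouped = df.groupby("a")

# Round up so an odd number of groups still gets enough rows.
ncols = 2
nrows = math.ceil(grouped.ngroups / ncols)
fig, axes = plt.subplots(
    nrows=nrows, ncols=ncols, figsize=(8, 2.5 * nrows), squeeze=False
)
flat_axes = axes.ravel()

# One panel per group.
for ax, (key, group) in zip(flat_axes, grouped):
    group.plot(kind="scatter", x="b", y="c", ax=ax, title=f"a = {key}")

# Hide any leftover panel when the number of groups is odd.
for ax in flat_axes[grouped.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
fig.savefig("grouped_subplots.png")
```

Passing `squeeze=False` keeps `axes` two-dimensional even when there is a single row, so the same loop works for any number of groups.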