Identifying statistical outliers with pandas: groupby and reduce rows into different dataframe

We need to loop over each column, compute its mean and standard deviation, then set the maximum and minimum values we accept for that column.

import numpy as np

# Store the accepted bounds for every column as a tuple:
# index 0 is the max value, index 1 is the min value
outliers = []
for col in df.columns:
   mean = np.mean(df[col].values)
   std = np.std(df[col].values)
   # You can adjust the max and min below!
   outliers.append((mean + std, mean - std))
# You now have a list of tuples, one per column (in the same order as df.columns), each holding the max and min value you accept for that column.
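
As a rough sketch of how these bounds might then be used, the snippet below computes the same per-column bounds and moves every row with at least one out-of-bounds value into a separate DataFrame. The sample data, the multiplier `k`, and the helper names `find_bounds` and `split_outliers` are assumptions made for illustration; they are not part of the answer above.

```python
import numpy as np
import pandas as pd


def find_bounds(df, k=1.0):
    """Return {column: (upper, lower)} computed as mean +/- k * std.

    np.std is the population standard deviation (ddof=0), as in the loop above.
    """
    bounds = {}
    for col in df.columns:
        values = df[col].to_numpy()
        mean = np.mean(values)
        std = np.std(values)
        bounds[col] = (mean + k * std, mean - k * std)
    return bounds


def split_outliers(df, k=1.0):
    """Split df into (inliers, outliers).

    A row counts as an outlier if any of its columns falls outside the bounds.
    """
    mask = pd.Series(False, index=df.index)
    for col, (upper, lower) in find_bounds(df, k).items():
        mask |= (df[col] > upper) | (df[col] < lower)
    return df[~mask], df[mask]


# Hypothetical sample data: the value 100 in column "a" sits far from the rest
df = pd.DataFrame({"a": [1, 2, 3, 2, 100], "b": [10, 12, 8, 11, 9]})
inliers, outliers_df = split_outliers(df, k=1.5)
print(outliers_df)
```

With this made-up data and `k=1.5`, only the last row ends up in `outliers_df`. A larger `k` widens the accepted range; note that in a very small sample a single extreme value inflates the standard deviation, so a wide range may no longer flag it.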
