How to calculate correlation between all columns and remove highly correlated ones using pandas?
The method described here worked well for me and takes only a few lines of code: https://chrisalbon.com/machine_learning/feature_selection/drop_highly_correlated_features/

    import numpy as np

    # Create the absolute correlation matrix
    corr_matrix = df.corr().abs()

    # Select the upper triangle of the correlation matrix (diagonal excluded)
    upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

    # Find features with a correlation greater than 0.95
    to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]

…
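The excerpt stops once `to_drop` has been built. As a minimal, self-contained sketch of how the whole procedure might fit together, the following wraps the same steps in a function and then removes the flagged columns. The helper name `drop_highly_correlated`, the `threshold` parameter, the restriction to numeric columns, and the demo data are all assumptions made for illustration; they are not taken from the linked article.

```python
import numpy as np
import pandas as pd


def drop_highly_correlated(df, threshold=0.95):
    """Hypothetical helper: return (reduced_df, dropped_columns)."""
    # Absolute pairwise correlations, computed on numeric columns only (assumption)
    corr_matrix = df.select_dtypes(include="number").corr().abs()

    # Boolean mask that is True strictly above the diagonal,
    # so each pair of columns is considered exactly once
    mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
    upper = corr_matrix.where(mask)

    # A column is flagged if it correlates above the threshold
    # with any column that appears before it
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

    return df.drop(columns=to_drop), to_drop


# Made-up demo data: "a_copy" is almost a linear function of "a"
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 0.001 * rng.normal(size=200),
    "b": rng.normal(size=200),
})

reduced, dropped = drop_highly_correlated(demo)
print(dropped)
print(list(reduced.columns))
```

Because only the upper triangle is inspected, the later column of each highly correlated pair is the one dropped, so the column order of the DataFrame affects which features survive.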