How does one insert statistical annotations (stars or p-values) into matplotlib / seaborn plots?

Here is how to add a statistical annotation to a Seaborn box plot:

    import seaborn as sns, matplotlib.pyplot as plt
    tips = sns.load_dataset("tips")
    sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")
    # statistical annotation
    x1, x2 = 2, 3  # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
    y, h, col = tips['total_bill'].max() + 2, 2, 'k'
    plt.plot([x1, x1, x2, … Read more
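
Since the excerpt is cut off, here is a separate minimal sketch of the general idea: draw a bracket between two box positions and put a label above it. The helper names `p_to_stars` and `add_bracket` are hypothetical, and the star thresholds (0.05, 0.01, 0.001) are a common convention assumed here, not something taken from the original answer.

```python
def p_to_stars(p):
    """Map a p-value to a star label (thresholds are an assumed convention)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


def add_bracket(ax, x1, x2, y, h, label, color="k"):
    """Draw a bracket from x1 to x2 at height y (tick height h) with a label.

    `ax` is any matplotlib Axes, for example the one returned by sns.boxplot.
    """
    # Up from (x1, y), across at y + h, and back down to (x2, y).
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], lw=1.5, c=color)
    # Center the label horizontally, sitting just above the bracket.
    ax.text((x1 + x2) / 2, y + h, label, ha="center", va="bottom", color=color)
```

With the variables from the excerpt, a call might look like `add_bracket(plt.gca(), x1, x2, y, h, p_to_stars(p), col)`, where `p` comes from whichever test you ran on the two groups.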

git find fat commit

You could do this:

    git ls-tree -r -t -l --full-name HEAD | sort -n -k 4

This will show the largest files at the bottom (the fourth column is the file (blob) size). If you need to look at different branches, you’ll want to change HEAD to those branch names. Or, put this in a loop … Read more
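
As a rough illustration of what the sort on the fourth column does, here is a sketch that parses `git ls-tree -l` output and keeps the largest blobs. The function names `ls_tree` and `largest_blobs` are hypothetical; the parsing assumes the documented `mode type object size<TAB>path` layout, where non-blob entries report `-` as their size.

```python
import subprocess


def ls_tree(ref="HEAD"):
    """Return the raw output of `git ls-tree -r -t -l --full-name <ref>`."""
    cmd = ["git", "ls-tree", "-r", "-t", "-l", "--full-name", ref]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def largest_blobs(ls_tree_output, top=10):
    """Return up to `top` (size, path) pairs, largest last."""
    entries = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        # Metadata and path are separated by a single tab.
        meta, _, path = line.partition("\t")
        _mode, obj_type, _sha, size = meta.split()
        if obj_type != "blob":
            continue  # trees and submodule commits have "-" as size
        entries.append((int(size), path))
    entries.sort()
    return entries[-top:]
```

To compare branches, you could call `largest_blobs(ls_tree(name))` for each branch name you care about.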

T-test in Pandas

It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:

    from scipy.stats import ttest_ind
    cat1 = my_data[my_data['Category']=='cat1']
    cat2 = my_data[my_data['Category']=='cat2']
    ttest_ind(cat1['values'], cat2['values'])
    >>> (1.4927289925706944, 0.16970867501294376)

It returns a tuple with the t-statistic and the p-value. See here for other t-tests … Read more
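
To make the first element of that tuple less of a black box, here is a standard-library sketch of the pooled-variance (Student) two-sample t statistic, which is the independent test `ttest_ind` performs with its default `equal_var=True`. The function name `pooled_t_statistic` is hypothetical, and the sketch stops at the statistic and degrees of freedom: turning them into a p-value needs the t distribution, which SciPy handles for you.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumes equal population variances (the classic Student t-test).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, df
```

With the frame from the excerpt, the inputs would be the two value columns converted to lists, for example `pooled_t_statistic(list(cat1['values']), list(cat2['values']))`.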

Algorithm for sampling without replacement?

Here’s some code for sampling without replacement, based on Algorithm 3.4.2S from Knuth’s book Seminumerical Algorithms.

    void SampleWithoutReplacement
    (
        int populationSize,    // size of set sampling from
        int sampleSize,        // size of each sample
        vector<int> & samples  // output, zero-offset indices to selected items
    )
    {
        // Use Knuth's variable names
        int& n = sampleSize;
        … Read more
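
The C++ above is cut off, so the following is not its continuation. It is a separate Python sketch of selection sampling (Knuth's Algorithm S) as I understand it: walk the population once and select each item with probability (still needed) / (still remaining). The function name and parameter names are my own.

```python
import random


def sample_without_replacement(population_size, sample_size, rng=random):
    """Return `sample_size` distinct zero-based indices in increasing order."""
    if not 0 <= sample_size <= population_size:
        raise ValueError("sample_size must be between 0 and population_size")
    samples = []
    t = 0  # items examined so far
    m = 0  # items selected so far
    while m < sample_size:
        u = rng.random()  # uniform in [0, 1)
        # Select item t with probability (sample_size - m) / (population_size - t).
        if (population_size - t) * u < sample_size - m:
            samples.append(t)
            m += 1
        t += 1
    return samples
```

Because the selection probability reaches 1 when the number of remaining items equals the number still needed, the loop always finishes within one pass over the population.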