statistics - w3toppers.com

Multivariate polynomial regression with numpy

sklearn provides a simple way to do this. Building off an example posted here:

```python
from numpy import array

# X is the independent variable (bivariate in this case)
X = array([[0.44, 0.68], [0.99, 0.23]])

# vector is the dependent data
vector = [109.85, 155.72]

# predict is an independent variable for which we'd like to predict the value
predict = [0.49, 0.18]

# generate …
```
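The excerpt breaks off before the fitting step, so the sketch below does not reproduce the sklearn code. It is a numpy-only alternative that matches the question title. Each input row is expanded into all monomials up to a chosen degree, and the resulting linear least-squares problem is solved with numpy.linalg.lstsq. This is roughly what sklearn's PolynomialFeatures followed by LinearRegression does, but it is not that library's implementation. The helper names polynomial_features, fit_poly and predict_poly are hypothetical, and the demo coefficients are made up.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_features(X, degree):
    """Expand each row of X into all monomials of total degree <= `degree`.

    Column order: constant term, then degree-1 terms, degree-2 terms, ...
    (for two inputs a, b and degree 2: 1, a, b, a^2, ab, b^2).
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    n_samples, n_features = X.shape
    columns = [np.ones(n_samples)]  # intercept column
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(range(n_features), d):
            # e.g. combo (0, 0, 1) -> x0 * x0 * x1
            columns.append(np.prod(X[:, list(combo)], axis=1))
    return np.column_stack(columns)


def fit_poly(X, y, degree):
    """Least-squares coefficients of a multivariate polynomial of `degree`."""
    A = polynomial_features(X, degree)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_poly(coef, X, degree):
    """Evaluate the fitted polynomial at the rows of X."""
    return polynomial_features(X, degree) @ coef


if __name__ == "__main__":
    # Noise-free synthetic data from a known quadratic (made-up coefficients)
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(50, 2))
    y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + 4 * X[:, 1] ** 2
    coef = fit_poly(X, y, degree=2)
    print(predict_poly(coef, [0.49, 0.18], degree=2))  # close to 1.6137
```

With only two observations, as in the excerpt, a degree-2 fit in two variables (six coefficients) is underdetermined. lstsq then returns the minimum-norm solution, so more data points are needed for a meaningful fit.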

Best library for statistics in C++? [closed]

Check the links on mathtools.net; its page for C++ statistics libraries has links. Another page, http://www.thefreecountry.com/sourcecode/mathematics.shtml, lists a few more. Have you checked the ‘R project’? I think you can call ‘R objects’ from C/C++.

plotting a histogram on a Log scale with Matplotlib

Specifying bins=8 in the hist call means that the range between the minimum and maximum value is divided equally into 8 bins. What is equal on a linear scale is distorted on a log scale. What you could do is specify the bins of the histogram such that they are unequal in width in a …
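A minimal sketch of that idea, assuming strictly positive data (a log axis cannot show zero or negative values). It builds bin edges that are evenly spaced in log10 with numpy.geomspace, so the bins are unequal in width on a linear axis but equal on a log axis. geomspace is used rather than logspace because it sets the first and last edges exactly to the data's minimum and maximum, so np.histogram does not drop the extreme values through rounding. The function name log_bins and the lognormal demo data are illustrative only.

```python
import numpy as np


def log_bins(data, n_bins=8):
    """Return n_bins + 1 edges, evenly spaced on a log scale from min to max.

    Assumes every value in `data` is strictly positive.
    """
    data = np.asarray(data, dtype=float)
    if np.any(data <= 0):
        raise ValueError("log-spaced bins need strictly positive data")
    # Geometric progression: each edge is a constant factor above the previous one
    return np.geomspace(data.min(), data.max(), n_bins + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.lognormal(mean=0.0, sigma=1.5, size=1000)
    edges = log_bins(data, n_bins=8)
    counts, _ = np.histogram(data, bins=edges)
    print(edges)
    print(counts)
```

To plot, pass the edges to matplotlib and switch the axis to a log scale, for example plt.hist(data, bins=edges) followed by plt.xscale('log'). On that axis the bars come out equally wide.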

How does one insert statistical annotations (stars or p-values) into matplotlib / seaborn plots?

Here is how to add a statistical annotation to a Seaborn box plot:

```python
import seaborn as sns, matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")

# statistical annotation
x1, x2 = 2, 3   # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
y, h, col = tips['total_bill'].max() + 2, 2, 'k'
plt.plot([x1, x1, x2, …
```
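The snippet is cut off mid-call. The following is a separate sketch, not the original answer's code, of two pieces such an annotation usually needs: turning a p-value into a star label, and computing the points of a bracket between two box positions. The thresholds (* for p < 0.05, ** for p < 0.01, *** for p < 0.001, "ns" otherwise) are a common convention assumed here, not something the answer specifies. star_label and bracket_xy are hypothetical helper names. Both are plain Python, so their results can be handed to matplotlib.

```python
def star_label(p, thresholds=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Map a p-value to a significance label (assumed common convention)."""
    for cutoff, label in thresholds:  # checked from strictest to loosest
        if p < cutoff:
            return label
    return "ns"


def bracket_xy(x1, x2, y, h):
    """Points of a bracket that rises from height y to y + h, spanning x1..x2."""
    xs = [x1, x1, x2, x2]
    ys = [y, y + h, y + h, y]
    return xs, ys


if __name__ == "__main__":
    xs, ys = bracket_xy(2, 3, y=10, h=2)
    print(xs, ys, star_label(0.003))
```

With the variables from the snippet, one could then draw plt.plot(xs, ys, lw=1.5, c=col) and place the label with plt.text((x1 + x2) * .5, y + h, star_label(p), ha='center', va='bottom', color=col). Here p is whatever p-value your own test produced.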

git find fat commit

You could do this:

```shell
git ls-tree -r -t -l --full-name HEAD | sort -n -k 4
```

This will show the largest files at the bottom (the fourth column is the file (blob) size). If you need to look at different branches, you'll want to change HEAD to those branch names. Or, put this in a loop …
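If you would rather post-process the listing in Python than pipe it through sort, the sketch below parses git ls-tree -l output. Each line has the form "<mode> <type> <object> <size><TAB><path>". Entries that are not blobs (trees shown with -t, submodule commits) carry "-" instead of a size. The sketch keeps only blobs and returns the n largest. The helper names largest_blobs and largest_blobs_in are hypothetical, and the sketch assumes paths without embedded tabs or newlines.

```python
import subprocess  # only needed by largest_blobs_in


def largest_blobs(ls_tree_output, n=10):
    """Return the n largest (size, path) pairs from `git ls-tree -l` text.

    Each line looks like "<mode> <type> <object> <size>\t<path>".
    Non-blob entries have "-" as their size and are skipped.
    """
    entries = []
    for line in ls_tree_output.splitlines():
        if not line.strip():
            continue
        meta, path = line.split("\t", 1)
        mode, obj_type, obj_id, size = meta.split()
        if obj_type != "blob":
            continue  # trees and submodule commits have no blob size
        entries.append((int(size), path))
    entries.sort(reverse=True)  # biggest first
    return entries[:n]


def largest_blobs_in(rev="HEAD", n=10):
    """Run git ls-tree on a revision (branch, tag or commit) and rank its blobs."""
    out = subprocess.run(
        ["git", "ls-tree", "-r", "-l", "--full-name", rev],
        capture_output=True, text=True, check=True,
    ).stdout
    return largest_blobs(out, n)
```

Passing a branch name as rev is the Python counterpart of swapping HEAD for a branch name in the shell command.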

T-test in Pandas

It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:

```python
from scipy.stats import ttest_ind

cat1 = my_data[my_data['Category'] == 'cat1']
cat2 = my_data[my_data['Category'] == 'cat2']

ttest_ind(cat1['values'], cat2['values'])
# (1.4927289925706944, 0.16970867501294376)
```

It returns a tuple with the t-statistic and the p-value. See here for other t-tests …
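By default ttest_ind assumes equal variances and uses the pooled (Student's) two-sample t-statistic. The sketch below computes that statistic and its degrees of freedom with only the standard library, to show what the first element of the returned tuple is. It does not compute the p-value, which needs the t distribution (scipy provides that). If the groups may have different variances, ttest_ind(..., equal_var=False) runs Welch's test instead. For paired (dependent) samples, scipy.stats offers ttest_rel. The function name pooled_t is hypothetical.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's two-sample t-statistic with pooled variance, plus its df."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # std. error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df


if __name__ == "__main__":
    t, df = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(t, df)  # t = -3 / sqrt(2.5), about -1.897; df = 8
```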

Boxplots in matplotlib: Markers and outliers

A picture is worth a thousand words. Note that the outliers (the + markers in your plot) are simply the points outside the wide [Q1 - 1.5 IQR, Q3 + 1.5 IQR] margin below. However, the picture is only an example of a normally distributed data set. It is important to understand that matplotlib does not estimate …
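The outlier rule above can be reproduced by hand. The sketch below assumes matplotlib's default whis=1.5 and linear-interpolation quartiles. That is numpy.percentile's default, and statistics.quantiles with method="inclusive" interpolates the same way, so the fences should line up with a default boxplot. The whiskers end at the most extreme data points still inside the fences, not at the fences themselves. The function name box_stats is hypothetical.

```python
import statistics


def box_stats(data, whis=1.5):
    """Quartiles, fences, whisker ends and outliers under the Q1/Q3 +/- whis*IQR rule."""
    values = sorted(data)
    # method="inclusive": linear interpolation between order statistics
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "fences": (low_fence, high_fence),
        # whiskers stop at the most extreme points that are still inside
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }


if __name__ == "__main__":
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this list, Q1 = 3.25 and Q3 = 7.75, so the upper fence is 14.5. Only 100 lies beyond it, so it should be the only + marker in a default plt.boxplot of the same data.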

Is Python faster and lighter than C++? [closed]

I think you’re reading those stats incorrectly. They show that Python is up to about 400 times slower than C++ and, with the exception of a single case, Python is more of a memory hog. When it comes to source size, though, Python wins flat out. My experiences with Python show the same definite trend …

Algorithm for sampling without replacement?

Here's some code for sampling without replacement based on Algorithm 3.4.2S of Knuth's book Seminumerical Algorithms.

```cpp
void SampleWithoutReplacement
(
    int populationSize,    // size of set sampling from
    int sampleSize,        // size of each sample
    vector<int> & samples  // output, zero-offset indices to selected items
)
{
    // Use Knuth's variable names
    int& n = sampleSize;
    …
```
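The C++ listing is cut off, so here is a separate Python sketch of selection sampling as in Knuth's Algorithm S. It walks through the N records once and selects record t (0-based) with probability (n - m)/(N - t), where m is the number of records selected so far. Like the C++ version's output parameter, it produces zero-offset indices, and they come out in increasing order. The function name and the use of Python's random module are assumptions of this sketch, not details of the original code.

```python
import random


def selection_sample(population_size, sample_size, rng=random):
    """Algorithm S: choose sample_size distinct indices from range(population_size).

    Every subset of that size is equally likely; indices come out sorted.
    """
    N, n = population_size, sample_size  # Knuth's variable names
    if not 0 <= n <= N:
        raise ValueError("need 0 <= sample_size <= population_size")
    samples = []
    t = 0  # records examined so far
    m = 0  # records selected so far
    while m < n:
        u = rng.random()  # uniform in [0, 1)
        # Skip record t if (N - t) * U >= n - m, otherwise select it.
        # Once N - t == n - m, the test always passes, so the loop finishes.
        if (N - t) * u < n - m:
            samples.append(t)
            m += 1
        t += 1
    return samples


if __name__ == "__main__":
    print(selection_sample(10, 4, random.Random(42)))
```

Because it makes a single pass and needs no extra storage beyond the output, this approach suits cases where the records are streamed in order. It still has to look at up to N records even when n is small.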