statistics
MySQL, reshape data from long / tall to wide
Cross-tabs (pivot tables) are the answer. From there you can INSERT INTO … SELECT …, or create a VIEW from the single SELECT. Something like the following (KEY and TABLE are reserved words in MySQL, so they are backtick-quoted here):

    SELECT country,
           MAX( IF( `key`='President', value, NULL ) ) AS President,
           MAX( IF( `key`='Currency', value, NULL ) ) AS Currency,
           …
    FROM `table`
    GROUP BY country;
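To try the conditional-aggregation idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It swaps MySQL's IF() for the portable CASE WHEN form, and the table name facts, the column names attr and val, and the sample rows are all hypothetical placeholders, not taken from the original question.

```python
import sqlite3

# Hypothetical long-format data: one row per (country, attribute, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (country TEXT, attr TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("A", "President", "P-A"),
        ("A", "Currency", "C-A"),
        ("B", "President", "P-B"),  # country B has no Currency row
    ],
)

# One output column per attribute; MAX() picks the single non-NULL value
# in each group, or NULL when the attribute is missing for that country.
query = """
    SELECT country,
           MAX(CASE WHEN attr = 'President' THEN val END) AS President,
           MAX(CASE WHEN attr = 'Currency'  THEN val END) AS Currency
    FROM facts
    GROUP BY country
    ORDER BY country
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A missing attribute simply shows up as None (NULL) in the wide row, which is usually what you want from a pivot.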
How to plot ROC curve in Python
Here are two ways you may try, assuming your model is an sklearn predictor:

    import sklearn.metrics as metrics
    # calculate the fpr and tpr for all thresholds of the classification
    probs = model.predict_proba(X_test)
    preds = probs[:,1]
    fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
    roc_auc = metrics.auc(fpr, tpr)

    # method I: plt
    import matplotlib.pyplot as plt
    plt.title('Receiver …
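To make it clearer what the fpr and tpr values represent, here is a rough standard-library sketch of the threshold sweep behind a ROC curve. It is not sklearn's implementation; the helper names roc_points and trapezoid_auc and the four-sample data set are made up for illustration.

```python
def roc_points(y_true, scores):
    """Sweep the threshold from high to low and collect (fpr, tpr) points."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Highest score first, so each step lowers the decision threshold.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a run of tied scores.
        is_last = i == len(pairs) - 1
        if is_last or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the (x, y) points."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


# Hypothetical labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
roc_auc = trapezoid_auc(fpr, tpr)
print(fpr, tpr, roc_auc)
```

Plotting tpr against fpr gives the ROC curve; the diagonal from (0, 0) to (1, 1) corresponds to random guessing.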
How to calculate cumulative normal distribution?
Here’s an example:

    >>> from scipy.stats import norm
    >>> norm.cdf(1.96)
    0.9750021048517795
    >>> norm.cdf(-1.96)
    0.024997895148220435

In other words, approximately 95% of the standard normal distribution lies within about two (more precisely, 1.96) standard deviations of its mean of zero. If you need the inverse CDF:

    >>> norm.ppf(norm.cdf(1.96))
    array(1.9599999999999991)
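If SciPy is not available, the same quantities can be sketched with the standard library alone (statistics.NormalDist, Python 3.8+). The helper phi below is a hypothetical name for the textbook erf-based formula, included only to show where the numbers come from.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

p_upper = std_normal.cdf(1.96)    # P(Z <= 1.96)
p_lower = std_normal.cdf(-1.96)   # P(Z <= -1.96)
central_mass = p_upper - p_lower  # probability mass within +/- 1.96

# Inverse CDF (quantile function): the z-value with 97.5% of the mass below it.
z = std_normal.inv_cdf(0.975)


def phi(x):
    """Standard normal CDF written out via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(p_upper, p_lower, central_mass, z, phi(1.96))
```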
Rolling median algorithm in C
I have looked at R’s src/library/stats/src/Trunmed.c a few times, since I also wanted something similar in a standalone C++ class / C subroutine. Note that these are actually two implementations in one; see src/library/stats/man/runmed.Rd (the source of the help file), which says \details{ Apart from the end values, the result \code{y = runmed(x, k)} simply …
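For orientation, here is a simple running-median sketch in Python that keeps the current window sorted with bisect. It is not a port of either implementation in Trunmed.c: it costs O(k) per step because of the list insert and delete, and it returns only the full-window medians without any end-value handling. The function name rolling_median is hypothetical.

```python
from bisect import bisect_left, insort


def rolling_median(values, k):
    """Medians of every full window of odd width k, in order."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    if k > len(values):
        return []

    window = sorted(values[:k])  # sorted copy of the current window
    mid = k // 2
    medians = [window[mid]]
    for i in range(k, len(values)):
        # Drop the element that just left the window ...
        del window[bisect_left(window, values[i - k])]
        # ... and insert the new one while keeping the list sorted.
        insort(window, values[i])
        medians.append(window[mid])
    return medians


result = rolling_median([1, 5, 2, 8, 3, 9, 4], 3)
print(result)
```

For large windows, a pair of heaps or an order-statistic structure would avoid the O(k) list operations.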
Add error bars to show standard deviation on a plot in R
A solution with ggplot2 (here sd is assumed to be a vector holding the standard deviation of each point): qplot(x, y) + geom_errorbar(aes(x=x, ymin=y-sd, ymax=y+sd), width=0.25)
How do I calculate r-squared using Python and Numpy?
A very late reply, but in case someone needs a ready-made function for this: scipy.stats.linregress, i.e. slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y), as in @Adam Marples’s answer. Note that r_value is the correlation coefficient, so r-squared is r_value**2.
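As a cross-check on what r-squared means for a straight-line fit, here is a small standard-library sketch that fits the least-squares line and computes 1 - SS_res / SS_tot directly. The function name r_squared and the sample data are illustrative assumptions.

```python
def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares vs. total sum of squares around the mean.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


r2_noisy = r_squared([1, 2, 3], [1, 3, 2])
r2_exact = r_squared([1, 2, 3, 4], [3, 5, 7, 9])
print(r2_noisy, r2_exact)
```

For simple linear regression this value equals the square of the correlation coefficient.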
How do I calculate percentiles with python/numpy?
You might be interested in the SciPy Stats package. It has the percentile function you’re after and many other statistical goodies. percentile() is available in numpy too:

    import numpy as np
    a = np.array([1,2,3,4,5])
    p = np.percentile(a, 50)  # return the 50th percentile, i.e. the median
    print(p)
    3.0

This ticket leads me to believe they won’t …
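For readers curious how such a percentile is obtained, here is a standard-library sketch using linear interpolation between the two nearest ranks, which is my understanding of NumPy's default method. The function name percentile and its error handling are assumptions for illustration, not NumPy's code.

```python
def percentile(data, q):
    """q-th percentile (0 <= q <= 100) by linear interpolation between ranks."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")

    ordered = sorted(data)
    # Fractional position of the percentile within the sorted data.
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)                        # rank just below (pos is non-negative)
    hi = min(lo + 1, len(ordered) - 1)   # rank just above, clamped at the end
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


median = percentile([1, 2, 3, 4, 5], 50)
first_quartile = percentile([1, 2, 3, 4], 25)
print(median, first_quartile)
```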
How to calculate the statistics “t-test” with numpy
The scipy.stats package has a few ttest_… functions. See this example:

    >>> print('t-statistic = %6.3f pvalue = %6.4f' % stats.ttest_1samp(x, m))
    t-statistic = 0.391 pvalue = 0.6955
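To show what the one-sample t-statistic is, here is a standard-library sketch of the formula t = (mean - m) / (s / sqrt(n)). It returns only the statistic and the degrees of freedom; the p-value needs the t distribution's CDF, which the standard library does not provide. The function name t_statistic_1samp and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic_1samp(sample, popmean):
    """One-sample t-statistic and degrees of freedom."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # stdev() uses the n - 1 denominator (sample standard deviation).
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - popmean) / standard_error, n - 1


t_stat, dof = t_statistic_1samp([1, 2, 3, 4, 5], 2)
print(t_stat, dof)
```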