statistics - w3toppers.com

Select random row from a PostgreSQL table with weighted row probabilities

This should do the trick:
    WITH CTE AS (
        SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
    )
    SELECT *
    FROM (
        SELECT id, SUM(percent) OVER (ORDER BY id) S, R
        FROM YOUR_TABLE CROSS JOIN CTE
    ) Q
    WHERE S >= R
    ORDER BY id
    LIMIT 1;
The sub-query Q gives the following result: 1 …
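The cumulative-sum idea behind the query can be sketched outside the database as well. The Python below is only an illustration under stated assumptions: the function name pick_weighted and the (id, percent) row layout are hypothetical, and the rows are assumed to be sorted by id with a positive total weight.

```python
import bisect
import itertools
import random


def pick_weighted(rows, rng=random):
    """Return the id of one row, chosen with probability proportional to its weight.

    rows is a list of (id, percent) pairs, assumed to be ordered by id already.
    rng is any object with a random() method returning a float in [0, 1).
    """
    ids = [row_id for row_id, _ in rows]
    # Running totals play the role of SUM(percent) OVER (ORDER BY id).
    cumulative = list(itertools.accumulate(percent for _, percent in rows))
    # r corresponds to random() * (SELECT SUM(percent) ...).
    r = rng.random() * cumulative[-1]
    # First position whose running total is >= r, as in WHERE S >= R ... LIMIT 1.
    return ids[bisect.bisect_left(cumulative, r)]


if __name__ == "__main__":
    example_rows = [(1, 50), (2, 30), (3, 20)]  # hypothetical data
    print(pick_weighted(example_rows))
```

The binary search is a convenience for in-memory lists; the SQL version gets the same effect from the WHERE filter combined with ORDER BY id LIMIT 1.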

Calculate mean and standard deviation from a vector of samples in C++ using Boost

I don’t know if Boost has more specific functions, but you can do it with the standard library. Given std::vector<double> v, this is the naive way:
    #include <cmath>
    #include <numeric>
    double sum = std::accumulate(v.begin(), v.end(), 0.0);
    double mean = sum / v.size();
    double sq_sum = std::inner_product(v.begin(), v.end(), v.begin(), 0.0);
    double stdev = std::sqrt(sq_sum / v.size() - mean …
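The excerpt is cut off before the standard deviation line is complete, so the following is not a reconstruction of it. It is a small Python sketch (Python rather than C++, to match the other examples on this page) of the same sum and sum-of-squares approach, assuming the population form of the identity variance = mean of squares minus square of the mean. The function name mean_and_stdev is hypothetical.

```python
import math


def mean_and_stdev(samples):
    """Return (mean, population standard deviation) of a non-empty sequence."""
    n = len(samples)
    mean = sum(samples) / n
    sq_sum = sum(x * x for x in samples)
    # Population variance as E[x^2] - mean^2. Rounding can push the result
    # slightly below zero for near-constant data, so clamp it at zero.
    variance = max(sq_sum / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    print(mean_and_stdev([2, 4, 4, 4, 5, 5, 7, 9]))
```

This formula is called naive for a reason: when the mean is large compared with the spread, subtracting two nearly equal numbers loses precision, and a two-pass computation over the deviations from the mean is usually safer.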

Computing cross-correlation function?

Product() aggregate function

The logarithm/power approach is the generally used approach. For Oracle, that is:
    select exp(sum(ln(col))) from table;
I don’t know why the original database designers didn’t include PRODUCT() as an aggregation function. My best guess is that they were all computer scientists, with no statisticians. Such functions are very useful in statistics, but they don’t show …
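The identity behind exp(sum(ln(col))) is easy to check outside the database. The Python sketch below uses a hypothetical helper name, product_via_logs, and assumes every value is strictly positive, because the logarithm is undefined for zero and negative inputs.

```python
import math


def product_via_logs(values):
    """Multiply strictly positive numbers as exp(sum(ln(v)))."""
    values = list(values)
    if any(v <= 0 for v in values):
        raise ValueError("the log/exp identity needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values))


if __name__ == "__main__":
    print(product_via_logs([2, 3, 4]))  # close to 24, up to floating-point rounding
```

The result is a float even for integer input, so it may differ from the exact product in the last few digits. Zeros and negative values need separate handling, for example tracking the sign and any zero separately.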

What does the period mean when used with ~ (in a formula)?

R optimization with equality and inequality constraints

How to use the ‘sweep’ function

Compute a confidence interval from sample data assuming unknown distribution

If you don’t know the underlying distribution, then my first thought would be to use bootstrapping: https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
In pseudo-code, assuming x is a numpy array containing your data:
    import numpy as np
    N = 10000
    mean_estimates = []
    for _ in range(N):
        re_sample_idx = np.random.randint(0, len(x), x.shape)
        mean_estimates.append(np.mean(x[re_sample_idx]))
mean_estimates is now a list of 10000 …
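The excerpt stops before it says how the resampled means become an interval. One common choice, assumed here rather than taken from the original answer, is the percentile bootstrap: sort the estimates and read off the lower and upper quantiles. The sketch below uses only the Python standard library instead of numpy, and the function name bootstrap_mean_ci and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=10000, confidence=0.95, seed=None):
    """Percentile-bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n values from data with replacement.
    estimates = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    # Read the interval endpoints off the sorted bootstrap estimates.
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[min(int((1.0 - alpha) * n_resamples), n_resamples - 1)]
    return lower, upper


if __name__ == "__main__":
    sample = [float(v) for v in range(1, 21)]  # hypothetical data
    print(bootstrap_mean_ci(sample, seed=1))
```

Passing a seed makes the interval reproducible. The percentile method is the simplest variant, and it can be inaccurate for small or heavily skewed samples.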