statistics - w3toppers.com

Error in contrasts when defining a linear model in R

This error occurs when an independent (right-hand-side) variable is a factor or a character vector that takes only one value. Example, using the iris data in R:

    (model1 <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris))
    # Call:
    # lm(formula = Sepal.Length ~ Sepal.Width + Species, data = iris)
    # Coefficients:
    # (Intercept) Sepal.Width Speciesversicolor Speciesvirginica …

Why am I getting “algorithm did not converge” and “fitted prob numerically 0 or 1” warnings with glm?

If you look at ?glm (or even do a Google search for your second warning message) you may stumble across this from the documentation:

    For the background to warning messages about ‘fitted probabilities numerically 0 or 1 occurred’ for binomial GLMs, see Venables & Ripley (2002, pp. 197–8).

Now, not everyone has that book. But …

R tick data : merging date and time into a single object

Create a datetime object with as.POSIXct:

    as.POSIXct(paste(x$date, x$time), format="%Y-%m-%d %H:%M:%S")
    [1] "2010-02-02 08:00:03 GMT" "2010-02-02 08:00:04 GMT" "2010-02-02 08:00:04 GMT"
    [4] "2010-02-02 08:00:04 GMT" "2010-02-02 08:00:04 GMT"

Calculating Pearson correlation and significance in Python

You can have a look at scipy.stats:

    from pydoc import help
    from scipy.stats import pearsonr  # the older scipy.stats.stats import path is deprecated
    help(pearsonr)

Output (as captured from an older SciPy release):

    >>> Help on function pearsonr in module scipy.stats.stats:
    pearsonr(x, y)
    Calculates a Pearson correlation coefficient and the p-value for testing non-correlation.
    The Pearson correlation coefficient measures the linear relationship between two datasets. Strictly speaking, Pearson’s correlation requires that each …
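As a small usage illustration, the sketch below calls pearsonr on two short made-up lists (the values are assumptions chosen only for the example) and unpacks the correlation coefficient and the two-sided p-value.

```python
from scipy.stats import pearsonr

# Made-up sample data, used only to illustrate the call.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

# pearsonr returns the correlation coefficient and the two-sided p-value;
# the result can be unpacked like a 2-tuple.
r, p_value = pearsonr(x, y)

print(f"r = {r:.3f}, p = {p_value:.3f}")
```

With only five points the p-value is not small even though r is fairly high, which is a reminder to read both numbers together.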

How do I determine the standard deviation (stddev) of a set of values?

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers: you may end up with a negative variance. Also, never, ever, ever compute a^2 as pow(a,2); a * a is almost certainly faster. By far the best way …
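The excerpt is cut off before it names its preferred method, so the following is only an illustrative sketch, not the answer's own code. It contrasts the sum-of-squares formula with one commonly used stable alternative, Welford's running update. The function names and the sample data are assumptions made for this example.

```python
import math

def naive_variance(values):
    """Sum-of-squares formula E[x^2] - E[x]^2 (population variance)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)  # a * a rather than pow(a, 2)
    return total_sq / n - (total / n) ** 2

def welford_variance(values):
    """Population variance via Welford's single-pass running update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / count if count else 0.0

# Large offset, small spread: the squares exceed what a double can hold exactly.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("naive variance:  ", naive_variance(data))    # may be far off, or negative
print("welford variance:", welford_variance(data))
print("welford stddev:  ", math.sqrt(welford_variance(data)))
```

The running update never forms the huge squared values, so it avoids subtracting two nearly equal large numbers, which is where the naive formula loses its precision.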

Cosmic Rays: what is the probability they will affect a program?

From Wikipedia: Studies by IBM in the 1990s suggest that computers typically experience about one cosmic-ray-induced error per 256 megabytes of RAM per month. This means a probability of 3.7 × 10^-9 per byte per month, or 1.4 × 10^-15 per byte per second. If your program runs for 1 minute and occupies 20 MB …
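The arithmetic behind the quoted rates can be reproduced with a short sketch. It assumes a megabyte is 2^20 bytes and a month is 30 days (the excerpt does not state either), and the helper name expected_errors is hypothetical.

```python
# Assumptions (not stated in the excerpt): 1 MB = 2**20 bytes, 1 month = 30 days.
BYTES_PER_MB = 2 ** 20
SECONDS_PER_MONTH = 30 * 24 * 3600

# One error per 256 MB per month, expressed per byte.
rate_per_byte_month = 1.0 / (256 * BYTES_PER_MB)
rate_per_byte_second = rate_per_byte_month / SECONDS_PER_MONTH

def expected_errors(memory_bytes, runtime_seconds):
    """Expected number of errors for a given memory footprint and run time."""
    return rate_per_byte_second * memory_bytes * runtime_seconds

print(f"{rate_per_byte_month:.1e} per byte per month")
print(f"{rate_per_byte_second:.1e} per byte per second")

# The scenario mentioned in the text: 20 MB of memory for 1 minute.
print(expected_errors(20 * BYTES_PER_MB, 60))
```

Under these assumptions the two rates round to the figures quoted above. For very small expected counts, the expected number of errors is also a good approximation of the probability of at least one error.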

Compute a confidence interval from sample data

    import numpy as np
    import scipy.stats

    def mean_confidence_interval(data, confidence=0.95):
        a = 1.0 * np.array(data)
        n = len(a)
        m, se = np.mean(a), scipy.stats.sem(a)
        h = se * scipy.stats.t.ppf((1 + confidence) / 2., n-1)
        return m, m-h, m+h

You can calculate it like this.
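As a cross-check, the same half-width calculation can be compared with SciPy's built-in scipy.stats.t.interval. The sketch below uses made-up sample values (an assumption for illustration) and repeats the manual computation inline so that it runs on its own.

```python
import numpy as np
import scipy.stats

# Made-up sample values, used only for illustration.
data = [2.1, 2.5, 2.3, 2.8, 2.4]
confidence = 0.95

a = np.asarray(data, dtype=float)
n = len(a)
m, se = a.mean(), scipy.stats.sem(a)

# Manual half-width from the Student t quantile, as in mean_confidence_interval.
h = se * scipy.stats.t.ppf((1 + confidence) / 2.0, n - 1)

# Same interval from SciPy's helper: confidence level, degrees of freedom,
# then the centre (loc) and the standard error (scale).
low, high = scipy.stats.t.interval(confidence, n - 1, loc=m, scale=se)

print("manual:", m - h, m + h)
print("scipy: ", low, high)
```

Both approaches use the t distribution with n-1 degrees of freedom, so the two intervals should agree up to floating-point rounding.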

Find p-value (significance) in scikit-learn LinearRegression

This is kind of overkill but let’s give it a go. First, let’s use statsmodels to find out what the p-values should be:

    import pandas as pd
    import numpy as np
    from sklearn import datasets, linear_model
    from sklearn.linear_model import LinearRegression
    import statsmodels.api as sm
    from scipy import stats

    diabetes = datasets.load_diabetes()
    X = diabetes.data
    y …

How to make execution pause, sleep, wait for X seconds in R?

See help(Sys.sleep). For example, from ?Sys.sleep:

    testit <- function(x) {
        p1 <- proc.time()
        Sys.sleep(x)
        proc.time() - p1 # The cpu usage should be negligible
    }
    testit(3.7)

Yielding

    > testit(3.7)
       user  system elapsed
      0.000   0.000   3.704

How to normalize a NumPy array to a unit vector?

If you’re using scikit-learn you can use sklearn.preprocessing.normalize:

    import numpy as np
    from sklearn.preprocessing import normalize

    x = np.random.rand(1000)*10
    norm1 = x / np.linalg.norm(x)
    norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
    # Compare with a tolerance rather than exact float equality
    print(np.allclose(norm1, norm2))
    # True
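If scikit-learn is not available, a NumPy-only helper is a possible alternative. The sketch below is an assumption-laden example rather than part of the original answer: the name unit_vector is hypothetical, and returning a zero vector unchanged is just one possible policy for the zero-norm case.

```python
import numpy as np

def unit_vector(v):
    """Scale v to unit Euclidean length (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        # Assumed policy: a zero vector has no direction, so return it as-is
        # instead of dividing by zero.
        return v
    return v / norm

u = unit_vector([3.0, 4.0])
print(u)
print(np.linalg.norm(u))
```

Dividing by np.linalg.norm is the same operation as norm1 in the snippet above; the only addition is the explicit guard for a zero vector.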