Fixed effect in Pandas or Statsmodels

As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:

1. If you use Python 3, you can use linearmodels, as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183
2. Specify the various dummies in your statsmodels specification, e.g. using pd.get_dummies. May not be feasible if the …

Run an OLS regression with Pandas Data Frame

I think you can do almost exactly what you thought would be ideal, using the statsmodels package, which was one of pandas' optional dependencies before pandas version 0.20.0 (it was used for a few things in pandas.stats).

>>> import pandas as pd
>>> import statsmodels.formula.api as sm
>>> df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, …
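Since the excerpt's data is cut off, here is a small self-contained sketch of the same formula-based approach. The DataFrame values and the column names y, x1, and x2 are made up for illustration and are not the original answer's data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: y is roughly 3 + 2*x1 - 1*x2 plus small noise.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
})
noise = [0.1, 0.2, -0.3, 0.1, -0.2, 0.1]
df["y"] = 3.0 + 2.0 * df["x1"] - 1.0 * df["x2"] + noise

# The formula interface adds the intercept automatically.
result = smf.ols(formula="y ~ x1 + x2", data=df).fit()

print(result.params)      # Intercept, x1, x2
print(result.rsquared)
```

Calling result.summary() prints the full regression table, including standard errors and confidence intervals.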

Time Series Analysis – unevenly spaced measures – pandas + statsmodels

seasonal_decompose() requires a freq that is either provided as part of the DatetimeIndex meta information, inferred by pandas.Index.inferred_freq, or supplied by the user as an integer giving the number of periods per cycle, e.g. 12 for monthly data (from the docstring for seasonal_mean):

def seasonal_decompose(x, model="additive", filt=None, freq=None):
    """
    Parameters
    ----------
    x : array-like …

Pythonic way of detecting outliers in one dimensional observation data

The problem with using percentile is that the points identified as outliers are a function of your sample size. There are a huge number of ways to test for outliers, and you should give some thought to how you classify them. Ideally, you should use a priori information (e.g. “anything above/below this value is unrealistic because…”) …
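As one illustration of a test that does not flag a fixed fraction of the sample, here is a sketch of the modified z-score based on the median absolute deviation (MAD). The function name mad_outliers and the 3.5 threshold are assumptions for this example (3.5 is a commonly quoted cutoff); the excerpt itself does not say which method it goes on to recommend.

```python
import numpy as np

def mad_outliers(points, thresh=3.5):
    """Return a boolean mask marking points whose modified z-score exceeds thresh."""
    points = np.asarray(points, dtype=float)
    median = np.median(points)
    abs_dev = np.abs(points - median)
    mad = np.median(abs_dev)
    if mad == 0:
        # More than half the points are identical; the score is undefined.
        return np.zeros(points.shape, dtype=bool)
    # 0.6745 scales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    modified_z = 0.6745 * abs_dev / mad
    return modified_z > thresh

data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
mask = mad_outliers(data)
print(mask)
```

Unlike a percentile cut, this flags nothing when no point is far from the median, regardless of how many observations there are.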

confidence and prediction intervals with StatsModels

For test data, you can try the following.

predictions = result.get_prediction(out_of_sample_df)
predictions.summary_frame(alpha=0.05)

I found the summary_frame() method buried here and you can find the get_prediction() method here. You can change the significance level of the confidence interval and prediction interval by modifying the “alpha” parameter. I am posting this here because this was …
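To make the two calls above concrete, here is a self-contained sketch with a fitted OLS model. The training data and the column names x and y are synthetic assumptions; only get_prediction() and summary_frame() come from the text.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic training data (hypothetical): y is about 1 + 2*x plus noise.
rng = np.random.default_rng(42)
train = pd.DataFrame({"x": np.arange(1.0, 11.0)})
train["y"] = 1.0 + 2.0 * train["x"] + rng.normal(scale=0.5, size=len(train))

result = smf.ols("y ~ x", data=train).fit()

# New observations to predict for.
out_of_sample_df = pd.DataFrame({"x": [11.0, 12.0]})
predictions = result.get_prediction(out_of_sample_df)
frame = predictions.summary_frame(alpha=0.05)

# mean_ci_* bound the fitted mean; obs_ci_* bound a new observation.
print(frame[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])
```

The obs_ci columns are the prediction interval and are always wider than the mean_ci columns, because they also account for the residual variance of a single new observation.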

Weighted standard deviation in NumPy

How about the following short “manual calculation”?

import math

import numpy

def weighted_avg_and_std(values, weights):
    """
    Return the weighted average and standard deviation.

    values, weights -- Numpy ndarrays with the same shape.
    """
    average = numpy.average(values, weights=weights)
    # Fast and numerically precise:
    variance = numpy.average((values-average)**2, weights=weights)
    return (average, math.sqrt(variance))
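A quick sanity check of this function, using example values chosen for illustration: with integer weights, the result should match the ordinary mean and population standard deviation (ddof=0) of the data with each value repeated according to its weight.

```python
import math

import numpy

def weighted_avg_and_std(values, weights):
    """Return the weighted average and (population) standard deviation."""
    average = numpy.average(values, weights=weights)
    variance = numpy.average((values - average) ** 2, weights=weights)
    return (average, math.sqrt(variance))

values = numpy.array([1.0, 2.0, 3.0])
weights = numpy.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, weights)

# Equivalent unweighted data: 1, 1, 1, 2, 3, 3
expanded = numpy.repeat(values, weights)
print(avg, expanded.mean())
print(std, expanded.std())
```

Note that this is the biased (population) form; it applies no small-sample correction of the kind ddof=1 gives for unweighted data.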

scikit-learn & statsmodels – which R-squared is correct?

Arguably, the real challenge in such cases is to be sure that you compare apples to apples. And in your case, it seems that you don’t. Our best friend is always the relevant documentation, combined with simple experiments. So… Although scikit-learn’s LinearRegression() (i.e. your 1st R-squared) is fitted by default with fit_intercept=True (docs), this is …