As noted in the comments, PanelOLS has been removed from Pandas as of version 0.20.0. So you really have three options:
- If you use Python 3, you can use linearmodels, as specified in the more recent answer: https://stackoverflow.com/a/44836199/3435183
- Just specify various dummies in your statsmodels specification, e.g. using pd.get_dummies. This may not be feasible if the number of fixed effects is large.
- Or do some groupby-based demeaning and then use statsmodels (this would work if you’re estimating lots of fixed effects). Here is a barebones version of what you could do for one-way fixed effects:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    import patsy

    def areg(formula, data=None, absorb=None, cluster=None):
        y, X = patsy.dmatrices(formula, data, return_type="dataframe")

        ybar = y.mean()
        y = y - y.groupby(data[absorb]).transform('mean') + ybar

        Xbar = X.mean()
        X = X - X.groupby(data[absorb]).transform('mean') + Xbar

        reg = sm.OLS(y, X)
        # Account for df loss from FE transform
        reg.df_resid -= (data[absorb].nunique() - 1)

        return reg.fit(cov_type="cluster", cov_kwds={'groups': data[cluster].values})
For example, suppose you have a panel of stock data (stock returns and other stock data for all stocks, every month over a number of months) and you want to regress returns on lagged returns with calendar-month fixed effects, where the calendar month variable is called caldt. You also want to cluster the standard errors by calendar month. You can estimate such a fixed effect model with the following:
reg0 = areg('ret~retlag',data=df,absorb='caldt',cluster="caldt")
And here is what you can do if using an older version of pandas:
An example with time fixed effects using pandas’ PanelOLS (which is in the plm module). Notice the import of PanelOLS:
>>> from pandas.stats.plm import PanelOLS
>>> df
                 y    x
date       id
2012-01-01 1   0.1  0.2
           2   0.3  0.5
           3   0.4  0.8
           4   0.0  0.2
2012-02-01 1   0.2  0.7
           2   0.4  0.5
           3   0.2  0.3
           4   0.1  0.1
2012-03-01 1   0.6  0.9
           2   0.7  0.5
           3   0.9  0.6
           4   0.4  0.5
Note that the dataframe must have a MultiIndex set; PanelOLS determines the time and entity effects based on the index:
>>> reg = PanelOLS(y=df['y'],x=df[['x']],time_effects=True)
>>> reg
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x>
Number of Observations: 12
Number of Degrees of Freedom: 4
R-squared: 0.2729
Adj R-squared: 0.0002
Rmse: 0.1588
F-stat (1, 8): 1.0007, p-value: 0.3464
Degrees of Freedom: model 3, resid 8
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x     0.3694     0.2132       1.73     0.1214    -0.0485     0.7872
---------------------------------End of Summary---------------------------------
Docstring:
PanelOLS(self, y, x, weights = None, intercept = True, nw_lags = None,
entity_effects = False, time_effects = False, x_effects = None,
cluster = None, dropped_dummies = None, verbose = False,
nw_overlap = False)
Implements panel OLS.
See ols function docs
This is another function (like fama_macbeth) where I believe the plan is to move this functionality to statsmodels.
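On a pandas version where PanelOLS is no longer available, the time fixed effects estimate from the example above can be reproduced with statsmodels by adding date dummies. This is only a sketch of one way to do it: the 12 observations are copied from the dataframe shown earlier, and the dates are kept as plain strings here (an assumption made so that the formula treats them as categories).

```python
import pandas as pd
import statsmodels.formula.api as smf

# The 12 observations from the example above, in long form.
df = pd.DataFrame({
    "date": ["2012-01-01"] * 4 + ["2012-02-01"] * 4 + ["2012-03-01"] * 4,
    "id": [1, 2, 3, 4] * 3,
    "y": [0.1, 0.3, 0.4, 0.0, 0.2, 0.4, 0.2, 0.1, 0.6, 0.7, 0.9, 0.4],
    "x": [0.2, 0.5, 0.8, 0.2, 0.7, 0.5, 0.3, 0.1, 0.9, 0.5, 0.6, 0.5],
})

# C(date) expands into one dummy per date, i.e. time fixed effects.
res = smf.ols("y ~ x + C(date)", data=df).fit()
coef_x = res.params["x"]

print(round(coef_x, 4))    # coefficient on x
print(int(res.df_resid))   # residual degrees of freedom
```

The coefficient on x comes out at 0.3694 with 8 residual degrees of freedom, in line with the PanelOLS summary above.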