I am writing a macro to run regressions with fixed effects using the demeaning approach, because the normal approach sometimes uses too much memory. The fixed effects are stock and time fixed effects. For example, with a time (year) fixed effect, my steps are:
1. Get the means of all variables by time (i.e., for each year).
2. Subtract the means from the variables. This gives the demeaned data.
3. Run regression using the demeaned data.
Now, what about two fixed effects at once? Do I get the means by time and by stock, and then subtract both sets of means from the variables? I read this post, which says I need to add back the total-sample means of the variables. So what should I do?
P.S. I use PROC SURVEYREG to run my regressions because it helps with double clustering. Please recommend other procedures if you know of any that are more efficient. Thanks.
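For reference, the two-way version of the steps above is usually written as the "within" transformation. The notation here is an assumption for illustration: $i$ indexes stocks, $t$ indexes time periods, and a bar with a dot denotes a mean over the dotted index. The formula is exact only for a balanced panel (every stock observed in every period).

```latex
% Two-way fixed-effects model
y_{it} = \alpha_i + \lambda_t + \beta x_{it} + \varepsilon_{it}

% Within transformation: subtract the stock mean and the time mean,
% then add back the overall mean (apply the same to x)
\ddot{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot},
\qquad
\ddot{x}_{it} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}

% Both fixed effects cancel:
%   \alpha_i - \alpha_i - \bar{\alpha} + \bar{\alpha} = 0
%   \lambda_t - \bar{\lambda} - \lambda_t + \bar{\lambda} = 0
\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{\varepsilon}_{it}
```

Adding back the overall means only shifts each variable by a constant, so if the regression on the transformed data includes an intercept, the slope estimate is the same with or without that step. In an unbalanced panel the single-pass formula no longer removes both effects exactly, and the demeaning is typically repeated (alternating between stock and time) until the means stop changing.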
In statistics, what you call the "demeaned" data is called "centered" data. The analysis is a regression on the centered variables. (You can also center and scale by the standard deviation, which results in a standardized regression.)
In SAS you can use PROC STDIZE to center the variables. You can use a BY statement in your favorite regression procedure to perform multiple regressions with one call.
I don't fully understand the details of your question, but here is some data and code that you can study. It might provide insight into the analysis. It will definitely provide us with a common set of data that we can all use to help refine your question and our suggestions:
/* sample data */
data Stocks;
   set Sashelp.Stocks;
   year = year(date);
   if year >= 2000; /* restrict size of output */
run;
proc freq data=Stocks; tables year; run;
/* sort by Year and Stock */
proc sort data=Stocks;
   by Year Stock;
run;
/* center data */
proc stdize data=Stocks out=StocksCentered
            method=mean OPREFIX SPREFIX=Center;
   by Year Stock;
   /* no VAR stmt, so all remaining numerical vars are standardized */
run;
/* use your favorite regression procedure here.
   Perform regressions for each year and stock(?) */
proc autoreg data=StocksCentered noprint;
   by Year Stock;
   model CenterClose = CenterVolume;
   output out=RegOut pred=Pred;
run; quit;
/* visualize results */
proc sgpanel data=RegOut;
   panelby Year Stock / layout=lattice;
   scatter y=CenterClose x=CenterVolume;
   series y=Pred x=CenterVolume;
run;
Thanks, Rick, for the response. To clear up my question: I would like to run some regressions with fixed effects. The fixed effects here are stock and time. Using the normal PROCs returns memory errors, so I have to use centered data. For example, with a time fixed effect, I first sort my data by time and, for each time period (month), calculate the means of all variables in the regression. I then subtract these means from the variables in each observation (i.e., X = X - X_mean and Y = Y - Y_mean), and then regress Y on X as normal without a CLASS statement. Here is my code:
proc sort data=panel; by month; run;
proc means data=panel noprint;
   by month;
   output out=means (drop=_TYPE_ _FREQ_)
      mean(Y)=Y_mean mean(X)=X_mean;
run;
data means;
   merge panel means;
   by month;
   Y = Y - Y_mean;
   X = X - X_mean;
run;
ods output ParameterEstimates=est FitStatistics=fit;
proc surveyreg data=means;
   cluster stock;
   model Y = X / solution;
run; quit;
Now, my question is: what if I want two fixed effects at the same time? Do I subtract the means of the variables (by time and by stock) from each observation, or do I also need to add back the mean of the overall sample (over all observations, not BY time or stock)? Please see the original question, where I added a picture. So this question is more about statistics. But does SAS have any PROC for dealing with large panel data and fixed effects? Normally I use the following code to estimate fixed effects:
proc surveyreg data=panel;class month; cluster month stock;model Y= X month/ solution; run;
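One way to sketch the two-way version of the demeaning code above is to subtract the stock means and the month means and then add back the overall means. This is a sketch under assumptions, not a definitive implementation: it assumes a balanced panel and no missing values in Y or X. The names demeaned2, Y_dm, X_dm, and the intermediate mean columns are made up for illustration; panel, stock, month, Y, and X follow the example above.

```sas
/* Sketch only: two-way demeaning of Y and X by stock and by month.
   Assumes a balanced panel with no missing Y or X.
   demeaned2, Y_dm, X_dm and the *_s, *_m, *_g columns are illustrative names. */
proc sql;
   create table demeaned2 as
   select p.stock, p.month,
          p.Y - s.Y_s - m.Y_m + g.Y_g as Y_dm,   /* y - stock mean - month mean + overall mean */
          p.X - s.X_s - m.X_m + g.X_g as X_dm
   from panel as p,
        (select stock, mean(Y) as Y_s, mean(X) as X_s
           from panel group by stock) as s,      /* means by stock */
        (select month, mean(Y) as Y_m, mean(X) as X_m
           from panel group by month) as m,      /* means by month */
        (select mean(Y) as Y_g, mean(X) as X_g
           from panel) as g                      /* overall means (one row) */
   where p.stock = s.stock and p.month = m.month;
quit;

/* Regress the demeaned Y on the demeaned X; no CLASS statement is needed */
proc surveyreg data=demeaned2;
   cluster stock;
   model Y_dm = X_dm / solution;
run;
```

If the panel is unbalanced, a single pass like this is only approximate, and the by-stock and by-month centering would need to be repeated until the group means are close to zero. Note also that the regression on demeaned data does not know how many fixed effects were removed, so the reported degrees of freedom (and possibly the standard errors) may need adjusting.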