04-23-2018 11:29 AM
I am writing a macro to run fixed-effects regressions with clustering. It uses the demeaning method because the normal procedures give memory errors. With my current code, I have to modify the macro every time I run a different regression, because the variables change and their means are hard-coded. I would like a macro that works for any variables I pass in, without editing the macro itself. My current macro is:
%macro FEregression(dep,indep,clusterVar,FE_var);
  * To run with fixed effects use the method of subtracting off the mean for each date because the standard dummy variables approach needs too much memory;
  proc sort data=panel;
    by &FE_var;
  run;
  proc means data=panel print;
    by &FE_var;
    output out=means (drop=_TYPE_ _FREQ_) mean(&dep)=m&dep mean(A)=mA mean(B)=mB mean(C)=mC mean(D)=mD;
  run;
  data means;
    merge panel means;
    by &FE_var;
    &dep=&dep-m&dep;
    A=A-mA; B=B-mB; C=C-mC; D=D-mD;
  run;
  proc surveyreg data=means;
    class &FE_var;
    cluster &clusterVar; * Cluster by clusterVar;
    model &dep = &indep / solution;
  run; quit;
%mend;
%FEregression(Y, A B C D, , date)
So for this example, I am regressing Y on A B C D.
04-23-2018 01:09 PM - edited 04-23-2018 01:17 PM
So what you are asking is how to replace, in this code:
proc means data=panel print;
  by &FE_var;
  output out=means (drop=_TYPE_ _FREQ_) mean(&dep)=m&dep mean(A)=mA mean(B)=mB mean(C)=mC mean(D)=mD;
run;
data means;
  merge panel means;
  by &FE_var;
  &dep=&dep-m&dep; A=A-mA;
the hard-coded A B C D variables with whatever variables are listed in &indep?
Or are A B C D a subset of the variables in &indep? If so, how do we know what the subset would be?
Things might get a lot simpler if you had a VAR statement on your Proc means like
Var &dep &indep;
and used the AUTONAME option on the OUTPUT statement instead of forcing the mA, mB variable names. mean(&dep &indep)= / autoname would append _Mean to the name of each variable.
The data step could then become:
data means;
  merge panel means;
  by &FE_var;
  &dep=&dep- &dep._mean;
  %do i= 1 %to %sysfunc(countw(&indep));
    %let tvar= %scan(&indep,&i);
    &tvar = &tvar - &tvar._mean;
  %end;
run;
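Putting the two suggestions together, the whole macro might look something like the sketch below. It has not been run against your data. The data= keyword parameter and the work data set names _sorted, _means and _demeaned are made up for illustration and are not part of the original macro. The CLUSTER statement is only emitted when a cluster variable is supplied, which is also an addition.

```sas
%macro FEregression(dep, indep, clusterVar, FE_var, data=panel);
  %local i tvar allvars;
  %let allvars = &dep &indep;

  /* Sort by the fixed-effect variable so BY processing and the merge work. */
  /* _sorted is a hypothetical work data set; the input is left untouched.  */
  proc sort data=&data out=_sorted;
    by &FE_var;
  run;

  /* One row of means per fixed-effect group.                */
  /* AUTONAME names each mean <variable>_Mean automatically. */
  proc means data=_sorted noprint;
    by &FE_var;
    var &allvars;
    output out=_means (drop=_TYPE_ _FREQ_) mean= / autoname;
  run;

  /* Subtract the group mean from the dependent and every independent variable. */
  data _demeaned;
    merge _sorted _means;
    by &FE_var;
    %do i = 1 %to %sysfunc(countw(&allvars, %str( )));
      %let tvar = %scan(&allvars, &i, %str( ));
      &tvar = &tvar - &tvar._Mean;
    %end;
  run;

  /* Regression on the demeaned data, as in the original macro. */
  proc surveyreg data=_demeaned;
    class &FE_var;
    %if %length(&clusterVar) %then %do;
      cluster &clusterVar;  /* skipped when no cluster variable is passed */
    %end;
    model &dep = &indep / solution;
  run;
%mend FEregression;

/* Same call as before, and a different regression with no macro changes */
%FEregression(Y, A B C D, , date)
%FEregression(Z, M N O P Q, clusterVar, date)
```

One thing to watch: AUTONAME adds five characters to each name, so very long variable names may run into the 32-character limit and not come out as the expected _Mean name.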
04-23-2018 01:13 PM - edited 04-23-2018 01:38 PM
Yes, that is correct: I don't want to change those steps every time I run a different regression. A B C D would be all the independent variables in the list &indep. Essentially, I would like to run a new regression, say Z on M N O P Q, simply by calling %FEregression(Z, M N O P Q, clusterVar, date). With my current method, I would have to rewrite the PROC MEANS and DATA steps in my macro.