I am writing a macro to run fixed-effects regressions with clustered standard errors using the demeaning method, because the normal procedures give memory errors. With my current code, I have to modify the macro every time I run a different regression, because the variables differ and I have to compute the mean of each one. I would like a macro that works for any variables I pass in, without changing the macro itself. My current macro is:
%macro FEregression(dep,indep,clusterVar,FE_var);
* To run with fixed effects use the method of subtracting off the mean for each date because the standard dummy variables approach needs too much memory;
proc sort data=panel; by &FE_var; run;
proc means data=panel print;
  by &FE_var;
  output out=means (drop=_TYPE_ _FREQ_)
    mean(&dep)=m&dep mean(A)=mA mean(B)=mB mean(C)=mC mean(D)=mD;
run;
data means;
  merge panel means;
  by &FE_var;
  &dep=&dep-m&dep;
  A=A-mA; B=B-mB; C=C-mC; D=D-mD;
run;
proc surveyreg data=means;
  class &FE_var;
  cluster &clusterVar; * Cluster by clusterVar;
  model &dep = &indep / solution;
run; quit;
%mend;
%FEregression(Y, A B C D, , date)
So for this example, I am regressing Y on A B C D.
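For reference, the demeaning method described above is the usual within transformation. A sketch, assuming $t$ indexes the levels of &FE_var (date in this example), $i$ indexes observations within a level, and $\alpha_t$, $\beta$, $\varepsilon_{it}$ and $n_t$ are notation introduced here rather than taken from the post:

```latex
\begin{align*}
y_{it} &= \alpha_t + x_{it}'\beta + \varepsilon_{it} \\
y_{it} - \bar{y}_t &= (x_{it} - \bar{x}_t)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_t),
\qquad \bar{y}_t = \frac{1}{n_t}\sum_{i=1}^{n_t} y_{it}
\end{align*}
```

Subtracting the group mean removes $\alpha_t$, which is why every variable in the model, dependent and independent, needs its own group mean.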
So what you are asking is to replace, in this code:
proc means data=panel print;
  by &FE_var;
  output out=means (drop=_TYPE_ _FREQ_)
    mean(&dep)=m&dep mean(A)=mA mean(B)=mB mean(C)=mC mean(D)=mD;
run;
data means;
  merge panel means;
  by &FE_var;
  &dep=&dep-m&dep;
  A=A-mA;
  B=B-mB;
  C=C-mC;
  D=D-mD;
run;
the hardcoded A B C D variables with whatever is in &indep, where that is a list of variables?
Or are A B C D a subset of the variables in &indep? If so, how do we know what the subset would be?
Things might get a lot simpler if you had a VAR statement in your PROC MEANS, like
var &dep &indep;
and used the AUTONAME option on the OUTPUT statement instead of forcing the mA, mB variable names. mean(&dep &indep)= / autoname would append _Mean to the name of each variable.
The data step could then become
data means;
  merge panel means;
  by &FE_var;
  &dep = &dep - &dep._mean;
  %do i = 1 %to %sysfunc(countw(&indep));
    %let tvar = %scan(&indep, &i);
    &tvar = &tvar - &tvar._mean;
  %end;
run;
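Putting the pieces above together, the whole macro might look something like the sketch below. It is untested against your data and makes a few assumptions that are not in the original post: the input data set is still called panel, the work data set names _means and _demeaned are made up for illustration, each variable name is short enough that adding _Mean stays within the 32-character limit, and the CLUSTER statement is skipped when &clusterVar is left blank (as in the example call). The CLASS &FE_var statement from the original is left out because &FE_var does not appear in the MODEL statement; add it back if you need it.

```sas
%macro FEregression(dep, indep, clusterVar, FE_var);
  %local i tvar allvars;
  /* Every variable that has to be demeaned: dependent plus independents */
  %let allvars = &dep &indep;

  proc sort data=panel;
    by &FE_var;
  run;

  /* One mean per &FE_var level; AUTONAME names each mean <variable>_Mean */
  proc means data=panel noprint;
    by &FE_var;
    var &allvars;
    output out=_means (drop=_TYPE_ _FREQ_) mean= / autoname;
  run;

  /* Subtract the group mean from each variable in the list */
  data _demeaned;
    merge panel _means;
    by &FE_var;
    %do i = 1 %to %sysfunc(countw(&allvars, %str( )));
      %let tvar = %scan(&allvars, &i, %str( ));
      &tvar = &tvar - &tvar._Mean;
    %end;
  run;

  proc surveyreg data=_demeaned;
    /* Only emit CLUSTER when a cluster variable was supplied */
    %if %length(&clusterVar) %then %do;
      cluster &clusterVar;
    %end;
    model &dep = &indep / solution;
  run;
%mend FEregression;
```

A different regression then only needs a different call, for example %FEregression(Z, M N O P Q, clusterVar, date), with no edits to the macro body.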
Yes, that is correct. I want to avoid changing those steps every time I run a different regression. A B C D would be all the independent variables in the list &indep. Essentially, I would like to run a new regression, say Z on M N O P Q, simply by calling %FEregression(Z, M N O P Q, clusterVar, date). With my current method, I would have to rewrite the PROC MEANS and DATA steps in my macro each time.