I am trying to run two stepwise regressions sequentially. The first includes a set of independent variables X1, X2, and X3. Once I know which of those are meaningful, I want to estimate a second regression that includes the ones that are meaningful plus a number of other independent variables (X4, X5, X6).
For example, the first stage regression is:
proc reg outest=model_coef; model Y=X1 X2 X3 / selection=stepwise; by firm;
For firm A, only X1 is meaningful
For firm B, X1 and X3 are meaningful
Thus, I'd like the second regression to read:
For firm A: proc reg; model Y=X1 X4 X5 X6 / selection=stepwise include=1;
For firm B: proc reg; model Y=X1 X3 X4 X5 X6 / selection=stepwise include=2;
Note that the regression for B has two changes from the regression for A - the list of independent variables changes, and the "include" number changes from 1 to 2.
I can, of course, see which variables enter the model (from outest=model_coef), but I can't seem to figure out how to move from that information to the second stage regression.
Any suggestions? Thanks!
How do you determine that variables X1 and X3 are meaningful based on the first regression?
The default " /selection=stepwise " requires a variable to be statistically significant at the 0.15 level for entry into the model.
Right, that is how proc reg decides internally whether to include an independent variable in the model. My question was about X1 and X3, which are being considered for the second-step regression. How will you decide that these variables should go into the second-step regression? That gives a rule for filtering the variables from the first-step regression, which is your requirement, right?
Perhaps I am misunderstanding your question. The results from the first step might look like this:
Obs  _MODEL_  _TYPE_  _DEPVAR_  _RMSE_    Intercept  X1       x2  X3        y   _IN_  _P_  _EDF_  _RSQ_
1    MODEL1   PARMS   y         0.094107  0.010854   0.83273  .   -0.44081  -1  2     3    33     0.47322
In this case, SAS tells me that variables X1 and X3 are included in the first step regression and I would now like to include them in the second step regression. I was thinking I might be able to create macro variables based on that output (a list of the meaningful variables and the number of meaningful variables) and then feed that list back into the second step regression. I can create those variables, but I can't figure out how to feed it back in. I'm wide open to any other ideas.
Now this is clearer. The idea is right: create a macro variable that holds the variables X1 and X3, and put it into the second-step regression like this:
proc reg; model Y=&vars X4 X5 X6 / selection=stepwise include=2;
That's the part I'm having an issue with. For example:
data test1; set test0;
proc reg outest=outtest1; model y=x1 x2 x3 / Selection=Stepwise rsquare; by firm_id;
run;
data test2; set outtest1;
if X1 ne . then X1_in_model='X1 ';
if X2 ne . then X2_in_model='X2 ';
if X3 ne . then X3_in_model='X3 ';
vars=cat(X1_in_model,X2_in_model,X3_in_model);
numvars=n(X1,X2,X3);
%let macrovar=vars;
%let macronum=numvars;
keep firm_id &macrovar &macronum;
proc print; run;
/*************
the output here looks this:
Obs firm_id vars numvars
1 70740 X1 X3 2
2 75160 X1 1
3 76695 X1 X2 2
************/
*now how do i get those variables back into the next regression? This does not work (since it sees "vars" as data rather than variable names). Any suggestions? ;
data test4; merge test1 test2; by firm_id;
proc reg outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;
run;
Try this way.
data test4; merge test1 test2; by firm_id;
proc reg data=test4 outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;
run;
Isn't that the same as I had?
Results in same error:
593 proc reg data=test4 outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;
ERROR: Variable vars in list does not match type prescribed for this list.
NOTE: Line generated by the macro variable "MACRONUM".
1 numvars
-------
22
202
ERROR 22-322: Expecting an integer constant.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
594 run;
NOTE: The previous statement has been deleted.
WARNING: The variable _NAME_ or _TYPE_ exists in a data set that is not TYPE=CORR, COV, SSCP, etc.
WARNING: No variables specified for an SSCP matrix. Execution terminating.
NOTE: PROCEDURE REG used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: The data set WORK.OUTEST4 has 0 observations and 4 variables.
The problem is with your macro variables. I've changed the DATA step to use call symputx instead of %let. Run this to populate your macro variables, then proceed to proc reg.
data test2; set outtest1;
if X1 ne . then X1_in_model='X1 ';
if X2 ne . then X2_in_model='X2 ';
if X3 ne . then X3_in_model='X3 ';
vars=cat(X1_in_model,X2_in_model,X3_in_model);
numvars=n(X1,X2,X3);
call symputx('macrovar',vars);
call symputx('macronum',numvars);
keep firm_id vars numvars;
proc print; run;
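One caveat, assuming outtest1 has one row per firm as in the printed output: call symputx runs once per row, so after the step finishes &macrovar and &macronum hold only the last firm's values. A minimal sketch of one way around that is to let a DATA step generate a separate second-stage PROC REG for each firm with call execute. The data set names test1 and test2 come from this thread; the output names outest4_1, outest4_2, ... and the assumption that firm_id is numeric are mine, and this has not been run against the real data.

```sas
/* Sketch: one second-stage regression per firm.
   Assumes test1 holds y, X1-X6 and a numeric firm_id, and test2 holds
   one row per firm with vars (e.g. "X1 X3") and numvars (e.g. 2). */
data _null_;
  set test2;
  /* Build the PROC REG text for this firm; it is queued by CALL EXECUTE
     and runs after this DATA step has finished. */
  call execute(catx(' ',
    cats('proc reg data=test1 outest=outest4_', _n_, ';'),
    'where firm_id =', firm_id, ';',
    'model y =', vars, 'X4 X5 X6 / selection=stepwise',
    cats('include=', numvars, ';'),
    'run; quit;'));
run;
```

If firm_id is a character variable, the generated where clause would need quotes around the value. A firm with no first-stage variables would produce include=0 with only X4 X5 X6 in the model; I have not checked that case, so it is worth testing separately.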
Thanks. Very helpful. I've been reading about call symputx, although I'm still not sure I fully understand what's going on.
what's the difference between
%let macrovar=vars;
and
call symputx('macrovar',vars);
Reading about call symputx it says it creates a macro variable called macrovar from vars. How is that different from the %let?
Thanks again for your help on this!
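call symputx assigns a value produced in a DATA step to a macro variable while the step is running. %LET is a macro statement meant for open code; if you put it inside a DATA step, the macro processor handles it before the step executes, so %let macrovar=vars; stores the literal text vars rather than the value of the variable vars.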
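A small illustration of the difference may help. This is a sketch with made-up names (macrovar2 and the hard-coded value 'X1 X3' are only for demonstration), not code from the original programs:

```sas
data _null_;
  vars = 'X1 X3';
  /* Handled by the macro processor while the step is being compiled:
     macrovar receives the literal text "vars". */
  %let macrovar = vars;
  /* Executed at run time: macrovar2 receives the value stored in vars. */
  call symputx('macrovar2', vars);
run;

%put macrovar=&macrovar;    /* log shows: macrovar=vars */
%put macrovar2=&macrovar2;  /* log shows: macrovar2=X1 X3 */
```

That is why model y=&macrovar ... resolved to model y=vars ... in the failing run: the macro variable contained the name of the DATA step variable, not the list of regressors.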