
changing the list of independent variables in stepwise regression

Contributor
Posts: 23

I am trying to run two stepwise regressions sequentially. The first includes the independent variables X1, X2, and X3. Once I know which of those are meaningful, I want to estimate a second regression that includes the meaningful ones plus several other independent variables (X4, X5, X6).

For example, the first stage regression is:

proc reg outest=model_coef; model Y=X1 X2 X3 / selection=stepwise; by firm;

For firm A, only X1 is meaningful

For firm B, X1 and X3 are meaningful

Thus, I'd like the second regression to read:

For firm A: proc reg; model Y=X1 X4 X5 X6 / selection=stepwise include=1;

For firm B: proc reg; model Y=X1 X3 X4 X5 X6 / selection=stepwise include=2;

Note that the regression for B differs from the regression for A in two ways: the list of independent variables changes, and the INCLUDE= value changes from 1 to 2.

I can, of course, see which variables enter the model (from outest=model_coef), but I can't seem to figure out how to get from that information to the second-stage regression.

Any suggestions? Thanks!


Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

How do you determine, based on the first regression, that variables X1 and X3 are meaningful?

Contributor
Posts: 23

Re: changing the list of independent variables in stepwise regression

With the default settings, /selection=stepwise requires a variable to be statistically significant at the 0.15 level to enter the model.
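For reference, the same threshold can be written out explicitly. This is a minimal sketch, assuming the first-stage data set is called TEST1 (a hypothetical name here) with the BY variable firm from the original post. SLENTRY= and SLSTAY= are PROC REG MODEL-statement options, and 0.15 is the default for both under SELECTION=STEPWISE, so this should reproduce the original first-stage call.

```sas
/* TEST1 is a hypothetical input name; BY-group processing needs the data sorted by firm. */
proc sort data=test1;
  by firm;
run;

proc reg data=test1 outest=model_coef;
  by firm;
  /* SLENTRY= : significance level a variable needs to enter the model
     SLSTAY=  : significance level a variable needs to stay in the model */
  model Y=X1 X2 X3 / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;
```

Lowering SLENTRY= makes the first stage stricter about which of X1-X3 count as meaningful.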

Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

Right, that is PROC REG's internal criterion for including an independent variable in the model. My question was about X1 and X3, which are being considered for the second-step regression. How will you decide that these variables should go into the second-step regression? That gives a baseline for filtering these variables out of the first-step regression, which is your requirement, right?

Contributor
Posts: 23

Re: changing the list of independent variables in stepwise regression

Perhaps I am misunderstanding your question. The results from the first step might look like this:

Obs  _MODEL_  _TYPE_  _DEPVAR_  _RMSE_    Intercept  X1       x2  X3        y   _IN_  _P_  _EDF_  _RSQ_
1    MODEL1   PARMS   y         0.094107  0.010854   0.83273  .   -0.44081  -1  2     3    33     0.47322

In this case, SAS tells me that variables X1 and X3 are included in the first-step regression, and I would now like to include them in the second-step regression. I was thinking I might be able to create macro variables based on that output (a list of the meaningful variables and the number of meaningful variables) and then feed that list back into the second-step regression. I can create those variables, but I can't figure out how to feed them back in. I'm wide open to any other ideas.

Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

Now this is clearer. The idea is right: just create a macro variable holding X1 and X3 and put it back into the second-step regression, like this:

proc reg; model Y=&vars X4 X5 X6 / selection=stepwise include=2;

Contributor
Posts: 23

Re: changing the list of independent variables in stepwise regression

That's the part I'm having an issue with. For example:

data test1; set test0;

proc reg outest=outtest1; model y=x1 x2 x3 / Selection=Stepwise rsquare;  by firm_id;

run;

data test2; set outtest1;

if X1 ne . then X1_in_model='X1 ';

if X2 ne . then X2_in_model='X2 ';

if X3 ne . then X3_in_model='X3 ';

vars=cat(X1_in_model,X2_in_model,X3_in_model);

numvars=n(X1,X2,X3);

%let macrovar=vars;

%let macronum=numvars;

keep firm_id &macrovar &macronum;

proc print; run;

/*************

the output here looks like this:

Obs    firm_id    vars        numvars
1      70740     X1    X3       2
2      75160     X1             1
3      76695     X1 X2          2

************/

*Now how do I get those variables back into the next regression? This does not work (it treats "vars" as data rather than as variable names). Any suggestions? ;

data test4; merge test1 test2; by firm_id;

proc reg outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;

run;

Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

Try this way.

data test4; merge test1 test2; by firm_id;

proc reg data=test4 outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;

run;

Contributor
Posts: 23

Re: changing the list of independent variables in stepwise regression

Isn't that the same as I had?

It results in the same error:

593  proc reg data=test4 outest=outest4; model y=&macrovar X4 X5 X6 / selection=stepwise include=&macronum;

ERROR: Variable vars in list does not match type prescribed for this list.

NOTE: Line generated by the macro variable "MACRONUM".

1    numvars

     -------

     22

     202

ERROR 22-322: Expecting an integer constant.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

594  run;

NOTE: The previous statement has been deleted.

WARNING: The variable _NAME_ or _TYPE_ exists in a data set that is not TYPE=CORR, COV, SSCP, etc.

WARNING: No variables specified for an SSCP matrix. Execution terminating.

NOTE: PROCEDURE REG used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

NOTE: The data set WORK.OUTEST4 has 0 observations and 4 variables.

Solution
07-04-2014 03:15 PM
Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

The problem is with your macro variables. I've made some changes in the DATA step: the two CALL SYMPUTX statements replace your %LET statements. Run this to populate your macro variables, then proceed to PROC REG.

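data test2; set outtest1;

if X1 ne . then X1_in_model='X1 ';

if X2 ne . then X2_in_model='X2 ';

if X3 ne . then X3_in_model='X3 ';

vars=cat(X1_in_model,X2_in_model,X3_in_model);

numvars=n(X1,X2,X3);

/* CALL SYMPUTX copies the data values into macro variables while the step runs.
   It executes once per observation, so after the step the macro variables
   hold the values from the last observation (the last firm). */
call symputx('macrovar',vars);

call symputx('macronum',numvars);

/* Name the data set variables directly: &macrovar and &macronum are resolved
   when this step is compiled, before CALL SYMPUTX has assigned them. */
keep firm_id vars numvars;

run;

proc print; run;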
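Because CALL SYMPUTX runs once per row, the macro variables end up holding only the last firm's list, so a single PROC REG call cannot use a different variable list for each firm. One way to get a separate second stage per firm is to let a DATA _NULL_ step generate one PROC REG step per row of TEST2 with CALL EXECUTE. This is a minimal sketch, assuming TEST1 holds the raw data (firm_id, y, X1-X6), TEST2 is built by the DATA step above, and firm_id is numeric; the output names OUTEST_<firm_id> are hypothetical.

```sas
/* Assumptions: TEST1 = raw data (firm_id, y, X1-X6); TEST2 = firm_id, vars, numvars;
   firm_id is numeric (a character firm_id would need quotes in the WHERE clause).
   OUTEST_<firm_id> is a hypothetical naming scheme for the per-firm estimates. */
data _null_;
  set test2;
  length code $ 500;
  /* Build one PROC REG step per firm. CATX joins the pieces with single blanks
     and skips VARS when it is blank; CATS turns the numbers into text. */
  code = catx(' ',
              cats('proc reg data=test1 outest=outest_', firm_id, ';'),
              cats('where firm_id=', firm_id, ';'),
              'model y=', vars, 'X4 X5 X6 / selection=stepwise',
              cats('include=', numvars, ';'),
              'run; quit;');
  /* CALL EXECUTE queues the generated step; it runs after this DATA step ends. */
  call execute(code);
run;
```

Since the first-stage variables are listed first in each generated MODEL statement, INCLUDE=numvars forces exactly those variables into that firm's second-stage model.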

Contributor
Posts: 23

Re: changing the list of independent variables in stepwise regression

Thanks. Very helpful. I'm reading about CALL SYMPUTX, although I'm still not sure I fully understand what's going on.

What's the difference between

%let macrovar=vars;

and

call symputx('macrovar',vars);

From what I've read, CALL SYMPUTX creates a macro variable called macrovar from vars. How is that different from %LET?

Thanks again for your help on this!

Trusted Advisor
Posts: 1,228

Re: changing the list of independent variables in stepwise regression

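CALL SYMPUTX assigns a value produced by a DATA step to a macro variable at the moment the step executes. %LET is a macro statement, normally used in open code rather than inside a DATA step or proc. Even when it appears inside a DATA step, it is processed before the step compiles, so %let macrovar=vars; stores the literal text "vars" rather than the values of the variable VARS.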
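A small sketch of the difference, using the hypothetical macro variable names LITERAL and FROMDATA: the %LET inside the DATA step is executed by the macro processor while the step is still being read, so it never sees the data, whereas CALL SYMPUTX copies the current value of VARS when the step runs.

```sas
data _null_;
  vars = 'X1 X3';
  /* Executed by the macro processor before the step compiles:
     stores the literal text vars */
  %let literal = vars;
  /* Executed when the step runs: stores the value of VARS, X1 X3 */
  call symputx('fromdata', vars);
run;

/* Writes: literal=vars fromdata=X1 X3 */
%put literal=&literal fromdata=&fromdata;
```

For the same reason, &fromdata cannot be used inside the step that creates it: macro references are resolved before CALL SYMPUTX runs, so the new value is only available to later steps.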
