epialy
Calcite | Level 5

Broad context: I'm doing model selection, and I can't use the easy automatic selection options because I have a bunch of dummy-coded variables and want to use likelihood ratio tests. However, because I will be doing a lot of model selection, I want to automate the process as much as possible.

So far, I've created a macro that runs the model and lets me toggle each variable of interest on or off. My idea was to run a model, toggle off one variable, store the output of both models, then stack the p-values and sort them to decide which variable to drop next, repeating until all remaining variables are significant. I'm also open to other strategies; there are probably better ways to accomplish this.
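A minimal sketch of the "stack and sort" step, assuming the LR results for each candidate have already been collected into one dataset with the hypothetical columns DROPPED (variable name), CHISQ (LR statistic) and DF (number of dummy columns removed). The macro name %NextToDrop and the macro variables NEXT_DROP/NEXT_P are made up for illustration.

```sas
/* %NextToDrop (hypothetical name): given one row per candidate variable with
   DROPPED, CHISQ and DF, compute p-values, sort largest first, and return
   the least significant candidate in the global macro variable NEXT_DROP. */
%macro NextToDrop(lrtests=lrtests, alpha=0.05);
  %global next_drop next_p;
  %let next_drop=;

  data _lr;
    set &lrtests;
    p = sdf('CHISQ', chisq, df);   /* upper-tail chi-square p-value */
  run;

  /* Least significant candidate ends up in the first row */
  proc sort data=_lr;
    by descending p;
  run;

  data _null_;
    set _lr(obs=1);
    call symputx('next_p', p, 'G');
    /* Only nominate a variable if it is NOT significant at ALPHA */
    if p > &alpha then call symputx('next_drop', dropped, 'G');
  run;
%mend NextToDrop;
```

After `%NextToDrop(lrtests=mylr)`, an empty NEXT_DROP means every remaining variable is significant at ALPHA, which is the stopping rule described above.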

My current macro call looks something like this:

%LRModel(var1,var2,var3,var4,var5,dataset,strata,outcome);

If a toggle (var1 to var5) is 1, the model includes that variable, so the full model looks like:

%LRModel(1,1,1,1,1,dataset,strata,outcome);

The nested models, each with one variable dropped, would then look like:

%LRModel( ,1,1,1,1,dataset,strata,outcome); 
%LRModel(1, ,1,1,1,dataset,strata,outcome);
%LRModel(1,1, ,1,1,dataset,strata,outcome);
%LRModel(1,1,1, ,1,dataset,strata,outcome);
%LRModel(1,1,1,1, ,dataset,strata,outcome);

The blank can also be any value other than 1.

I'm stuck on how to iteratively run the full model and then the nested models automatically, without having to write out each macro call. I feel like there has to be a way to do this. Any help would be much appreciated.
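One way to avoid writing out each call is a wrapper macro that loops over the toggle positions and builds the argument list for you. This sketch assumes your %LRModel takes exactly the eight positional parameters shown above; %RunNested and the local variables V1 to V5 are hypothetical names. Each call still needs to save its output under a distinct name (or you copy it after each call), otherwise the next call overwrites it.

```sas
/* %RunNested (hypothetical): runs the full model, then each model with one
   toggle switched off, by calling the existing positional %LRModel.       */
%macro RunNested(dataset, strata, outcome);
  %local i j v1 v2 v3 v4 v5;

  /* Full model: all five toggles on */
  %do j=1 %to 5;
    %let v&j=1;
  %end;
  %LRModel(&v1,&v2,&v3,&v4,&v5,&dataset,&strata,&outcome);
  /* ...save the output of the full model here... */

  /* Nested models: toggle i is blank, the rest stay at 1 */
  %do i=1 %to 5;
    %do j=1 %to 5;
      %if &j=&i %then %let v&j=;
      %else %let v&j=1;
    %end;
    %LRModel(&v1,&v2,&v3,&v4,&v5,&dataset,&strata,&outcome);
    /* ...save the output of the model without variable i here... */
  %end;
%mend RunNested;
```

A single `%RunNested(dataset, strata, outcome);` then replaces the six hand-written calls.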

Quentin
Super User

First, I would double-check that you can't use PROC GLMSELECT or one of the similar procedures to do the model selection without having to write a macro.

That said, if you go the macro route, I would write a macro to run one regression that you could call like:

%LRModel(var=x1 x2 x3, outcome=y, data=mydata, strata=Z)

The main difference from your design is that I would pass a list of independent variables rather than a switch for each possible variable.

That macro could run a regression and output a dataset with the likelihood ratio tests.
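A rough sketch of such a single-model macro for PHREG. Assumptions, all worth checking: the counting-process time variables STARTAGE/ENDAGE from the PHREG code later in this thread; the ODS table name FitStatistics with columns Criterion and WithCovariates (confirm with ODS TRACE ON); and the OUT= parameter and global macro variable M2LL, which are hypothetical names used to carry the -2 log L forward for LR comparisons.

```sas
/* Sketch of a one-model macro with a variable-list interface.
   ASSUMPTIONS: times STARTAGE/ENDAGE; ODS table FitStatistics with columns
   Criterion and WithCovariates; OUT= and M2LL are hypothetical names.
   As written it needs at least one covariate in VAR.                      */
%macro LRModel(var=, outcome=, data=, strata=, out=fit);
  %global m2ll;
  %let m2ll=;

  ods output FitStatistics=&out;
  proc phreg data=&data;
    model (startage,endage)*&outcome(0) = &var / ties=exact;
    %if %length(&strata) %then %do;
      strata &strata;
    %end;
  run;

  /* Keep -2 log L of the fitted model for later LR comparisons */
  data _null_;
    set &out;
    if upcase(strip(criterion)) = '-2 LOG L' then
      call symputx('m2ll', withcovariates, 'G');
  run;
%mend LRModel;
```

Returning -2 log L rather than p-values keeps the macro generic: any two nested calls can then be compared by differencing their M2LL values.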

Once that is working, you can write an outer %DRIVER macro. That macro would also have a VAR parameter where you would pass a list of independent variables. It would contain a %do %until loop that calls %LRMODEL and then, after each call, generates the next list of independent variables.
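A sketch of such a %DRIVER doing backward elimination. It assumes a one-model %LRModel(var=, outcome=, data=, strata=) that stores the fitted model's -2 log L in a global macro variable M2LL; that convention, along with ALPHA= and the output macro variable FINAL_VARS, is hypothetical. Each word in VAR is treated as a 1-df term.

```sas
/* Backward elimination driver (sketch).
   ASSUMPTION: %LRModel sets global macro variable M2LL (-2 log L).
   When one variable is left, the reduced model has no covariates,
   so %LRModel must be able to fit that null model.                 */
%macro Driver(var=, outcome=, data=, strata=, alpha=0.05);
  %global final_vars;
  %local stop full n i j cand reduced best best_p best_list p;
  %let stop=0;
  %do %until(&stop);
    %if %length(&var)=0 %then %let stop=1;
    %else %do;
      /* Fit the current model and remember its -2 log L */
      %LRModel(var=&var, outcome=&outcome, data=&data, strata=&strata)
      %let full=&m2ll;
      %let n=%sysfunc(countw(&var, %str( )));
      %let best_p=-1;
      %do i=1 %to &n;
        %let cand=%scan(&var, &i, %str( ));
        /* Current list minus the i-th word */
        %let reduced=;
        %do j=1 %to &n;
          %if &j ne &i %then %let reduced=&reduced %scan(&var, &j, %str( ));
        %end;
        %LRModel(var=&reduced, outcome=&outcome, data=&data, strata=&strata)
        /* LR statistic = increase in -2 log L; upper-tail chi-square, 1 df */
        %let p=%sysfunc(sdf(CHISQ, %sysevalf(&m2ll - &full), 1));
        %if %sysevalf(&p > &best_p) %then %do;
          %let best=&cand;
          %let best_p=&p;
          %let best_list=&reduced;
        %end;
      %end;
      /* Stop when every remaining variable is significant */
      %if %sysevalf(&best_p <= &alpha) %then %let stop=1;
      %else %do;
        %put NOTE: Dropping &best (LR p=&best_p);
        %let var=&best_list;
      %end;
    %end;
  %end;
  %let final_vars=&var;
  %put NOTE: Final model: &var;
%mend Driver;
```

Keeping the reduced list that produced the largest p-value avoids rebuilding the list after the inner loop. For a term made of several dummy columns, the 1 passed to SDF would become the number of columns in that term.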

That said, I'm not clear on which models you want to generate. If you had just 3 predictors, are you saying you would want to generate just these 3 models:

%LRModel(var=     x2 x3, outcome=y, data=mydata, strata=Z)
%LRModel(var=x1      x3, outcome=y, data=mydata, strata=Z)
%LRModel(var=x1 x2     , outcome=y, data=mydata, strata=Z)

Or are you doing some sort of stepwise regression where after running these, if you decide the model with X1 and X3 is best, you then want to run a model with just X1 and a model with just X3?

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
epialy
Calcite | Level 5

Sorry, I totally forgot to specify: I'm doing backward selection, starting with the full model and dropping one variable at a time. To decide which variable to drop, I need to calculate a likelihood ratio (LR) test comparing the full model with each model that has one variable dropped.
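For reference, this is the standard LR test for nested models. Both models must be fit to the same observations, and when one "variable" is represented by several dummy columns, the degrees of freedom d equal the number of columns dropped together (d = 3 for td1 td2 td3):

```latex
G^2 = \bigl(-2\log L_{\text{reduced}}\bigr) - \bigl(-2\log L_{\text{full}}\bigr),
\qquad
G^2 \overset{H_0}{\sim} \chi^2_{d}\ \text{(asymptotically)},
\qquad
p = \Pr\bigl(\chi^2_{d} \ge G^2\bigr)
```

In SAS the p-value can be computed as SDF('CHISQ', G2, d) or 1 - PROBCHI(G2, d).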

I'm using PROC PHREG with time-dependent variables, so the dummy coding happens within the model. I was using the toggles to control both the list of independent variables and the programming statements that write the dummy coding, e.g.:

proc phreg data=&dataset multipass outest=m3;
  class &cl / param=ref order=formatted ref=first;
  model (startage,endage)*&outcome(0) =
    %if &var1.=1 %then %do; td1 td2 td3 %end;
    / rl type3 ties=exact;

  /* programming statements for the var1 dummies */
  %if &var1.=1 %then %do;
    if 0 <= followup < 365.25 then do;
      td1=0; td2=0; td3=0;
    end;
  %end;
run;

This is heavily simplified and shows just one variable out of many, but hopefully it gives the idea. I don't think I could just pass a list of variables, because td1, td2 and td3 have to be treated as a single variable. This is also the reason I can't use the automatic selection methods.
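If a list-style interface is still wanted, one possibility is to pass group names rather than column names, and let each name switch on both its dummy columns and its programming statements. A sketch under that assumption: %has, %PHModel and the group name AGEGRP are hypothetical, and the programming statements are as simplified as in the snippet above.

```sas
/* Hypothetical helper: resolves to 1 if WORD is one of the blank-separated
   words in LIST, else 0. Padding with blanks avoids matching x1 in x10.   */
%macro has(list, word);
  %eval(%index(%str( &list ), %str( &word )) > 0)
%mend has;

/* Sketch: GROUPS holds group names; each name switches on its dummy
   columns AND the programming statements that define them.           */
%macro PHModel(groups=, dataset=, outcome=, cl=);
  proc phreg data=&dataset multipass;
    class &cl / param=ref order=formatted ref=first;
    model (startage,endage)*&outcome(0) =
      %if %has(&groups, AGEGRP) %then %do; td1 td2 td3 %end;
      / rl type3 ties=exact;

    %if %has(&groups, AGEGRP) %then %do;
      /* other follow-up periods omitted, as in the simplified original */
      if 0 <= followup < 365.25 then do;
        td1=0; td2=0; td3=0;
      end;
    %end;
  run;
%mend PHModel;
```

Dropping AGEGRP from GROUPS then removes td1, td2 and td3 together, so a list-driven loop still drops them as one term (with d = 3 in the LR test).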

Quentin
Super User

Well, now I'm more confused. I've used PHREG a bit, but didn't realize it allowed programming statements. : )

If you like your macro parameterization, that's enough for me to be happy with it. I stand by my general suggestion: get the PHREG macro working the way you want for a single model, calculating whatever test statistic you need for that model, and then write a %Driver macro that generates calls to your PHREG macro until your stopping condition is met.
