stratozyck
Calcite | Level 5

Hi, we are trying to do variable selection and we keep getting "Note: The following variables are not used in the SCORE selection since they are a linear combination of other variables as shown."

Is there a way to make all variables used? We need to have all variables used in the selection option of proc logistic even if there are linear combinations.

Reeza
Super User

Please post your full code and log.

PaigeMiller
Diamond | Level 26

@stratozyck wrote:

Hi, we are trying to do variable selection and we keep getting "Note: The following variables are not used in the SCORE selection since they are a linear combination of other variables as shown."

Is there a way to make all variables used? We need to have all variables used in the selection option of proc logistic even if there are linear combinations.


No, that's the problem when some variables are linear combinations of other variables. They cannot all be used by PROC LOGISTIC. And it doesn't make sense to use all of them anyway.

--
Paige Miller
stratozyck
Calcite | Level 5

The code is:

ods output BestSubsets = gridwork.qry_subsets&seg;
proc logistic data=&indata;
   model outcome(event='1') = &list
         / selection=score best=&modnum start=&minnum stop=&varnum;
run;

What we want it to do is consider all variables in sets of 3-4 or so. Let's say we have x1-x100. We want it to consider all of them in sets of 3-4.

I get what you are saying about the variables not being used in the same model, but if x2 is a linear combination of x5 and x6, we still want x2 considered even if x5 and x6 are not in the model.

Reeza
Super User

You run into problems manipulating the matrices when there is an exact linear combination; it's not a matter of 'allowing' something to occur.

stratozyck
Calcite | Level 5
No, there would be no problem. Let's say X1 is 2*X2.

Obviously including both X1 and X2 in one model is an issue, but the selection procedure keeps only one of them and doesn't consider the other in any way.

It's more complex than that. It drops a variable from consideration because it's a linear combination of around 10 other variables, even though we told it to consider only 3 per candidate model.

Again, we are NOT trying to put variables that are linear combinations into one model; we want it to consider the variables it flags as linear combinations in separate candidate models.
Reeza
Super User

@stratozyck wrote:
No, there would be no problem. Let's say X1 is 2*X2.

Obviously including both X1 and X2 in one model is an issue, but the selection procedure keeps only one of them and doesn't consider the other in any way.

It's more complex than that. It drops a variable from consideration because it's a linear combination of around 10 other variables, even though we told it to consider only 3 per candidate model.

Again, we are NOT trying to put variables that are linear combinations into one model; we want it to consider the variables it flags as linear combinations in separate candidate models.

It'll run fine within an individual process, but I'm not seeing how you're controlling for 3 per candidate model. I'm assuming that's some sort of macro, but the code shown is just the logistic regression. You could add a KEEP statement to each iteration to ensure only the variables you want are considered.
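
A minimal sketch of the KEEP idea above, for a single iteration. The macro variable subset and the output table work.subsets_run1 are hypothetical names, not from the original code; the assumption is that each subset is chosen so that none of its variables is an exact linear combination of the others in that same subset. In practice it is the MODEL list that limits the candidates; the KEEP= data set option only guards against stray variables.

```sas
/* Hypothetical subset of candidates for this iteration */
%let subset = x2 x7 x9 x14;

/* Capture the best-subsets table for this run */
ods output BestSubsets = work.subsets_run1;

proc logistic data=&indata(keep=outcome &subset);
   /* Score selection now sees only the variables in &subset */
   model outcome(event='1') = &subset
         / selection=score best=&modnum start=&minnum stop=&varnum;
run;
```

Note that stop=&varnum cannot usefully exceed the number of variables in the subset, so the macro values may need adjusting per iteration.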

PaigeMiller
Diamond | Level 26

@stratozyck wrote:
No, there would be no problem. Let's say X1 is 2*X2.

Obviously including both X1 and X2 in one model is an issue, but the selection procedure keeps only one of them and doesn't consider the other in any way.

It's more complex than that. It drops a variable from consideration because it's a linear combination of around 10 other variables, even though we told it to consider only 3 per candidate model.

Again, we are NOT trying to put variables that are linear combinations into one model; we want it to consider the variables it flags as linear combinations in separate candidate models.

The restriction isn't because of SAS, it comes from the fact that the math doesn't allow it. Any matrix that has linear combinations of the X-variables cannot be inverted and so cannot be used in logistic (or ordinary least squares) regression.
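
To see the singularity on a toy case, the sketch below builds a small simulated data set in which x1 is exactly 2*x2 and then fits all three predictors in one model. The data set name work.demo, the seed, and the coefficients are invented for illustration. The expectation is that PROC LOGISTIC reports the dependency in the log and sets one of the two parameters to zero rather than estimating both.

```sas
/* Simulated data: x1 is an exact linear combination of x2 */
data work.demo;
   call streaminit(1);
   do i = 1 to 200;
      x2 = rand('normal');
      x3 = rand('normal');
      x1 = 2*x2;                                  /* exact dependency */
      p  = 1 / (1 + exp(-(0.5*x2 - 0.3*x3)));     /* true event probability */
      outcome = rand('bernoulli', p);
      output;
   end;
   drop i p;
run;

/* All three predictors in one model: the design matrix is rank deficient */
proc logistic data=work.demo;
   model outcome(event='1') = x1 x2 x3;
run;
```

Dropping either x1 or x2 from the MODEL statement removes the dependency, and the fit goes through with both remaining parameters estimated.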

 

Also:

What we want it to do is consider all variables in sets of 3-4 or so. Let's say we have x1-x100. We want it to consider all of them in sets of 3-4.

I have no idea whether this is a good idea or not, but in my mind, I wouldn't do this when you have 100 x-variables. The collinearity among these 100 x-variables is going to cause problems with whatever selection method is used.

Perhaps, given that you have 100 correlated input variables, some of them linear combinations of others, you need a change of mindset.

You need a method that handles correlated input variables better than either logistic regression or ordinary least squares regression, and that still works when the input matrix contains linear combinations of your x-variables. That method is Partial Least Squares (PLS) regression. It requires a change of mindset: you now KEEP all of your x-variables in the model, and many of them will have weightings (loadings is the technical term) that are very close to zero. PLS is much better than ordinary least squares regression or logistic regression at modelling with lots of input variables.
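
A rough sketch of what that could look like with PROC PLS, reusing the macro variables indata and list from the code posted earlier in the thread. One assumption to flag: PROC PLS treats the response as a numeric, continuous variable, so feeding it a 0/1 outcome is a simplification made here for illustration, not something the thread spells out.

```sas
/* All candidate x-variables stay in the model; PLS extracts a few factors */
proc pls data=&indata method=pls cv=split;
   /* outcome must be numeric (0/1) for this to run */
   model outcome = &list;
run;
```

The cv=split option asks for split-sample cross validation to help choose the number of extracted factors; the nfac= option can cap that number instead.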

--
Paige Miller
Ksharp
Super User

Check the INCLUDE= option of the MODEL statement.
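
For reference, INCLUDE=n forces the first n effects listed in the MODEL statement into every candidate model. A sketch, assuming x2 is the variable that must always be considered and that a hypothetical macro variable others holds the rest of the candidate list with x2 removed. Whether this stops the linear-combination note from appearing is not confirmed anywhere in this thread.

```sas
proc logistic data=&indata;
   /* x2 is listed first, so include=1 keeps it in every candidate model */
   model outcome(event='1') = x2 &others
         / selection=score include=1 best=&modnum start=&minnum stop=&varnum;
run;
```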

PaigeMiller
Diamond | Level 26

Just because you can use the INCLUDE option doesn't mean you should use the INCLUDE option.

--
Paige Miller
