Re: Regression add var and other var become not significant

Ronein

Hello,
I run a logistic regression (binary outcome) with categorical predictors (a credit risk model).
There is a predictor variable x1 that is significant, but when I add another variable x2, x1 becomes not significant and very weak. I ran a Kramer matrix and don't see a high correlation between x1 and x2 (correlation 0.2).
What actions are needed (please show code) to investigate why this happened?
I read that Stata's margins command can be used to explore how a key explanatory variable's association with the outcome differs at various levels of other predictors. Can you please show the equivalent code?

ballardw

Absolute minimum for discussion would be both sets of code submitted. Better would be to include the output you are discussing so we can see values and summaries to see if they provide any helpful hints.

Why? Different model, different fit. The math does that.

What it means? Depends a lot on the actual content of the data.

Suppose I have a model (different sort of regression) that has a dependent variable of Maximum Air Temperature and use Month of the year as the predictor. I might get some indications of a "good" fit because winter has lower temps than summer on some given range of dates. Then I add another predictor, such as the previous day's temperature to the model. It is very likely that the "Month" is less useful as a predictor because of the nature of how air temperature works.
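
The same pattern can be reproduced with made-up data. The following is only a sketch under stated assumptions: the data set SIM, the variables x1, x2 and y, and all the coefficients are invented for illustration and have nothing to do with the poster's data. Here y depends only on x2, while x1 is weakly related to x2 (a correlation of roughly 0.2). With this many rows, x1 should usually look significant on its own and lose that significance once x2 is in the model.

```sas
/* Simulated data: all names and numbers here are hypothetical. */
data sim;
   call streaminit(2025);
   do i = 1 to 20000;
      x2 = rand('bernoulli', 0.5);
      /* x1 is only weakly related to x2 */
      x1 = rand('bernoulli', 0.4 + 0.2*x2);
      /* the outcome depends on x2 only, not on x1 */
      p = 1 / (1 + exp(-(-1 + 1.5*x2)));
      y = rand('bernoulli', p);
      output;
   end;
   drop i p;
run;

/* Model 1: x1 alone picks up part of the x2 effect */
proc logistic data=sim;
   model y(event='1') = x1;
run;

/* Model 2: with x2 included, x1 has little left to explain */
proc logistic data=sim;
   model y(event='1') = x1 x2;
run;
```

Comparing the x1 estimate and its standard error between the two outputs shows whether the estimate shrank, the standard error grew, or both.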

Statistically significant does not always imply practicality of use or reliability.

Ronein

Thanks.
I would like to see code that
explores how the association of explanatory variable x2 with the outcome differs at various levels of x1.

Ksharp

"explore how explanatory variable x2 association with the outcome differs at various levels of x1"
You could check the ESTIMATE, LSMEANS or LSMESTIMATE statement to compare the differences between X levels.

https://support.sas.com/kb/24/447.html
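
A minimal sketch of that idea, assuming SASHELP.HEART as a stand-in data set with SEX playing the role of x1 and BP_STATUS the role of x2 (these choices are only for illustration). The interaction term lets the BP_STATUS effect differ by SEX. The ODDSRATIO statement with AT(...) and the SLICE statement then report that effect within each SEX level. SLICE needs PARAM=GLM on the CLASS statement.

```sas
proc logistic data=sashelp.heart;
   class sex bp_status / param=glm;
   /* the interaction allows the bp_status effect to vary by sex */
   model status(event='Dead') = sex bp_status sex*bp_status;
   /* odds ratios for bp_status computed at each level of sex */
   oddsratio bp_status / at(sex=all);
   /* compare bp_status levels separately within each sex level */
   slice sex*bp_status / sliceby=sex diff oddsratio cl;
run;
```

If the odds ratios for BP_STATUS look similar at every level of SEX, the interaction is probably adding little.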

Ksharp

"but when I add another variable x2 then x1 become not significant and very weak. "
That is a very normal phenomenon in a statistical model.
There are four types of sums of squares (SS) for splitting the variance among variables: Type I, Type II, Type III and Type IV SS.
Once a new variable X2 enters the model, the SS is split again. X2 is assigned more SS and X1 is assigned less, so you see that X2 is significant and X1 is not.
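
For a binary outcome, the closest analogue to Type I SS is a sequential (Type 1) likelihood ratio analysis. A rough sketch, assuming SASHELP.HEART as a stand-in data set and a hypothetical 0/1 variable y built from STATUS: PROC GENMOD's TYPE1 option tests each effect in the order it is listed, so swapping the order in the MODEL statement can change those tests, while the TYPE3 tests do not depend on order.

```sas
/* y is a hypothetical 0/1 copy of status, created for this sketch */
data heart;
   set sashelp.heart;
   y = (status = 'Dead');
run;

/* sex entered first: its Type 1 test ignores bp_status */
proc genmod data=heart;
   class sex bp_status;
   model y(event='1') = sex bp_status / dist=binomial type1 type3;
run;

/* bp_status entered first: sex is now tested after bp_status */
proc genmod data=heart;
   class sex bp_status;
   model y(event='1') = bp_status sex / dist=binomial type1 type3;
run;
```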

" don't see high correlation between x1 and x2( correlation 0.2)."
You could also use the following option (CORRB) to check for multicollinearity.
proc logistic .....;
class x1 x2;
model y=x1 x2/CORRB;
run;

And I think @Rick_SAS @StatDave_SAS have more words to say.

Ronein

Thanks.
Is this CORRB option not the same as the Kramer matrix?
In the Kramer matrix the correlation between x1 and x2 is low (0.2), so from that I couldn't identify multicollinearity.

Here is an example of the CORRB option that you showed.
proc logistic data=sashelp.heart;
class sex bp_status;
model status=sex bp_status height weight ageatstart/corrb ;
run;

Ksharp

Sorry, I am not familiar with the Kramer matrix.

But the CORRB option shows the correlations among the estimates for all the independent variables together, so I suggest using it. From the code you posted, you get this:

The Height and Intercept estimates are highly correlated (collinear).

There are many methods for performing variable selection.

The following are my two favorites.

1)LASSO method:

proc hpgenselect data=sashelp.heart;
class sex bp_status;
model status=sex bp_status height weight ageatstart/dist=binary ;
selection method=Lasso(choose=SBC) details=all;
performance details;
run;

2) PROC PLS (which can handle missing values, whereas PROC HPGENSELECT drops observations that have them)

P.S. This is especially suited to credit risk / scorecard data, since big wide tables have lots of missing values.



data heart;
 set sashelp.heart;
 y=ifn(status='Dead',1,0);
run;

ods output  VariableImportancePlot= VariableImportancePlot;
proc pls data=heart  missing=em   nfac=2 plot=(ParmProfiles VIP) details; 
class sex bp_status;
 model y=sex bp_status height weight ageatstart;
run;
proc sort data=VariableImportancePlot;
 by descending VIP;
run;
proc print;run;

The larger the VIP, the more important the variable.

ballardw

For what very little it may be worth, in my previous job we had some models for which our process, before reporting results to the clients, checked the result with the same model variables in a different order on the equivalent of the MODEL statement. Exact same variables, different "significance" results. That basically told us the choice of variables wasn't robust if variables went from significant to not significant...
