marcel
Obsidian | Level 7

Hello all,

 

I understand that I can check collinearity for logistic regression by using PROC REG. I hit a snag when trying to do it. My response for the logistic regression is coded as # events / # trials. It turns out that PROC REG does not accept this format for the response, so I decided to use the proportion as the dependent variable to check for collinearity in PROC REG.

 

The problem is that when I run the logistic regression with the dependent variable coded as # events / # trials, I get totally different results (parameter estimates, SEs, p-values) than when I run it with the dependent variable coded as the proportion. I want to keep # events / # trials as the format for my response in the logistic regression. Can I use the collinearity diagnostics obtained from PROC REG with the proportion as the dependent variable, given that # events / # trials and the proportion produce different results?

 

Is there any way to circumvent this problem?

 

Regards,

 

Marcel

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I suspect you get different results because of different sample sizes in each cell. You can (and should) use the events / trials form of the PROC LOGISTIC MODEL statement. In PROC REG, make the dependent variable a binary 0 or 1, and then replicate the row as many times as #trials. The Y value is not used in checking for collinearity.
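A minimal sketch of that row replication, assuming a hypothetical data set `cells` with one row per cell and hypothetical variables `events`, `trials`, and predictors `x1`-`x3` (none of these names come from the original question):

```sas
/* Assumed input: data set CELLS with variables events, trials, x1-x3 */

/* Expand each cell into one row per trial with a binary 0/1 response */
data expanded;
   set cells;
   do i = 1 to trials;
      y = (i <= events);   /* first 'events' copies get 1, the rest get 0 */
      output;
   end;
   drop i;
run;

/* The logistic model itself stays on the grouped events/trials data */
proc logistic data=cells;
   model events/trials = x1 x2 x3;
run;

/* Collinearity diagnostics on the expanded rows; VIF and TOL are   */
/* computed from the predictors only, so the values of y do not     */
/* affect them                                                      */
proc reg data=expanded;
   model y = x1 x2 x3 / vif tol collin;
run;
quit;
```

Replicating each row `trials` times means cells with more trials count proportionally more in the diagnostics, which a single row per cell with the proportion as the response would not do.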

 

Another approach entirely is to perform Partial Least Squares Regression with binary 0/1 response, and then the problem of collinearity is handled by PLS, no need to eliminate variables.
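As a rough sketch of the PLS route, assuming a hypothetical data set `expanded` with one row per trial, a binary response `y`, and predictors `x1`-`x3` (all names are illustrative), PROC PLS could be called along these lines. Note that PROC PLS treats the 0/1 response as a continuous variable, and the choice of cross validation here is only one reasonable option:

```sas
/* Assumed input: data set EXPANDED with binary y and predictors x1-x3 */
proc pls data=expanded method=pls cv=split;
   model y = x1 x2 x3;   /* all predictors stay in the model */
run;
```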

--
Paige Miller


6 REPLIES
stat_sas
Ammonite | Level 13

Hi,

 

You can use either the proportion or a binary variable as the response in PROC REG to check collinearity with VIF, because the collinearity check does not involve the dependent variable. Once you have identified the highly correlated variables, use the logistic regression model for further analysis.

marcel
Obsidian | Level 7

PaigeMiller,

 

This is a really clear and very informative answer. Thank you very much for your help.

 

Best regards,

 

Marcel

StatDave
SAS Super FREQ

See the collinearity section of this note. Note that the events/trials syntax can be used in PROC GENMOD just like in PROC LOGISTIC.

marcel
Obsidian | Level 7

Sir StatDave_sas,

 

The problem I had is that for collinearity diagnostics I have to use PROC REG, as recommended in other SAS notes, and it does not accept the # events / # trials format for the response. I am now coding the response as 0/1, as suggested by PaigeMiller. PROC REG does accept this coding for the response.

 

Thank you for your observation.

 

Regards,

 

Marcel

StatDave
SAS Super FREQ

You can use PROC REG, and that is in fact what is done in the note I referred to. But as described there, proper evaluation of collinearity in a logistic model requires a weighted analysis in PROC REG. I encourage you to read through the collinearity section carefully and fully. PROC GENMOD is used to produce the necessary weights for a variety of model types. In the case of a logistic model, the necessary weights are just p*(1-p), so they can be produced by saving the predicted probabilities from the fitted model and then computing the weights in a DATA step.
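A sketch of that weighted check using PROC LOGISTIC rather than PROC GENMOD, assuming a hypothetical grouped data set `cells` with variables `events`, `trials`, and predictors `x1`-`x3` (names are illustrative, not from the note). Because each row here represents `trials` observations, the sketch multiplies p*(1-p) by `trials`; that factor is an assumption of this example, equivalent to replicating the row `trials` times with weight p*(1-p) on each copy:

```sas
/* Assumed input: data set CELLS with variables events, trials, x1-x3 */

/* Fit the logistic model and save the predicted event probabilities */
proc logistic data=cells;
   model events/trials = x1 x2 x3;
   output out=pred p=phat;
run;

/* Compute the weights from the saved probabilities */
data pred_w;
   set pred;
   w    = trials * phat * (1 - phat);   /* p*(1-p), scaled by cell size */
   prop = events / trials;              /* placeholder response for PROC REG */
run;

/* Weighted collinearity diagnostics */
proc reg data=pred_w;
   weight w;
   model prop = x1 x2 x3 / vif tol collin;
run;
quit;
```

With ungrouped 0/1 data, one row per observation, the weight would simply be `phat * (1 - phat)`.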

