Solved: Re: using PROC REG to check collinearity for logistic regression with ...

marcel · Posted 09-21-2017 12:27 AM

Hello all,

I understand that I can check collinearity for logistic regression by using PROC REG. I hit a snag when trying to do it. My response for the logistic regression is coded as # event / # trials. It turns out that PROC REG does not accept this format for the response. So I decided to use the proportion as the dependent variable to check for collinearity in PROC REG.

The problem is that when I run the logistic regression with the dependent variable coded as # event / # trials, I get totally different results (parameter estimates, SEs, p-values) than when I run it with the dependent variable coded as the proportion. I want to keep # event / # trials as the format for my response in the logistic model. Can I use the collinearity diagnostics obtained from PROC REG with the proportion as the dependent variable, given that # event / # trials and the proportion produce different results?

Is there any way to circumvent this problem?

Regards,

Marcel

PaigeMiller · Posted 09-21-2017 09:00 AM

I suspect you get different results because of different sample sizes in each cell. You can (and should) use the events / trials form of the PROC LOGISTIC MODEL statement. In PROC REG, make the dependent variable a binary 0 or 1, and then replicate each row #trials times (Y = 1 for #events of the copies, Y = 0 for the rest). The Y value is not used in checking for collinearity.
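A minimal sketch of the row-replication step, assuming a hypothetical grouped data set named grouped with predictors x1, x2, x3 and counts events and trials (none of these names or values come from the original post):

```sas
/* Hypothetical grouped data: one row per cell, with event and trial counts */
data grouped;
  input x1 x2 x3 events trials;
  datalines;
1.0  2.1 0.5  3 20
2.0  3.9 1.5  5 22
3.0  6.2 0.7  9 25
4.0  7.8 2.1 10 18
5.0 10.1 1.2 14 24
6.0 12.2 2.8 15 21
7.0 13.8 1.9 19 23
8.0 16.1 3.0 20 22
;
run;

/* Expand each cell into `trials` rows: the first `events` copies get y = 1, the rest y = 0 */
data binary;
  set grouped;
  do i = 1 to trials;
    y = (i <= events);
    output;
  end;
  drop i;
run;

/* Collinearity diagnostics; VIF and TOL depend only on the predictors */
proc reg data=binary;
  model y = x1 x2 x3 / vif tol collin;
run;
quit;
```

The logistic model itself would still be fit on the grouped data with the events / trials syntax; the expanded data set is only for the diagnostics.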

Another approach entirely is to perform Partial Least Squares regression with a binary 0/1 response. PLS handles the collinearity itself, so there is no need to eliminate variables.
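As a rough sketch of the PLS route, the following assumes the same kind of hypothetical grouped data (names and values invented for illustration) expanded to 0/1 rows. The choice of two factors is arbitrary here, and treating the 0/1 response as continuous makes this a linear approximation rather than a logistic fit:

```sas
/* Hypothetical grouped data, then expansion to binary rows */
data grouped;
  input x1 x2 x3 events trials;
  datalines;
1.0  2.1 0.5  3 20
2.0  3.9 1.5  5 22
3.0  6.2 0.7  9 25
4.0  7.8 2.1 10 18
5.0 10.1 1.2 14 24
6.0 12.2 2.8 15 21
7.0 13.8 1.9 19 23
8.0 16.1 3.0 20 22
;
run;

data binary;
  set grouped;
  do i = 1 to trials;
    y = (i <= events);   /* 1 for the first `events` copies, 0 otherwise */
    output;
  end;
  drop i;
run;

/* PLS on the 0/1 response; nfac=2 is an arbitrary illustrative choice */
proc pls data=binary method=pls nfac=2;
  model y = x1 x2 x3 / solution;
run;
```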

--
Paige Miller

stat_sas · Posted 09-21-2017 07:32 AM

Hi,

You can use either the proportion or a binary variable as the response in PROC REG to check collinearity using VIF, because the collinearity check does not involve the dependent variable. Once you have identified the highly correlated variables, use the logistic regression model for further analysis.
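A sketch of this check run directly on grouped rows, assuming a hypothetical data set grouped (variable names and values are invented for illustration). The FREQ statement makes PROC REG treat each row as trials observations, so the predictor cross-products match those of a data set with the rows replicated:

```sas
/* Hypothetical grouped data with the proportion computed as the response */
data grouped;
  input x1 x2 x3 events trials;
  prop = events / trials;
  datalines;
1.0  2.1 0.5  3 20
2.0  3.9 1.5  5 22
3.0  6.2 0.7  9 25
4.0  7.8 2.1 10 18
5.0 10.1 1.2 14 24
6.0 12.2 2.8 15 21
7.0 13.8 1.9 19 23
8.0 16.1 3.0 20 22
;
run;

/* VIF and tolerance are computed from the predictors only; */
/* FREQ counts each grouped row `trials` times              */
proc reg data=grouped;
  freq trials;
  model prop = x1 x2 x3 / vif tol;
run;
quit;
```

Without the FREQ statement each cell counts once regardless of its size, so the VIFs can differ from those of the replicated data.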

marcel · Posted 09-21-2017 09:42 AM

PaigeMiller,

This is a really clear and very informative answer. Thank you very much for your help.

Best regards,

Marcel

StatDave · Posted 09-21-2017 10:47 AM

See the collinearity section of this note. Note that the events/trials syntax can be used in PROC GENMOD just like in PROC LOGISTIC.
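For illustration, the events/trials syntax in PROC GENMOD might look like the sketch below. The data set grouped and its variables are hypothetical, not taken from the note:

```sas
/* Hypothetical grouped data: event and trial counts per cell */
data grouped;
  input x1 x2 x3 events trials;
  datalines;
1.0  2.1 0.5  3 20
2.0  3.9 1.5  5 22
3.0  6.2 0.7  9 25
4.0  7.8 2.1 10 18
5.0 10.1 1.2 14 24
6.0 12.2 2.8 15 21
7.0 13.8 1.9 19 23
8.0 16.1 3.0 20 22
;
run;

/* Logistic model via GENMOD: binomial distribution with logit link */
proc genmod data=grouped;
  model events/trials = x1 x2 x3 / dist=binomial link=logit;
run;
```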

marcel · Posted 09-21-2017 04:45 PM

Sir StatDave_sas,

The problem I had is that for collinearity diagnostics I have to use PROC REG, as recommended in other SAS notes, and it does not accept the #event/#trial format for the response. I am coding the response as 0/1, as suggested by PaigeMiller. PROC REG does accept this coding for the response.

Thank you for your observation.

Regards,

Marcel

StatDave · Posted 09-22-2017 10:12 AM

You can use PROC REG, and that is in fact what is done in the note I referred to. But as described there, proper evaluation of collinearity in a logistic model requires a weighted analysis in PROC REG. I encourage you to carefully and fully read through the collinearity section. PROC GENMOD is used to produce the necessary weights for a variety of model types. In the case of a logistic model, the necessary weights are just p*(1-p), so they could be produced by saving the predicted probabilities from the fitted model and then computing the weights in a DATA step.
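A sketch of that weighted analysis, under stated assumptions: the data set grouped and its variables are hypothetical, and the weight is multiplied by trials because each row here is a grouped cell rather than a single 0/1 observation (with one row per binary observation the weight would be just p*(1-p)). This is not the code from the note, only one way to follow the description:

```sas
/* Hypothetical grouped data */
data grouped;
  input x1 x2 x3 events trials;
  datalines;
1.0  2.1 0.5  3 20
2.0  3.9 1.5  5 22
3.0  6.2 0.7  9 25
4.0  7.8 2.1 10 18
5.0 10.1 1.2 14 24
6.0 12.2 2.8 15 21
7.0 13.8 1.9 19 23
8.0 16.1 3.0 20 22
;
run;

/* Fit the logistic model and save the predicted event probabilities */
proc logistic data=grouped;
  model events/trials = x1 x2 x3;
  output out=pred p=phat;
run;

/* Weights: binomial variance per cell, trials * p * (1 - p) */
data wts;
  set pred;
  w = trials * phat * (1 - phat);
run;

/* Weighted collinearity diagnostics; the response variable is a placeholder */
proc reg data=wts;
  weight w;
  model phat = x1 x2 x3 / vif tol collin;
run;
quit;
```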
