Re: PROC LOGISTIC: Positive effect in logistic model where a negative ... - Page 2

Ksharp · Posted 09-28-2019 08:06 AM

Another possible reason is BAD data.

Check the standard errors of these two positive coefficient estimates and see whether they are very large.

halkyos · Posted 10-01-2019 11:48 AM

The standard errors on the coefficients in the model are 0.00563 (rfs) and 0.00743 (pfs).

halkyos · Posted 10-01-2019 12:10 PM

Some additional exploration that may help us figure this out:

I decided to run four separate models:

Substance use predicted by risk factors only
Substance use predicted by protective factors only
No substance use predicted by risk factors only
No substance use predicted by protective factors only

The results are:

Substance use 'yes' = -2.1091 + 0.2623(rfs)
Substance use 'yes' = 0.5846 - 0.0937(pfs)
Substance use 'no' = 2.1091 - 0.2623(rfs)
Substance use 'no' = -0.5846 + 0.0937(pfs)

So the second pair of models was probably unnecessary, since each one is just the sign-reversed version of its counterpart among the first two. All p-values are < 0.0001. In both versions, the standard error is 0.00542 for pfs and 0.00478 for rfs.

This shows that, when tested independently, the predictors behave as expected: an increase in the number of risk factors increases the probability of using substances, while an increase in protective factors reduces it. The problem arises when they are put into a model together.
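For readers following along, here is a minimal sketch of the single-predictor fits described above. It assumes the data set is named `have` (a hypothetical name; the thread does not give one) and that `sub1` is coded 0/1. Switching `EVENT=` from '1' to '0' models the complementary outcome, and because logit(1 - p) = -logit(p), every coefficient simply changes sign. That is why the 'no' models mirror the 'yes' models.

```sas
/* Assumption: HAVE is a hypothetical data set name; sub1 is coded 0/1. */

/* Probability of substance use (sub1 = 1) from risk factors only */
proc logistic data=have;
    model sub1(event='1') = rfs;
run;

/* Same data, complementary event: intercept and slope flip sign */
proc logistic data=have;
    model sub1(event='0') = rfs;
run;

/* Probability of substance use from protective factors only */
proc logistic data=have;
    model sub1(event='1') = pfs;
run;
```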

Reeza · Posted 10-01-2019 12:16 PM

@halkyos wrote:

This shows that, when tested independently, the predictors behave as expected: an increase in the number of risk factors increases the probability of using substances, while an increase in protective factors reduces it. The problem arises when they are put into a model together.

Please show a PROC FREQ of how the variables interact.

As you showed before, the PROC FREQ test returns a p-value of <0.0001, so it seems that they are NOT independent.

Have you tried adding an interaction term?
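A cross-tabulation along these lines might look like the sketch below. The data set name `have` is a hypothetical placeholder, and the options are only one reasonable choice: `CHISQ` requests the chi-square test of independence, and the `NO...` options suppress percentages so the large table is easier to read.

```sas
/* Assumption: HAVE is a hypothetical data set name. */
proc freq data=have;
    /* Cell counts only, plus the chi-square test of independence */
    tables rfs*pfs / chisq norow nocol nopercent;
run;
```

With this many levels some cells will likely be sparse, so the chi-square result is best read as a rough indication rather than an exact test.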

halkyos · Posted 10-01-2019 12:23 PM

Sorry, I saw the request and was having trouble getting it all to show up (13x22 table problems). Hopefully this helps.

I am not familiar with adding interaction terms. Is this just running the model as sub1=rfs pfs rfs*pfs?

(edit: I cropped the images to allow them to be easier to read)

Reeza · Posted 10-01-2019 12:28 PM

Yes, that is how you add an interaction term.
I would consider aggregating 18-21 into a single category, 18+. I also wonder whether you should be treating them as categorical variables, since a count of factors is not really a continuous measure.
Try adding the variables to the CLASS statement.
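The three suggestions above could be sketched as follows. The data set names `have` and `have18` and the variable `rfs18` are hypothetical, and the `PARAM=REF REF=FIRST` coding is just one common choice, not something the thread specifies.

```sas
/* Assumption: HAVE, HAVE18 and rfs18 are hypothetical names. */

/* 1. Interaction term, with both counts treated as continuous */
proc logistic data=have;
    model sub1(event='1') = rfs pfs rfs*pfs;
run;

/* 2. Collapse 18-21 risk factors into a single top category (18+) */
data have18;
    set have;
    /* MIN() ignores missing values, so guard against them explicitly */
    if not missing(rfs) then rfs18 = min(rfs, 18);
run;

/* 3. Treat both counts as categorical via the CLASS statement */
proc logistic data=have18;
    class rfs18 pfs / param=ref ref=first;
    model sub1(event='1') = rfs18 pfs;
run;
```

With reference coding, each reported coefficient compares one level against the lowest level of that variable, which produces many parameters when the variables have this many levels.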

halkyos · Posted 10-01-2019 12:45 PM

Alright, adding only the interaction term, we get:

sub1=-2.8998+0.2695(rfs)+0.0755(pfs)+0.00576(rfs*pfs)

I tried putting rfs and pfs as classes and it brought up coefficients for each level of risk and protective factors (and with the rfs*pfs it then did a coefficient for each level of that). It was very messy.

With 18+ as a final category we get:

sub1=-3.1942+0.3049(rfs)+0.1179(pfs)

OR (with the interaction effect)

sub1=-2.9301+0.2738(rfs)+0.0787(pfs)+0.00529(rfs*pfs)

One idea I am considering: could this be an effect of having one row per observation? Should I instead set it up with one row per combination, where substance use = a (0 or 1), risk factors = b (0-21), and protective factors = c (0-12), plus the frequency of responses with that combination? For example, if 12 people who said they use the substance also reported 3 risk factors and 5 protective factors, the row would read (sub) 1, (rfs) 3, (pfs) 5, (n) 12. Right now each of those 12 people is a separate observation in the data set.
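A sketch of that aggregated layout, assuming a hypothetical input data set `have` and a hypothetical output name `counts`: PROC FREQ writes one row per observed (sub1, rfs, pfs) combination with its `COUNT`, and the `FREQ` statement in PROC LOGISTIC treats each row as that many observations.

```sas
/* Assumption: HAVE and COUNTS are hypothetical data set names. */

/* One row per observed combination, with its frequency in COUNT */
proc freq data=have noprint;
    tables sub1*rfs*pfs / out=counts(drop=percent);
run;

/* Fit the same model on the aggregated rows */
proc logistic data=counts;
    freq count;
    model sub1(event='1') = rfs pfs;
run;
```

Because the likelihood is the same whether a row is repeated 12 times or entered once with a frequency of 12, this fit should reproduce the coefficients and standard errors of the one-row-per-person fit, so the layout alone would not be expected to change the sign on pfs.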

Reeza · Posted 10-01-2019 12:55 PM

Perhaps it's also not true... and those who have X amount of protective factors are less likely to have risk factors, so really only one of those matters. You're also lumping all factors together, and the assumption that each carries the same weight isn't necessarily true.

PaigeMiller · Posted 10-01-2019 12:45 PM

@halkyos wrote:

Some additional exploration that may help us figure this out:

I decided to run four separate models:

Substance use predicted by risk factors only
Substance use predicted by protective factors only
No substance use predicted by risk factors only
No substance use predicted by protective factors only

The results are:

Substance use 'yes' = -2.1091 + 0.2623(rfs)
Substance use 'yes' = 0.5846 - 0.0937(pfs)
Substance use 'no' = 2.1091 - 0.2623(rfs)
Substance use 'no' = -0.5846 + 0.0937(pfs)

So the second pair of models was probably unnecessary, since each one is just the sign-reversed version of its counterpart among the first two. All p-values are < 0.0001. In both versions, the standard error is 0.00542 for pfs and 0.00478 for rfs.

This shows that, when tested independently, the predictors behave as expected: an increase in the number of risk factors increases the probability of using substances, while an increase in protective factors reduces it. The problem arises when they are put into a model together.

When your x-variables are correlated with one another, this can sometimes cause the "wrong" sign to appear on one or more of your regression coefficients. What about outliers and clusters?

--
Paige Miller
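One way to gauge how strongly the two predictors are correlated is sketched below, assuming a hypothetical data set name `have`. PROC CORR reports the Pearson and Spearman correlations. The PROC REG step is used here only as a diagnostic device, since the variance inflation factors depend on the predictors and not on the binary response.

```sas
/* Assumption: HAVE is a hypothetical data set name. */

/* Pearson and Spearman correlation between the two counts */
proc corr data=have pearson spearman;
    var rfs pfs;
run;

/* Collinearity diagnostics; the linear fit itself is not of interest */
proc reg data=have;
    model sub1 = rfs pfs / vif tol collin;
run;
quit;
```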

halkyos · Posted 10-01-2019 01:05 PM

I am not seeing any clusters or outliers (this is what I was getting at with the distribution earlier). Taking a closer look using PROC SGPLOT, there are no outliers in the overall data; however, when the data are broken down into sub1=0 and sub1=1, there are a few outliers in sub1 for risk factors (rfs>15). There are no other outliers in protective factors for either group.
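The post does not say which SGPLOT statement was used. One possible version of the grouped check, assuming box plots and a hypothetical data set name `have`, is:

```sas
/* Assumption: HAVE is a hypothetical data set name; box plots are a guess. */

/* Risk factor counts, one box per substance use group */
proc sgplot data=have;
    vbox rfs / category=sub1;
run;

/* Protective factor counts, one box per substance use group */
proc sgplot data=have;
    vbox pfs / category=sub1;
run;
```

By default, points beyond 1.5 times the interquartile range from the box are drawn as individual outlier markers.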

Re: PROC LOGISTIC: Positive effect in logistic model where a negative one should occur
