halkyos
Obsidian | Level 7

I am working with risk and protective factor data for substance use outcomes. My data are arranged so that I have the outcome as a binary variable (0 = no use, 1 = use), the total number of risk factors, and the total number of protective factors. Risk factors are known to increase the likelihood of an outcome occurring, and protective factors are known to have the opposite effect. Examination in PROC FREQ shows that the proportion of observations using a substance increases with the number of risk factors and decreases with the number of protective factors. When I fit a model with PROC LOGISTIC, though, I get a positive effect for my protective factors. Here is my code:

 

PROC LOGISTIC DATA=survey DESCENDING;
  MODEL sub1= rfs pfs;
RUN;

sub1: binary variable where 1= using the substance and 0=not using the substance.

rfs: total number of risk factors.

pfs: total number of protective factors.

 

My results for one substance give me a model of logit(p(sub1=1)) = -3.1860 + 0.3033(rfs) + 0.1181(pfs). As a researcher I know that this is wrong: I don't have anomalous data where the population is more likely to use substances if they have more protective factors. I am having trouble figuring out how to correct this.

 

24 REPLIES
Reeza
Super User
Can you post your log?
And are rfs and pfs continuous or categorical?
halkyos
Obsidian | Level 7

Here is my log:

 

306 PROC LOGISTIC DATA=survey DESCENDING;
307 MODEL sub1=rfs pfs;
308 RUN;

NOTE: PROC LOGISTIC is modeling the probability that sub1=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 14445 observations read from the data set WORK.SURVEY.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.25 seconds
cpu time 0.15 seconds

 

rfs and pfs are non-negative integer counts of the risk or protective factors present. rfs ranges from 0 to 21 and pfs ranges from 0 to 12.

PaigeMiller
Diamond | Level 26

One reason this can occur is if your two x-variables rfs and pfs are highly correlated with each other. Another reason this can occur is if you have outliers or clusters in rfs and/or pfs.
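
A minimal sketch of how these possibilities might be checked, assuming the dataset and variable names from the original post (survey, sub1, rfs, pfs). PROC REG is used here only to obtain variance inflation factors, which depend on the predictors alone, and the single-predictor PROC LOGISTIC fit shows whether pfs carries a negative sign on its own before rfs is adjusted for. This is one way to run the checks, not the only one.

```sas
/* Correlation between the two predictors (Pearson and rank-based) */
PROC CORR DATA=survey PEARSON SPEARMAN;
  VAR rfs pfs;
RUN;

/* Variance inflation factors and tolerances for the predictors.
   The linear fit of the 0/1 outcome is not the model of interest;
   it is only a convenient way to request collinearity diagnostics. */
PROC REG DATA=survey;
  MODEL sub1 = rfs pfs / VIF TOL;
RUN;
QUIT;

/* pfs on its own: compare the sign of its coefficient here
   with the sign in the two-predictor model */
PROC LOGISTIC DATA=survey;
  MODEL sub1(EVENT='1') = pfs;
RUN;
```

If pfs is negative by itself but positive once rfs is in the model, the positive sign is an adjusted effect (the effect of pfs at a fixed number of risk factors) rather than a coding mistake.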

--
Paige Miller
halkyos
Obsidian | Level 7

I just tested for these possibilities: A chi-square test for independence indicates that rfs and pfs are independent. When entered into PROC AUTOREG for rfs=pfs and pfs=rfs the values are negatively correlated to each other:

 

chi-sq: 4876.6114, p<0.0001.

rfs=12.9714-0.7614(pfs), p<0.0001.

pfs=8.7403-0.2989(rfs), p<0.0001. 

 

The overall distributions are almost textbook normal. When stratified by whether or not the observation reported substance use, the distribution of rfs for non-users takes on a right-tail skew. All other distributions remain normal.

Reeza
Super User
P<0.0001 means related, not independent, doesn't it?
PaigeMiller
Diamond | Level 26

@halkyos wrote:

I just tested for these possibilities: A chi-square test for independence indicates that rfs and pfs are independent. When entered into PROC AUTOREG for rfs=pfs and pfs=rfs the values are negatively correlated to each other:

 

chi-sq: 4876.6114, p<0.0001.

rfs=12.9714-0.7614(pfs), p<0.0001.

pfs=8.7403-0.2989(rfs), p<0.0001. 

 

The overall distributions are almost textbook normal. When stratified by whether or not the observation reported substance use, the distribution of rfs for non-users takes on a right-tail skew. All other distributions remain normal.


What is the correlation (not the auto-correlation from PROC AUTOREG but the correlation from PROC CORR) between rfs and pfs? Distribution of your x-variables is irrelevant here. Are there outliers or clusters among your x-variables?

--
Paige Miller
halkyos
Obsidian | Level 7

My PROC CORR results are as follows:

 

[Image: proc corr.PNG (PROC CORR output for rfs and pfs; not available in this text)]

 

There are no high or low outliers for either variable.

Ksharp
Super User

Change which response value the model treats as the event, and you will get a different result:

 

PROC LOGISTIC DATA=survey;
  MODEL sub1(event='0') = rfs pfs;
RUN;

vs.

PROC LOGISTIC DATA=survey;
  MODEL sub1(event='1') = rfs pfs;
RUN;
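
As a reference point for comparing the two runs, here is a short sketch of what switching the event level does algebraically. The symbols \beta_0, \beta_1, \beta_2 are notation introduced here for the intercept and the rfs and pfs coefficients; they do not appear in the original post. For a binary response fitted to the same data, the log-odds of one level are the negative of the log-odds of the other:

```latex
\begin{aligned}
\operatorname{logit} P(\mathrm{sub1}=1)
  &= \log\frac{P(\mathrm{sub1}=1)}{P(\mathrm{sub1}=0)}
   = \beta_0 + \beta_1\,\mathrm{rfs} + \beta_2\,\mathrm{pfs}, \\
\operatorname{logit} P(\mathrm{sub1}=0)
  &= \log\frac{P(\mathrm{sub1}=0)}{P(\mathrm{sub1}=1)}
   = -\beta_0 - \beta_1\,\mathrm{rfs} - \beta_2\,\mathrm{pfs}.
\end{aligned}
```

So the two fits are expected to report the same estimates with every sign reversed and the same standard errors. Switching the event level changes what a positive coefficient means; it does not change the relationship the model has found.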

 

halkyos
Obsidian | Level 7

I tried this before coming here. What it does is switch which of the two has the larger positive coefficient, but both remain positive. My office is renewing my license today, so I can't currently give you the exact coefficients, but what happens is it becomes sub1=y+pfs+rfs where the coefficient of pfs > coefficient of rfs; 0 <= either coefficient <= 1.

Ksharp
Super User

Did you check the standard errors of these two coefficients?

halkyos
Obsidian | Level 7

The standard errors are as follows:

 

rfs: 0.0417

pfs: 0.0261

PaigeMiller
Diamond | Level 26

What is the correlation (not the auto-correlation from PROC AUTOREG but the correlation from PROC CORR) between rfs and pfs?

 

Are there outliers or clusters among your x-variables?

--
Paige Miller
Reeza
Super User
Can you show a PROC FREQ of rfs*pfs? I suspect you have some massive imbalances.
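
A minimal sketch of the tables in question, assuming the dataset and variable names from the original post (survey, sub1, rfs, pfs). The first request shows cell counts only, which makes empty or sparse rfs-by-pfs combinations easy to spot. The second request is an optional addition not asked for above: it produces one pfs-by-sub1 table per rfs level, with row percentages giving the proportion using the substance at each pfs value.

```sas
PROC FREQ DATA=survey;
  /* Joint distribution of the two predictors, counts only */
  TABLES rfs*pfs / NOROW NOCOL NOPERCENT;

  /* Within each rfs level: sub1 by pfs, keeping row percentages */
  TABLES rfs*pfs*sub1 / NOCOL NOPERCENT;
RUN;
```

With rfs ranging from 0 to 21, the second request prints up to 22 tables, so it may be easier to read if restricted to a few rfs levels with a WHERE statement.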
halkyos
Obsidian | Level 7

PROC CORR is new to me, but looking at the guide on that one it seems pretty straightforward. I ran:

 

PROC CORR DATA=survey;
	VAR rfs pfs;
RUN;

My results are:

 

[Image: proc corr.PNG (PROC CORR output for rfs and pfs; not available in this text)]
