Lijuu
Obsidian | Level 7

Dear Respected:

I have two binary outcomes (y1 and y2) and several predictors (x1, x2, ..., xp). I have tried analyzing each outcome separately, using a logistic regression on the predictors. However, that is not effective for my analysis, as the correlation between the two outcomes (y1, y2) needs to be considered by fitting one model using bivariate logistic regression.

How can I fit a bivariate logistic regression in SAS that addresses the above?

PaigeMiller
Diamond | Level 26

I'd like to know too. I am not aware of any way in SAS (or other software) to fit a logistic regression with multiple Y variables. In fact, I'm not even aware of a published (paper or textbook) solution for logistic regression with multiple Y variables.

--
Paige Miller
Ksharp
Super User
Maybe you are looking for multinomial (multi-level response) logistic regression.
Encode the two outcomes as a single new Y = 1, 2, 3, ...
1 0 -> 1
0 1 -> 2
1 1 -> 3

You could also try random forests, decision trees, neural networks, ...
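
A minimal sketch of the recoding suggested above, assuming a hypothetical dataset named have with binary y1, y2 and predictors x1-x3 (none of these names come from the thread). The (0,0) pattern, which the list above does not show, is assigned code 0 here as an assumption and used as the reference level. PROC LOGISTIC with LINK=GLOGIT then fits a generalized-logit (multinomial) model to the combined outcome.

```sas
/* Hypothetical input: dataset HAVE with binary y1, y2 and predictors x1-x3 */
data combined;
   set have;
   /* (1,0)->1, (0,1)->2, (1,1)->3 as in the post; (0,0)->0 is an assumption */
   if nmiss(y1, y2) = 0 then ycomb = y1 + 2*y2;
run;

proc logistic data=combined;
   /* generalized logit model with the (0,0) pattern as the reference level */
   model ycomb(ref='0') = x1 x2 x3 / link=glogit;
run;
```

This models the joint distribution of (y1, y2) directly, so the association between the outcomes is carried by the four-category response rather than by an explicit correlation parameter.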
SteveDenham
Jade | Level 19

When in doubt, I always recommend PROC GLIMMIX.  See the example at this link: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_glimmix_examples08.htm&docsetVers... 

 

This provides a joint analysis of a binomial response variable and a count response variable. It should be possible to do the joint analysis of two binomial variables, following this example.  I particularly like the R-side joint analysis, as it provides an estimate of the correlation between the two response variables.

 

SteveDenham

PaigeMiller
Diamond | Level 26

I will give GLIMMIX a try, but I'm not convinced by just looking at the example.

 

Specifically, in the example code we see

 

dist=byobs(dist)

but how would that work if both Y variables are binary?

 

--
Paige Miller
SteveDenham
Jade | Level 19

That is an excellent point @PaigeMiller . It has been a couple years since I tried it, and I recall it being problematic.  However, looking at the last example, I think this might work:

 

proc glimmix data=yourdata;
   class treatment respvar patient;
   model y =respvar treatment*respvar/ dist=binomial;
   random _residual_ / subject=patient type=unr;
run;

This would treat the response variables as a repeated measure on each patient (respvar would need to be created in the yourdata dataset, with the y values "stacked"). I shifted to type=unr because I think an investigator might be more interested in the correlation between the measures than in the covariance. It gets away from the byobs type, which would be identical with a direct use of the example.
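
The stacking described above could look like the following sketch. It assumes a hypothetical dataset named wide with one row per patient and columns patient, treatment, y1 and y2; only yourdata, respvar, y, treatment and patient come from the post.

```sas
/* Hypothetical input: WIDE has one row per patient with treatment, y1, y2 */
data yourdata;
   set wide;
   respvar = 1; y = y1; output;   /* row for the first outcome  */
   respvar = 2; y = y2; output;   /* row for the second outcome */
   drop y1 y2;
run;

/* Optional check: each outcome should contribute one row per patient */
proc freq data=yourdata;
   tables respvar*y / nocol nopercent;
run;
```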

 

A conditional model with G-side covariance could also be run:

 

proc glimmix data=yourdata method=laplace;
   class treatment respvar patient;
   model y =respvar treatment*respvar/ dist=binomial;
   random respvar / subject=patient type=unr;
  /* or random intercept/subject=patient; */
run;

I don't have a handy dataset to try this out on, and need to get to some email issues.  I think a quick simulated dataset would be in order to test this approach.
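
One way to build such a test dataset is sketched below. It is an illustration under stated assumptions, not the simulation posted later in this thread: a shared normal patient effect u (hypothetical, as are the coefficients, the sample size and the seed) induces positive correlation between the two stacked binary responses.

```sas
/* Sketch: two correlated binary outcomes per patient, already stacked */
data yourdata;
   call streaminit(12345);               /* arbitrary seed */
   do patient = 1 to 300;
      treatment = mod(patient, 3) + 1;   /* assign treatments 1-3 in rotation */
      u = rand('NORMAL', 0, 1);          /* shared patient effect -> correlation */
      do respvar = 1 to 2;
         eta = -1 + 0.4*treatment + 0.3*respvar + u;   /* linear predictor */
         y = rand('BERNOULLI', logistic(eta));         /* inverse logit gives P(y=1) */
         output;
      end;
   end;
   keep patient treatment respvar y;
run;
```

Because u is shared within a patient, the two responses are dependent by construction, which gives the covariance structures above something to estimate.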

 

SteveDenham

 

 

PaigeMiller
Diamond | Level 26

I'll have to give this a try as well. It sure seems like an awful lot of work to obtain a logistic regression with multiple Y values, and I'm starting to think the easiest thing to do would be @Ksharp's earlier suggestion. Maybe there's a reason why SAS has no easy method to do this, and why I have never seen any textbook or published article that covers logistic regression with multiple Y variables.

--
Paige Miller
Lijuu
Obsidian | Level 7
Dear all respected:
Thank you for your help. I will check it out in R, too.

Lijuu
SteveDenham
Jade | Level 19

Here is an analysis of 3 response variables with 3 levels for treatment in 100 patients.  The first part simulates the data, the second provides a PROC GLIMMIX approach:

 

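data one;
   call streaminit(1);   /* fixed seed for reproducibility */
   pt=.2;                /* treatment contribution to P(y=1) */
   pr=.1;                /* response-variable contribution to P(y=1) */
   do trt=1 to 3;
      do respvar=1 to 3;
         do patient=1 to 100;
            /* P(y=1) = 0.2*trt + 0.1*respvar, ranging from 0.3 to 0.9 */
            y=RAND('BERNOULLI', (pt*trt+pr*respvar));
            output;
         end;
      end;
   end;
run;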

proc glimmix data=one method=quad;
   class trt respvar patient;
   model y =respvar trt*respvar/ dist=binary corrb;
   random intercept/subject=patient; 
   lsmeans trt*respvar/slicediff=respvar ilink;
run;

The output provides the following

Response variable   Treatment   Nominal p(0)   Model p(0)
1                   1           0.7            0.8308
1                   2           0.5            0.5000
1                   3           0.3            0.2993
2                   1           0.6            0.5904
2                   2           0.4            0.4096
2                   3           0.2            0.1992
3                   1           0.5            0.4699
3                   2           0.3            0.2792
3                   3           0.1            0.1193

 

The coefficient of determination, based on the squared correlation between the nominal values and the model values, is 0.961.
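
The squared-correlation calculation can be reproduced from the tabulated values with a sketch like the one below. The dataset and variable names (compare, nominal, fitted, corrout, rsq) are hypothetical; the numbers are copied from the table above. The result should come out close to the 0.961 quoted above.

```sas
/* Values copied from the table above (nominal vs. model p(0)) */
data compare;
   input respvar trt nominal fitted;
   datalines;
1 1 0.7 0.8308
1 2 0.5 0.5000
1 3 0.3 0.2993
2 1 0.6 0.5904
2 2 0.4 0.4096
2 3 0.2 0.1992
3 1 0.5 0.4699
3 2 0.3 0.2792
3 3 0.1 0.1193
;
run;

/* Pearson correlation between nominal and model values, saved to a dataset */
proc corr data=compare outp=corrout noprint;
   var nominal fitted;
run;

/* Square the correlation to get the coefficient of determination */
data rsq;
   set corrout;
   where _type_ = 'CORR' and upcase(_name_) = 'NOMINAL';
   rsquare = fitted**2;
   keep rsquare;
run;

proc print data=rsq noobs;
run;
```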

 

I submit that this approach to multivariate logistic regression using PROC GLIMMIX seems appropriate.

PaigeMiller
Diamond | Level 26

Back to the original question from @Lijuu :

 

"However, it is not effective for my analysis as the correlation between the two outcomes (y1, y2) need to be considered and fit one model using bivariate logistic regression."

 

Does this GLIMMIX approach handle the correlation between Y variables? I'm not seeing it.

 

--
Paige Miller
SteveDenham
Jade | Level 19

Hi @PaigeMiller ,

 

An explicit correlation can be fit using this set of statements:

proc glimmix data=one method=quad;
   class trt respvar patient;
   model y =respvar trt*respvar/noint dist=binary corrb;
   random intercept/subject=patient; 
   random respvar/subject=patient type=unr;
   lsmeans trt*respvar/slicediff=respvar ilink;
run;

However, the AICC for this model is 1109.54, while the AICC for the model without random respvar is 1073.25. 

 

To get the fixed-effect correlation between the dependent variables, conditioned on the model, both models use the corrb option. For the model with only the shared random intercept, that matrix looks like:

1    
0.1114 1  
0.01082 0.01306 1

For the model with explicit correlations, the fixed-effect correlations were:

1    
0.1248 1  
-0.09419 -0.3098 1

And the G side correlations were

1    
-0.04933 1  
-0.00327 0.3108 1

In either case, we can model the correlation between the response variables.  I chose the model with only the random intercept (the one without random respvar), as it had the smaller information criterion and its chi**2/df was closer to 1.

 

There is a lot of art going on here in the interpretation, but my point is that all of the criteria posed are met in some fashion.

 

SteveDenham

SteveDenham
Jade | Level 19

Much searching through the SAS site reveals that:

Multivariate logistic model: Simultaneously models multiple responses, taking into account the correlations among all response functions.

How to fit it: In PROC CATMOD, specify the RESPONSE LOGITS; statement and multiple response variables in the MODEL statement to fit the model by using weighted least squares estimation. For example, these statements simultaneously model logits that are defined separately on three response variables:

response logits;
model x1*x2*x3 = group;

The bivariate probit model can be fit in SAS/ETS PROC QLIM:

model y1 y2 = x / discrete;
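
For the two-outcome case in the original question, the PROC QLIM statement quoted above can be expanded into a complete step. This is a sketch that assumes a hypothetical dataset named wide with binary y1, y2 and predictors x1-x3, and it requires SAS/ETS. Note that it fits a bivariate probit model rather than a bivariate logistic one; the correlation between the two latent error terms is estimated as a model parameter.

```sas
/* Hypothetical input: WIDE has binary y1, y2 and predictors x1-x3 */
proc qlim data=wide;
   /* DISCRETE with two responses requests a bivariate probit model */
   model y1 y2 = x1 x2 x3 / discrete;
run;
```

Unlike the stacked GLIMMIX setup, this keeps the data in wide form, with one row per subject and both outcomes on the left-hand side of the MODEL statement.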

 

I'm not a frequent CATMOD user, but with the simulated dataset previously presented, here is my attempt:

 

proc catmod data=one;
response logits;
model respvar*trt = y  /corrb design;
run;
quit;

/* or possibly this, after some transposing and renaming  */
proc catmod data=four order=data;
response logits;
model respvar1*respvar2*respvar3 = trt /corrb design noint;
run;
quit;

Both produce output that ought to be interpretable.  The second set of statements appears to me to be closer to what is referred to above; the first set provides an analysis with treatment 3 and respvar 3 as the reference category (categories?).

 

All in all, I happen to think that the GLIMMIX approach is easier to understand, but I have about 10,000 times more experience with GLIMMIX...

😏

 

SteveDenham

 

StatDave
SAS Super FREQ

Yes, that is the correct MODEL statement in CATMOD for simultaneously modeling three response variables with TRT as the predictor.

model respvar1*respvar2*respvar3 = trt

 

PaigeMiller
Diamond | Level 26

I will have to try this too.

--
Paige Miller
Lijuu
Obsidian | Level 7

Thank you very much. But it is still not working for me. I have sixteen predictors and two responses.
