Re: PROC GLIMMIX with binary outcome and interpreting estimates

confooseddesi89 · Posted 09-26-2020 03:56 PM

Hello,

I have a dataset with daily diary data (ranging from 1-10 days/rows per participant). Each row has the participant (idnum), the date, whether the participant went to school (0=no, 1=yes), and whether they consumed breakfast (0=no, 1=yes). Below is a sample of my data:

idnum date school breakfast

1000007	10/25/2014	0	1
1000007	10/26/2014	1	1
1000007	10/27/2014	1	1
1000007	10/30/2014	0	1
1000007	10/30/2014	0	1
1000007	11/01/2014	0	0
1000011	08/31/2014	0	1
1000011	09/02/2014	1	1
1000011	09/04/2014	0	1
1000011	09/06/2014	0	1

I want to examine whether participants are more (or less) likely to consume breakfast when they have had school that day, with a random intercept for the participant. Since my outcome is binary, it seems like PROC GLIMMIX is the appropriate procedure. Below is my code (which I suspect is incorrect):

Proc glimmix data=data NOCLPRINT NOITPRINT METHOD= RSPL;
class breakfast school;
model breakfast=school / SOLUTION;
RANDOM Intercept / TYPE=AR(1) Subject=idnum;
Title 'school predicting breakfast';run;

The resulting output says the distribution is multinomial, but it's meant to be binomial. Furthermore, I have trouble interpreting the estimates in the "Solutions for Fixed Effects" (see below).

Solutions for Fixed Effects
Effect	breakfast	school	Estimate	Standard Error	DF	t Value	Pr > |t|
Intercept	0		-0.9635	0.09599	634	-10.04	<.0001
school		0	-0.3895	0.1001	3414	-3.89	0.0001
school		1	0	.	.	.	.

I would like to produce odds ratios that tell me the odds of having breakfast on days when the participant went to school versus days when they did not. Could someone assist me with the correct coding, obtaining odds ratios, and interpreting the estimates? Thank you.

PaigeMiller · Posted 09-26-2020 05:58 PM

You can use the ODDSRATIO option in the MODEL statement to compute odds ratios.

When you have categorical variables in a model, such as your variable SCHOOL, SAS by default sets the coefficient of the last value alphabetically (in your case, when SCHOOL=1) to be zero. This is a convention that has been adopted by SAS, and really does no harm, because the predicted values and the model are the same even if some other convention were adopted. I wrote a post with an example here.

But as I point out every time this question is asked, for class variables you really really really really really really really really really don't want to be trying to interpret the coefficients; for class variables you really really really really really really really really want to be interpreting the LSMEANS from the model, which are much easier to interpret.

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 10:17 AM

Hello,

When I use the lsmeans statement (see below):

Proc glimmix data=data NOCLPRINT NOITPRINT METHOD= RSPL;
class breakfast school;
model breakfast=school / SOLUTION;
RANDOM Intercept / TYPE=AR(1) Subject=idnum;

lsmeans school / cl ilink;
Title 'school predicting breakfast';run;

I get the error "ERROR: Least-squares means are not available for the multinomial distribution." My code must be wrong, as there are only two values for both school and breakfast, 0 and 1.

Furthermore, if your predictor is continuous (I have some that are), an LSMEANS statement can't be used. How do I interpret a one-unit change in the x-variable in that case, with a continuous predictor? I'm having some trouble with the "don't interpret the coefficients" mantra, as I was always taught that you could, and should, interpret model coefficients in my graduate-level statistics courses (interpretation depending on the specific test employed) - even with a categorical outcome. (For example, e^b1 is odds ratio in logistic regression.) And with a continuous predictor, the coding of your outcome (which is 0 and which is 1) is more important, I believe.

Thanks.

PaigeMiller · Posted 09-28-2020 10:33 AM

If I understand the error message properly, it seems like you have more than two values (0 and 1) for breakfast.

LSMEANS doesn't have the slightest impact on continuous variables; those should work just fine in the model, with or without class variables.

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 10:51 AM

Hello,

As I mentioned previously, I have only two values for breakfast, 0 and 1. See below.

breakfast	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	1427	28.94	1427	28.94
1	3504	71.06	4931	100
Frequency Missing = 252

The error remains.

Also, regarding using a continuous predictor, according to this documentation from SAS that you referenced, "LS-means can be computed for any effect in the MODEL statement that involves only CLASS variables." If a predictor is continuous, it's not a CLASS variable.

PaigeMiller · Posted 09-28-2020 11:02 AM

MODEL statements can contain both continuous and class variables. Nothing about using LSMEANS prohibits the MODEL statement from having both continuous and class variables. The statement you quote is a prohibition on what can go into the LSMEANS statement, not what can go in the MODEL statement.

I'm not sure why you are still getting this error. Perhaps you need to specify the exact distribution of breakfast in the MODEL statement as DIST=BIN.

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 12:04 PM

Okay, adding DIST=BIN worked. Thanks!

However:

(1) Is there any way to change how SAS codes the reference group? I want it to be 0, not 1. I tried "event='1'" after "breakfast" (model breakfast(event='1')= . . .), which works in logistic regression; the model ran but "1" was still the reference.

Furthermore, I tried the analysis using a continuous predictor, angry_rec, and as I anticipated, I could not use the LSMEANS statement, because angry_rec is not a CLASS variable. After successfully running the school--> breakfast analysis, I discovered that, similar to logistic regression, I can simply use e^beta1 to interpret the change in the odds corresponding with a one-unit increase in the predictor. See an example below from my analysis with school and breakfast. 0.3895 is the beta, and 1.476 is the odds ratio. e^0.3895 = 1.476.

Solutions for Fixed Effects
Effect	school	Estimate	Standard Error	DF	t Value	Pr > |t|
Intercept		0.9635	0.09599	634	10.04	<.0001
school	0	0.3895	0.1001	3414	3.89	0.0001
school	1	0	.	.	.	.

Odds Ratio Estimates
school	_school	Estimate	DF	95% Confidence Limits
0	1	1.476	3414	1.213	1.796

So, interpretation of coefficients is possible if done correctly. In the above example, then, if 0 were the reference, the beta would be -0.3895, and the odds ratio would be about 0.68 (1/1.476), indicating that someone who goes to school has about 68% of the odds of consuming breakfast of someone who does not go to school.
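As a worked sketch of the arithmetic, assuming the logit link and using only the estimate and confidence limits printed in the output above: switching the reference level of school negates the coefficient, which inverts the odds ratio and its confidence limits.

```latex
\begin{align*}
\mathrm{OR}_{0\text{ vs }1} &= e^{0.3895} \approx 1.476 \\
\mathrm{OR}_{1\text{ vs }0} &= e^{-0.3895} = \frac{1}{1.476} \approx 0.68 \\
95\%\ \text{limits for } \mathrm{OR}_{1\text{ vs }0} &\approx \left(\frac{1}{1.796},\ \frac{1}{1.213}\right) \approx (0.56,\ 0.82)
\end{align*}
```

An odds ratio is a ratio, so it is reversed by taking the reciprocal rather than by subtracting from 1.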

PaigeMiller · Posted 09-28-2020 12:10 PM

> (1) Is there any way to change how SAS codes the reference group? I want it to be 0, not 1. I tried "event='1'" after "breakfast" (model breakfast(event='1')= . . .), which works in logistic regression; the model ran but "1" was still the reference.

class school(ref='0');

> Furthermore, I tried the analysis using a continuous predictor, angry_rec, and as I anticipated, I could not use the LSMEANS statement

Yes, that's what I said earlier. You can't put continuous predictors in LSMEANS.

> So, interpretation of coefficients is possible if done correctly.

Yes, of course it is possible. With CLASS variables, the interpretation is easier for most people with LSMEANS as compared to the model coefficients.
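A minimal sketch of what the LSMEANS route might look like for this model, using the variable names from this thread. It assumes breakfast and school are coded 0/1, keeps the response out of the CLASS statement, and leaves out TYPE=AR(1) on the assumption that a plain random intercept is what is wanted. It has not been run against the actual data.

```sas
/* Sketch only: assumes breakfast and school are coded 0/1 as in the thread. */
proc glimmix data=data noclprint noitprint method=rspl;
  /* The response stays out of CLASS; only the subject and predictor go in. */
  class idnum school(ref='0');
  /* EVENT='1' asks for the probability of breakfast=1 to be modeled. */
  model breakfast(event='1') = school / solution dist=binary oddsratio;
  /* Random intercept per participant (TYPE=AR(1) omitted by assumption). */
  random intercept / subject=idnum;
  /* ILINK adds the LS-means on the probability scale; DIFF with ODDSRATIO
     reports the comparison between school levels as an odds ratio. */
  lsmeans school / cl ilink diff oddsratio;
  title 'school predicting breakfast';
run;
```

With ILINK, the LS-means table carries an extra column on the probability scale, which is usually easier to read than the logit-scale coefficients.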

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 12:34 PM

I want to re-reference the outcome, not the predictor. If I try the following:

class breakfast(ref='0');

I get the error:

ERROR: The response variable appears in the CLASS list. This is not consistent with the
selected distribution.

PaigeMiller · Posted 09-28-2020 12:45 PM

Breakfast is the response variable. It does not belong in a CLASS statement.

You have to specify the level you want in the MODEL statement

model breakfast (event='0') = ... ;

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 12:57 PM

Hello,

As I mentioned in a previous post, using "event='0'" did not work - the model ran, but the estimates, ORs, etc. were exactly the same as before.

Using event='0':

Solutions for Fixed Effects
Effect	school	Estimate	Standard Error	DF	t Value	Pr > |t|
Intercept		0.9635	0.09599	634	10.04	<.0001
school	0	0.3895	0.1001	3414	3.89	0.0001
school	1	0	.	.	.	.

Odds Ratio Estimates
school	_school	Estimate	DF	95% Confidence Limits
0	1	1.476	3414	1.213	1.796

Using event='1':

Solutions for Fixed Effects
Effect	school	Estimate	Standard Error	DF	t Value	Pr > |t|
Intercept		0.9635	0.09599	634	10.04	<.0001
school	0	0.3895	0.1001	3414	3.89	0.0001
school	1	0	.	.	.	.

Odds Ratio Estimates
school	_school	Estimate	DF	95% Confidence Limits
0	1	1.476	3414	1.213	1.796

PaigeMiller · Posted 09-28-2020 01:06 PM

I think I should have said

model breakfast (ref='0') = ... ;

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 01:23 PM

I still get the same result using that code.

PaigeMiller · Posted 09-28-2020 01:37 PM

One more guess

Try DIST=BINARY instead of DIST=BIN.

--
Paige Miller

confooseddesi89 · Posted 09-28-2020 02:02 PM

Excellent! This is the code that finally worked (but only for a binary predictor - see below):

Proc glimmix data=FFS NOCLPRINT NOITPRINT METHOD= RSPL; 
class school(ref='0');
model breakfast(ref='0')=school / ODDSRATIO SOLUTION DIST=BINARY; 
RANDOM Intercept / TYPE=AR(1) Subject=idnum; 
Title 'school predicting breakfast';run;

However, I was not able to conduct the analysis with a continuous predictor. I got the error "Did not converge." This error was not present when I used DIST=BIN or DIST=BINOMIAL with the continuous predictor; but using these options for DIST, I am unable to change the reference group for the outcome of breakfast. Any idea why this would not converge with a continuous predictor, and how to rectify this but still be able to change the outcome reference group?
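One possible direction, offered as an untested sketch rather than a known fix: keep DIST=BINARY so the response option is honored, leave the continuous predictor out of the CLASS statement, and try a different estimation method. METHOD=LAPLACE is swapped in here for METHOD=RSPL purely as something to try, and TYPE=AR(1) is dropped by assumption. Whether this converges on the actual data is not known.

```sas
/* Untested sketch: angry_rec is the continuous predictor named in the thread. */
proc glimmix data=FFS noclprint noitprint method=laplace;
  /* Only the subject identifier is a CLASS variable here. */
  class idnum;
  /* EVENT='1' models the probability of breakfast=1; ODDSRATIO reports the
     odds ratio for a one-unit increase in angry_rec. */
  model breakfast(event='1') = angry_rec / solution dist=binary oddsratio;
  /* Random intercept per participant. */
  random intercept / subject=idnum;
  title 'angry_rec predicting breakfast';
run;
```

Alternatively, under the logit link, switching which outcome level is modeled only flips the sign of each coefficient, as the two school outputs earlier in the thread show (-0.9635 and -0.3895 versus 0.9635 and 0.3895). The estimates from the DIST=BIN fit that did converge could therefore be negated, and the odds ratios inverted, to describe the other outcome level.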
