Hello,
I have a dataset with daily diary data (ranging from 1-10 days/rows per participant). Each row has the participant (idnum), the date, whether the participant went to school (0=no, 1=yes), and whether they consumed breakfast (0=no, 1=yes). Below is a sample of my data:
1000007 | 10/25/2014 | 0 | 1 |
1000007 | 10/26/2014 | 1 | 1 |
1000007 | 10/27/2014 | 1 | 1 |
1000007 | 10/30/2014 | 0 | 1 |
1000007 | 10/30/2014 | 0 | 1 |
1000007 | 11/01/2014 | 0 | 0 |
1000011 | 08/31/2014 | 0 | 1 |
1000011 | 09/02/2014 | 1 | 1 |
1000011 | 09/04/2014 | 0 | 1 |
1000011 | 09/06/2014 | 0 | 1 |
I want to examine whether participants are more (or less) likely to consume breakfast when they have had school that day, with a random intercept for the participant. Since my outcome is binary, it seems like PROC GLIMMIX is the appropriate procedure. Below is my code (which I suspect is incorrect):
Proc glimmix data=data NOCLPRINT NOITPRINT METHOD= RSPL;
class breakfast school;
model breakfast=school / SOLUTION;
RANDOM Intercept / TYPE=AR(1) Subject=idnum;
Title 'school predicting breakfast';run;
The resulting output says the distribution is multinomial, but it's meant to be binomial. Furthermore, I have trouble interpreting the estimates in the "Solutions for Fixed Effects" (see below).
Solutions for Fixed Effects
Effect | breakfast | school | Estimate | Standard Error | DF | t Value | Pr > |t|
Intercept | 0 | | -0.9635 | 0.09599 | 634 | -10.04 | <.0001
school | | 0 | -0.3895 | 0.1001 | 3414 | -3.89 | 0.0001
school | | 1 | 0 | . | . | . | .
I would like to produce odds ratios informing me what are the odds of having breakfast if you've gone to school (versus not having gone to school). Could someone assist me in the correct coding, acquiring odds ratios, and interpretation of estimates? Thank you.
You can use the ODDSRATIO option in the MODEL statement to compute odds ratios.
When you have categorical variables in a model, such as your variable SCHOOL, SAS by default sets the coefficient of the last value alphabetically (in your case, when SCHOOL=1) to be zero. This is a convention that has been adopted by SAS, and really does no harm, because the predicted values and the model are the same even if some other convention were adopted. I wrote a post with an example here.
But as I point out every time this question is asked, for class variables you really really really really really really really really really don't want to be trying to interpret the coefficients; for class variables you really really really really really really really really want to be interpreting the LSMEANS from the model, which are much easier to interpret.
Hello,
When I use the lsmeans statement (see below):
Proc glimmix data=data NOCLPRINT NOITPRINT METHOD= RSPL;
class breakfast school;
model breakfast=school / SOLUTION;
RANDOM Intercept / TYPE=AR(1) Subject=idnum;
lsmeans school / cl ilink;
Title 'school predicting breakfast';run;
I get the error "ERROR: Least-squares means are not available for the multinomial distribution." My code must be wrong, as there are only two values for both school and breakfast, 0 and 1.
Furthermore, if your predictor is continuous (I have some that are), an LSMEANS statement can't be used. How do I interpret a one-unit change in the x-variable in that case, with a continuous predictor? I'm having some trouble with the "don't interpret the coefficients" mantra, as I was always taught that you could, and should, interpret model coefficients in my graduate-level statistics courses (interpretation depending on the specific test employed) - even with a categorical outcome. (For example, e^b1 is odds ratio in logistic regression.) And with a continuous predictor, the coding of your outcome (which is 0 and which is 1) is more important, I believe.
Thanks.
If I understand the error message properly, it seems like you have more than two values (0 and 1) for breakfast.
LSMEANS doesn't have the slightest impact on continuous variables; those should work just fine in the model, with or without class variables alongside them.
Hello,
As I mentioned previously, I have only two values for breakfast, 0 and 1. See below.
breakfast | Frequency | Percent | Cumulative Frequency | Cumulative Percent
0 | 1427 | 28.94 | 1427 | 28.94
1 | 3504 | 71.06 | 4931 | 100
Frequency Missing = 252
The error remains.
Also, regarding using a continuous predictor, according to this documentation from SAS that you referenced, "LS-means can be computed for any effect in the MODEL statement that involves only CLASS variables." If a predictor is continuous, it's not a CLASS variable.
MODEL statements can contain both continuous and class variables. Nothing about using LSMEANS prohibits the MODEL statement from having both continuous and class variables. The statement you quote is a prohibition on what can go into the LSMEANS statement, not what can go in the MODEL statement.
I'm not sure why you are still getting this error. Perhaps you need to specify the exact distribution of breakfast in the MODEL statement as DIST=BIN.
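A minimal sketch of how that suggestion might look, reusing the dataset and variable names from the original post (data, idnum, school, breakfast). Two choices here are assumptions rather than a confirmed fix: the response is left out of the CLASS statement (as far as I know, GLIMMIX falls back to a multinomial distribution when the response is listed there and no DIST= is given), and TYPE=AR(1) is dropped because a single random intercept per participant has only one variance to estimate.

```sas
/* Sketch only: names come from the original post; not a verified fix. */
proc glimmix data=data noclprint noitprint method=rspl;
  /* Predictors and the subject variable go in CLASS; the response does not. */
  class idnum school;
  /* State the distribution explicitly instead of relying on the default. */
  model breakfast = school / dist=binomial link=logit solution oddsratio;
  /* One random intercept per participant. */
  random intercept / subject=idnum;
  /* ILINK also reports the least-squares means on the probability scale. */
  lsmeans school / cl ilink;
  title 'school predicting breakfast';
run;
```

With an explicit binomial distribution the LSMEANS statement should no longer raise the multinomial error, and the ILINK columns can be read as estimated breakfast probabilities for each level of school.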
Okay, adding DIST=BIN worked. Thanks!
However:
(1) Is there any way to change how SAS codes the reference group? I want it to be 0, not 1. I tried "event='1'" after "breakfast" (model breakfast(event='1')= . . .), which works in logistic regression; the model ran but "1" was still the reference.
Furthermore, I tried the analysis using a continuous predictor, angry_rec, and as I anticipated, I could not use the LSMEANS statement, because angry_rec is not a CLASS variable. After successfully running the school--> breakfast analysis, I discovered that, similar to logistic regression, I can simply use e^beta1 to interpret the change in the odds corresponding with a one-unit increase in the predictor. See an example below from my analysis with school and breakfast. 0.3895 is the beta, and 1.476 is the odds ratio. e^0.3895 = 1.476.
Solutions for Fixed Effects
Effect | school | Estimate | Standard Error | DF | t Value | Pr > |t|
Intercept | | 0.9635 | 0.09599 | 634 | 10.04 | <.0001
school | 0 | 0.3895 | 0.1001 | 3414 | 3.89 | 0.0001
school | 1 | 0 | . | . | . | .
Odds Ratio Estimates
school | _school | Estimate | DF | Lower 95% CL | Upper 95% CL
0 | 1 | 1.476 | 3414 | 1.213 | 1.796
So, interpretation of coefficients is possible if done correctly. In the above example, then, if 0 were the reference, the beta would be -0.3895, and the odds ratio would be 0.677 (1/1.476), indicating that someone who goes to school has about 68% of the odds of consuming breakfast of someone who does not go to school.
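Switching the reference level of a two-level predictor flips the sign of its coefficient, which inverts the odds ratio (it does not subtract from 1). A small sketch that checks this arithmetic with the estimate reported in the output; the variable names inside the DATA step are made up for illustration.

```sas
/* Sketch only: checks the arithmetic using the reported estimate. */
data _null_;
  beta      = 0.3895;         /* school=0 vs. school=1, from the output */
  or_0_vs_1 = exp(beta);      /* odds ratio with school=1 as the reference */
  or_1_vs_0 = exp(-beta);     /* odds ratio with school=0 as the reference */
  inverse   = 1 / or_0_vs_1;  /* same quantity as or_1_vs_0 */
  /* Write the three values to the log with three decimals. */
  put or_0_vs_1= 8.3 or_1_vs_0= 8.3 inverse= 8.3;
run;
```

The first value should match the reported 1.476, and the other two should agree with each other at roughly 0.677.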
(1) Is there any way to change how SAS codes the reference group? I want it to be 0, not 1. I tried "event='1'" after "breakfast" (model breakfast(event='1')= . . .), which works in logistic regression; the model ran but "1" was still the reference.
class school(ref='1');
Furthermore, I tried the analysis using a continuous predictor, angry_rec, and as I anticipated, I could not use the LSMEANS statement
Yes, that's what I said earlier. You can't put continuous predictors in LSMEANS.
So, interpretation of coefficients is possible if done correctly.
Yes, of course it is possible. With CLASS variables, the interpretation is easier for most people with LSMEANS as compared to the model coefficients.
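For a continuous predictor, one way to get the per-unit odds ratio without LSMEANS is an ESTIMATE statement with the EXP option. This is a sketch under assumptions, not the poster's actual code: it reuses angry_rec and the other names from the thread, and it has not been run against the real data.

```sas
/* Sketch only: angry_rec is the continuous predictor named in the thread. */
proc glimmix data=data noclprint noitprint method=rspl;
  /* angry_rec stays out of CLASS so it is treated as continuous. */
  class idnum;
  model breakfast = angry_rec / dist=binomial link=logit solution;
  random intercept / subject=idnum;
  /* EXP exponentiates the slope: the odds ratio for a one-unit increase. */
  estimate 'angry_rec +1 unit' angry_rec 1 / exp cl;
  title 'angry_rec predicting breakfast';
run;
```

The exponentiated estimate is the multiplicative change in the odds of breakfast for each one-unit increase in angry_rec, holding the participant's random intercept fixed.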
I want to re-reference the outcome, not the predictor. If I try the following:
class breakfast(ref='0');
I get the error:
ERROR: The response variable appears in the CLASS list. This is not consistent with the
selected distribution.
Breakfast is the response variable. It does not belong in a CLASS statement.
You have to specify the level you want in the MODEL statement
model breakfast (event='0') = ... ;
Hello,
As I mentioned in a previous post, using "event='0'" did not work: the model ran, but the estimates, ORs, etc. were exactly the same as with "event='1'".
Using event='0':
Solutions for Fixed Effects
Effect | school | Estimate | Standard Error | DF | t Value | Pr > |t|
Intercept | | 0.9635 | 0.09599 | 634 | 10.04 | <.0001
school | 0 | 0.3895 | 0.1001 | 3414 | 3.89 | 0.0001
school | 1 | 0 | . | . | . | .
Odds Ratio Estimates
school | _school | Estimate | DF | Lower 95% CL | Upper 95% CL
0 | 1 | 1.476 | 3414 | 1.213 | 1.796
Using event='1':
Solutions for Fixed Effects
Effect | school | Estimate | Standard Error | DF | t Value | Pr > |t|
Intercept | | 0.9635 | 0.09599 | 634 | 10.04 | <.0001
school | 0 | 0.3895 | 0.1001 | 3414 | 3.89 | 0.0001
school | 1 | 0 | . | . | . | .
Odds Ratio Estimates
school | _school | Estimate | DF | Lower 95% CL | Upper 95% CL
0 | 1 | 1.476 | 3414 | 1.213 | 1.796
I think I should have said
model breakfast (ref='0') = ... ;
One more guess
Try DIST=BINARY instead of DIST=BIN.
Excellent! This is the code that finally worked (but only for a binary predictor - see below):
Proc glimmix data=FFS NOCLPRINT NOITPRINT METHOD= RSPL;
class school(ref='0');
model breakfast(ref='0')=school / ODDSRATIO SOLUTION DIST=BINARY;
RANDOM Intercept / TYPE=AR(1) Subject=idnum;
Title 'school predicting breakfast';run;
However, I was not able to conduct the analysis with a continuous predictor. I got the error "Did not converge." This error was not present when I used DIST=BIN or DIST=BINOMIAL with the continuous predictor; but using these options for DIST, I am unable to change the reference group for the outcome of breakfast. Any idea why this would not converge with a continuous predictor, and how to rectify this but still be able to change the outcome reference group?