astronomy_tower
Calcite | Level 5

Hello!

We have data on students who are nested within colleges, and we’re using GLIMMIX to run a multi-level regression model to predict whether or not a student declares a certain major in their first year of college (outcome) based on whether or not they take that subject in high school (main predictor). This is what our GLIMMIX code looks like: 

 

proc glimmix data=&dsn method=laplace noclprint;
    class &class_var;
    model outcome_var(event='1') = &ivlist /
          cl dist=binary link=logit solution oddsratio;
    random intercept / subject=&subject_var type=vc solution cl;
    covtest / wald;
    lsmeans &lsmeans_var / bylevel cl ilink;
run;

 

where: 

  • class_var includes a list of categorical covariates with reference groups specified
  • ivlist includes the main predictor and all other covariates
  • subject_var is the college code (second level/grouping variable)
  • lsmeans_var includes a list of categorical variables like our main predictor, student’s gender, etc.
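
For readers following along, the macro variables described above might be defined roughly as follows. This is only a sketch: the data set name, the variable names (students, college, hs_course, gender, hs_gpa), and the reference levels are all hypothetical placeholders, not the values used in the original analysis.

```sas
/* All names and reference levels below are hypothetical placeholders. */
%let dsn         = students;                  /* one row per student            */
%let subject_var = college;                   /* level-2 grouping variable      */
%let class_var   = college hs_course(ref='0') gender(ref='Male');
%let ivlist      = hs_course gender hs_gpa;   /* main predictor plus covariates */
%let lsmeans_var = hs_course gender;          /* categorical effects only       */
```

Only classification effects can appear in lsmeans_var, so a continuous covariate such as the hypothetical hs_gpa belongs in ivlist but not in the LSMEANS list.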

 

We borrowed a majority of the syntax from page 4 of this PDF: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf. We have a couple of questions about understanding some of these options and whether they are appropriate for our situation: 

 

  1. Is there a reason we should use method=laplace rather than the default method=rspl in our case? 
  2. Should we use dist=binary because our outcome variable only has 2 outcomes (majored / didn’t major)? In what case would we use dist=binomial instead? 
  3. We used the lsmeans statement because we want to get the average predicted probabilities of our outcome for each level of categorical variables in the list lsmeans_var. Does lsmeans assume reference values or grand mean values for all the other categorical covariates in the model when calculating marginal means? 
  4. A small percentage of students declare the major, so does that make our sample unbalanced? Does bylevel help with this by calculating lsmeans for each group separately as opposed to using the entire sample size as the denominator when calculating predicted probabilities of lsmeans_var?

 

Any help would be greatly appreciated!

Accepted Solutions
jiltao
SAS Super FREQ

I posted a response yesterday. Not sure why it did not show up. Here it is again --

 

  1. method=laplace is a maximum likelihood estimation method that honors the distribution assumption. method=rspl is a pseudo-likelihood estimation method that creates a linearized pseudo-response. Both methods have pros and cons. For a binary response variable, a maximum likelihood based estimation method might be less biased. For more information please refer to the documentation below -- https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_details06.htm
  2. It is okay to use dist=binary or dist=binomial in your case. If your response variable is specified as events/trials, then you must use dist=binomial (which is also the default for the events/trials syntax).
  3. LS-means are computed across the average of the other covariates in your model. You can add the E option in the LSMEANS statement to see exactly how they are computed.
  4. By default, LS-means are computed over a balanced population, so each level in the group receives the same "fractions" regardless of the sample size in the group. The BYLEVEL option changes the "fractions" based on the group sample size. Again, you can add the E option in the LSMEANS statement to see exactly how that affects the computation of the LS-means for your data.
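
As a rough illustration of points 2 through 4, the sketch below requests the same LS-means with and without BYLEVEL, adding the E option so the coefficient tables can be compared, and then shows the events/trials form that requires dist=binomial. The data set and variable names (students, students_agg, declared, hs_course, gender, college, n_declared, n_students) are hypothetical and do not come from the original post.

```sas
/* Sketch only: all data set and variable names are hypothetical. */

/* Student-level data: one row per student, 0/1 response. */
proc glimmix data=students method=laplace noclprint;
    class college hs_course gender;
    model declared(event='1') = hs_course gender
          / dist=binary link=logit solution;
    random intercept / subject=college;
    /* E prints the coefficients used to build each LS-mean. */
    lsmeans hs_course / e ilink cl;           /* default: balanced margins     */
    lsmeans hs_course / bylevel e ilink cl;   /* margins computed per level    */
run;

/* Aggregated data: one row per cell, with the number of students who
   declared the major (events) and the number of students (trials). */
proc glimmix data=students_agg method=laplace noclprint;
    class college hs_course gender;
    model n_declared/n_students = hs_course gender
          / dist=binomial link=logit solution;
    random intercept / subject=college;
run;
```

Comparing the two coefficient tables printed by the E option shows directly which weights each LSMEANS statement applies to the levels of the other classification effects.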

Hope this helps,

Jill

 


sbxkoenk
SAS Super FREQ

I have moved your post to:

Home >> Analytics >> Statistical Procedures

 

But I wonder if there's any difference with your other question (questions about proc glimmix code).

 

Koen


StatsMan
SAS Super FREQ

Your response did show up, @jiltao. This is a near-duplicate post.

astronomy_tower
Calcite | Level 5

Dear Jill,

Thank you so much, this is very helpful!!

I apologize for making two posts. It's my first time posting, and I thought it would be better to post on two sub-forums instead of just one.

 

sbxkoenk
SAS Super FREQ

@astronomy_tower wrote:

I apologize for making two posts. It's my first time posting, and I thought it would be better to post on two sub-forums instead of just one.

Look in this sub-forum 😁
Home >> Welcome >> Getting Started

 

Look at this post by a Community Manager 😁

Community etiquette: The do’s and don’ts of the SAS Support Communities
https://communities.sas.com/t5/Getting-Started/Community-etiquette-The-do-s-and-don-ts-of-the-SAS-Su...

 

It says (among other things):
Post your question once, in the appropriate forum. Multiple instances of the same question dilutes the answers and causes confusion.

 

Thanks!
And welcome to the Communities!

Koen
