astronomy_tower
Calcite | Level 5

Hello!

We have data on students nested within colleges, and we’re using PROC GLIMMIX to fit a multilevel logistic regression. The outcome is whether a student declares a certain major in their first year of college; the main predictor is whether they took that subject in high school. Our GLIMMIX code looks like this:

proc glimmix data=&dsn method=laplace noclprint;
    class &class_var;
    model outcome_var(event='1') = &ivlist /
          cl dist=binary link=logit solution oddsratio;
    random intercept / subject=&subject_var type=vc solution cl;
    covtest / wald;
    lsmeans &lsmeans_var / bylevel cl ilink;
run;

where: 

  • class_var includes a list of categorical covariates with reference groups specified
  • ivlist includes the main predictor and all other covariates
  • subject_var is the college code (second level/grouping variable)
  • lsmeans_var includes a list of categorical variables like our main predictor, student’s gender, etc.
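
As a rough sketch, the macro variables could be set up along these lines. Every data set and variable name below (work.students, hs_course, gender, first_gen, hs_gpa, college_id) is a hypothetical placeholder for illustration, not one of our real names.

```sas
/* Hypothetical names throughout; substitute your own data set and variables. */
%let dsn         = work.students;            /* one row per student            */
%let class_var   = hs_course(ref='0')        /* main predictor, reference = 0  */
                   gender(ref='M')
                   first_gen(ref='0');
%let ivlist      = hs_course gender first_gen hs_gpa;  /* hs_gpa is continuous */
%let subject_var = college_id;               /* level-2 grouping variable      */
%let lsmeans_var = hs_course gender;         /* effects to get LS-means for    */
```

Only effects listed in the CLASS statement can appear in the LSMEANS statement, so anything in lsmeans_var should also be in class_var.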

We borrowed most of the syntax from page 4 of this paper: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf. We have a few questions about what some of these options do and whether they are appropriate for our situation:

  1. Is there a reason we should use method=laplace rather than the default method=rspl in our case? 
  2. Should we use dist=binary because our outcome variable only has 2 outcomes (majored / didn’t major)? In what case would we use dist=binomial instead? 
  3. We used the lsmeans statement because we want to get the average predicted probabilities of our outcome for each level of categorical variables in the list lsmeans_var. Does lsmeans assume reference values or grand mean values for all the other categorical covariates in the model when calculating marginal means? 
  4. A small percentage of students declare the major, so does that make our sample unbalanced? Does bylevel help with this by calculating lsmeans for each group separately as opposed to using the entire sample size as the denominator when calculating predicted probabilities of lsmeans_var?

Any help would be greatly appreciated!

Accepted Solutions
jiltao
SAS Super FREQ
  1. method=laplace is a maximum likelihood based estimation method, which honors the specified distribution. method=rspl is a pseudo-likelihood estimation method. Both have pros and cons. See the documentation for details: https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_details06.htm
  2. Yes. Using either dist=binary or dist=binomial is fine. If you specify the dependent variable with events/trials syntax, you would use dist=binomial, which is the default for that syntax.
  3. The LS-mean for an effect A is computed by averaging over the other covariates in the model. You can add the E option in the LSMEANS statement to see exactly how it is computed.
  4. By default LSMEANS are computed over a balanced population, that is, each level receives the same fraction regardless of the sample size in your data. The BYLEVEL option would base the "fraction" on the group sample size in your data. Again, the E option would tell you exactly how that affects the LSMEANS computations. BYLEVEL is explained in the documentation here: https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_syntax13.htm#statug.gl...
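
A minimal sketch of using the E option to compare the coefficient sets behind points 3 and 4. The data set and variable names (work.students, hs_course, gender, hs_gpa, college_id) are hypothetical. One caveat worth checking against the LSMEANS documentation: BYLEVEL is described there as a modifier of observed-margins LS-means, so the second statement pairs it with the OM option rather than using BYLEVEL alone.

```sas
/* Sketch only: work.students, hs_course, gender, hs_gpa and college_id are hypothetical names. */
proc glimmix data=work.students method=laplace noclprint;
    class hs_course gender college_id;
    model outcome_var(event='1') = hs_course gender hs_gpa
          / dist=binary link=logit solution;
    random intercept / subject=college_id;

    /* Default LS-means: other CLASS effects get equal weights and
       continuous covariates are set to their means. E prints the coefficients. */
    lsmeans hs_course / e ilink cl;

    /* OM weights the other effects by their observed frequencies;
       BYLEVEL computes those margins separately within each hs_course level. */
    lsmeans hs_course / om bylevel e ilink cl;
run;
```

Comparing the two coefficient tables produced by E shows directly which weights each set of LS-means uses. ILINK then reports the estimates on the probability scale.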

Hope this helps,

Jill
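
To illustrate point 2, here is a sketch of the events/trials form. It assumes the outcome is coded 0/1 and that all predictors are categorical, so the student-level rows can be collapsed to counts; all data set and variable names are hypothetical.

```sas
/* Sketch only: all names are hypothetical. */
/* Collapse to one row per college-by-covariate pattern. */
proc summary data=work.students nway;
    class college_id hs_course gender;
    var outcome_var;                        /* assumed coded 0/1 */
    output out=work.grouped(drop=_type_ _freq_)
           sum=n_declared                   /* events: students who declared  */
           n=n_students;                    /* trials: students in the cell   */
run;

/* events/trials response, so the distribution is binomial */
proc glimmix data=work.grouped method=laplace;
    class college_id hs_course gender;
    model n_declared / n_students = hs_course gender
          / dist=binomial link=logit solution;
    random intercept / subject=college_id;
run;
```

With the same predictors, this grouped model should give the same parameter estimates as the student-level dist=binary model. The reported log likelihood can differ by a constant.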