Solved: questions about proc glimmix code

astronomy_tower · Posted 07-12-2023 04:07 PM

Hello!

We have data on students who are nested within colleges, and we’re using GLIMMIX to run a multi-level regression model to predict whether or not a student declares a certain major in their first year of college (outcome) based on whether or not they take that subject in high school (main predictor). This is what our GLIMMIX code looks like:

proc glimmix data=&dsn method=laplace noclprint;
  class &class_var;
  model outcome_var(event='1') = &ivlist
        / cl dist=binary link=logit solution oddsratio;
  random intercept / subject=&subject_var type=vc solution cl;
  covtest / wald;
  lsmeans &lsmeans_var / bylevel cl ilink;
run;

where:

class_var includes a list of categorical covariates with reference groups specified
ivlist includes the main predictor and all other covariates
subject_var is the college code (second level/grouping variable)
lsmeans_var includes a list of categorical variables like our main predictor, student’s gender, etc.

We borrowed most of the syntax from page 4 of this PDF: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf. We have a few questions about what some of these options do and whether they are appropriate for our situation:

1. Is there a reason we should use method=laplace rather than the default method=rspl in our case?
2. Should we use dist=binary because our outcome variable only has 2 outcomes (majored / didn’t major)? In what case would we use dist=binomial instead?
3. We used the lsmeans statement because we want to get the average predicted probabilities of our outcome for each level of categorical variables in the list lsmeans_var. Does lsmeans assume reference values or grand mean values for all the other categorical covariates in the model when calculating marginal means?
4. A small percentage of students declare the major, so does that make our sample unbalanced? Does bylevel help with this by calculating lsmeans for each group separately as opposed to using the entire sample size as the denominator when calculating predicted probabilities of lsmeans_var?

Any help would be greatly appreciated!

jiltao · Posted 07-13-2023 07:09 PM

1. method=laplace is a maximum likelihood based estimation method, which honors the specified distribution. method=rspl is a pseudo-likelihood estimation method. Both have pros and cons. See the documentation for details: https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_details06.htm
2. Yes. Either dist=binary or dist=binomial is fine for a two-level outcome. If the dependent variable uses the events/trials syntax, you would use dist=binomial (the default in that case).
3. The LSMEANS for an effect A are computed by averaging over the other covariates in the model, not by fixing them at their reference levels. You can add the E option in the LSMEANS statement to see exactly how they are computed.
4. By default, LSMEANS are computed over a balanced population; that is, each level receives the same fraction regardless of the sample size in your data. The BYLEVEL option instead bases the "fraction" on the group sample sizes in your data. Again, the E option shows exactly how that affects the LSMEANS computations. BYLEVEL is explained in the documentation: https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_syntax13.htm#statug.gl...
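
As a rough sketch of the E-option check described above, the step below requests the same LS-means twice, once with the default balanced coefficients and once with BYLEVEL, so the two coefficient tables can be compared. The data set and variable names (work.students, declared_major, took_hs_subject, gender, college_id) are hypothetical stand-ins, not the original poster's variables.

```sas
/* Hypothetical names throughout; substitute your own data set and variables. */
proc glimmix data=work.students method=laplace noclprint;
  class took_hs_subject gender college_id;
  model declared_major(event='1') = took_hs_subject gender
        / dist=binary link=logit solution;
  random intercept / subject=college_id;

  /* Default LS-means: E prints the coefficients applied to each
     fixed-effect parameter; ILINK adds the estimate on the probability scale. */
  lsmeans took_hs_subject / e ilink cl;

  /* Same LS-means with BYLEVEL: E prints the coefficients again so the
     change in weighting can be read off directly. */
  lsmeans took_hs_subject / bylevel e ilink cl;
run;
```

If the coefficients on the gender columns differ between the two tables, that difference is the effect of BYLEVEL on how the other classification effect is weighted.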

Hope this helps,

Jill
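
On the second point above (dist=binary versus dist=binomial), a minimal sketch of the two data layouts may help. It assumes a numeric 0/1 outcome and predictor, and every name in it (work.students, work.cells, declared_major, took_hs_subject, college_id, n_declared, n_students) is hypothetical.

```sas
/* Layout 1: one row per student with a 0/1 response -> dist=binary. */
proc glimmix data=work.students method=laplace;
  class college_id;
  model declared_major(event='1') = took_hs_subject
        / dist=binary link=logit solution;
  random intercept / subject=college_id;
run;

/* Collapse to one row per college-by-predictor cell.
   SUM of a 0/1 variable counts events; N counts nonmissing trials. */
proc summary data=work.students nway;
  class college_id took_hs_subject;
  var declared_major;
  output out=work.cells(drop=_type_ _freq_) sum=n_declared n=n_students;
run;

/* Layout 2: events/trials response -> dist=binomial. */
proc glimmix data=work.cells method=laplace;
  class college_id;
  model n_declared/n_students = took_hs_subject
        / dist=binomial link=logit solution;
  random intercept / subject=college_id;
run;
```

Collapsing only makes sense when every covariate in the model is constant within a cell; with student-level covariates, the one-row-per-student layout with dist=binary is the natural choice.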


sbxkoenk · Posted 07-13-2023 04:55 PM

I have moved your post to:

Home >> Analytics >> Statistical Procedures

Koen

