Solved: questions about proc glimmix options

astronomy_tower · Posted 07-12-2023 11:54 AM

Hello!

We have data on students who are nested within colleges, and we’re using GLIMMIX to run a multi-level regression model to predict whether or not a student declares a certain major in their first year of college (outcome) based on whether or not they take that subject in high school (main predictor). This is what our GLIMMIX code looks like:

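proc glimmix data=&dsn method=laplace noclprint;
   class &class_var;
   /* model the probability that outcome_var = 1 */
   model outcome_var (event='1') = &ivlist /
         cl dist=binary link=logit solution oddsratio;
   /* random intercept for each college */
   random intercept / subject=&subject_var type=vc solution cl;
   covtest / wald;
   /* predicted probabilities (ilink) for each level of the listed variables */
   lsmeans &lsmeans_var / bylevel cl ilink;
run;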

where:

class_var includes a list of categorical covariates with reference groups specified
ivlist includes the main predictor and all other covariates
subject_var is the college code (second level/grouping variable)
lsmeans_var includes a list of categorical variables like our main predictor, student’s gender, etc.
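
For illustration only, the macro variables might be defined along these lines. Every data set name, variable name, and reference level below is a hypothetical placeholder, not taken from our actual data:

```sas
/* Hypothetical values; substitute your own data set and variable names. */
%let dsn         = work.students;
%let class_var   = hs_subject(ref='0') gender(ref='Male');  /* reference groups */
%let ivlist      = hs_subject gender hs_gpa;                /* predictor + covariates */
%let subject_var = college_code;                            /* level-2 grouping variable */
%let lsmeans_var = hs_subject gender;                       /* categorical effects only */
```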

We borrowed a majority of the syntax from page 4 of this PDF: https://support.sas.com/resources/papers/proceedings15/3430-2015.pdf. We have a couple of questions about understanding some of these options and whether they are appropriate for our situation:

Is there a reason we should use method=laplace rather than the default method=rspl in our case?
Should we use dist=binary because our outcome variable only has 2 outcomes (majored / didn’t major)? In what case would we use dist=binomial instead?
We used the lsmeans statement because we want to get the average predicted probabilities of our outcome for each level of categorical variables in the list lsmeans_var. Does lsmeans assume reference values or grand mean values for all the other categorical covariates in the model when calculating marginal means?
A small percentage of students declare the major, so does that make our sample unbalanced? Does bylevel help with this by calculating lsmeans for each group separately as opposed to using the entire sample size as the denominator when calculating predicted probabilities of lsmeans_var?

Any help would be greatly appreciated!

jiltao · Posted 07-14-2023 09:44 AM

I posted a response yesterday. Not sure why it did not show up. Here it is again --

method=laplace is a maximum likelihood estimation method that honors the distribution assumption. method=rspl is a pseudo-likelihood estimation method that creates a linearized pseudo-response. Both methods have pros and cons. For a binary response variable, a maximum likelihood based estimation method might be less biased. For more information, please refer to the documentation below -- https://go.documentation.sas.com/doc/en/pgmsascdc/v_040/statug/statug_glimmix_details06.htm
It is okay to use dist=binary or dist=binomial in your case. If your response variable is events/trials, then you must use dist=binomial (which is also the default for the events/trials syntax).
LSMEANS are computed at the average of the other covariates in your model. You can add the E option in the LSMEANS statement to see exactly how they are computed.
By default, LSMEANS are computed over a balanced population, so each level in the group receives the same "fractions" regardless of the sample size in the group. The BYLEVEL option changes the "fractions" based on the group sample size. Again, you can add the E option in the LSMEANS statement to see exactly how that affects the computation of the LSMEANS for your data.
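
A minimal sketch of the points above, not a definitive setup. All data set and variable names here (students, college_counts, declared, hs_subject, gender, college, n_declared, n_students) are hypothetical. It fits the same model with both estimation methods, prints the LSMEANS coefficients with and without BYLEVEL, and shows the events/trials syntax on aggregated data:

```sas
/* Hypothetical data sets and variables throughout. */

/* Laplace fit; the E option prints the LS-means coefficients. */
proc glimmix data=students method=laplace;
   class hs_subject gender college;
   model declared(event='1') = hs_subject gender
         / dist=binary link=logit solution;
   random intercept / subject=college;
   lsmeans hs_subject / e ilink cl;           /* default coefficients */
   lsmeans hs_subject / e bylevel ilink cl;   /* BYLEVEL coefficients */
run;

/* Same model with the default pseudo-likelihood method, for comparison. */
proc glimmix data=students method=rspl;
   class hs_subject gender college;
   model declared(event='1') = hs_subject gender
         / dist=binary link=logit solution;
   random intercept / subject=college;
run;

/* Events/trials syntax, assuming data aggregated to one row per
   college and hs_subject level; dist=binomial applies here. */
proc glimmix data=college_counts method=laplace;
   class hs_subject college;
   model n_declared/n_students = hs_subject
         / dist=binomial link=logit solution;
   random intercept / subject=college;
run;
```

Comparing the two coefficient tables produced by the E option shows how BYLEVEL reweights the other classification effects for your data.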

Hope this helps,

Jill

sbxkoenk · Posted 07-13-2023 04:57 PM

I have moved your post to:

Home >> Analytics >> Statistical Procedures

But I wonder if there's any difference between this and your other question (questions about proc glimmix code).

Koen

StatsMan · Posted 07-14-2023 10:08 AM

Your response did show up, @jiltao . This is a near-duplicate post.

astronomy_tower · Posted 07-14-2023 11:40 AM

Dear Jill,

Thank you so much, this is very helpful!!

I apologize for making two posts. It's my first time posting, and I thought it would be better to post on two sub-forums instead of just one.

sbxkoenk · Posted 07-14-2023 12:03 PM

@astronomy_tower wrote:

I apologize for making two posts. It's my first time posting, and I thought it would be better to post on two sub-forums instead of just one.

Look in this sub-forum 😁
Home >> Welcome >> Getting Started

Look at this post by a Community Manager 😁

Community etiquette: The do’s and don’ts of the SAS Support Communities
https://communities.sas.com/t5/Getting-Started/Community-etiquette-The-do-s-and-don-ts-of-the-SAS-Su...

It says (among other things):
Post your question once, in the appropriate forum. Multiple instances of the same question dilutes the answers and causes confusion.

Thanks!
And welcome to the Communities!

Koen
