Hello all - I was wondering if I could pick your collective brain about approximating a subdomain analysis using proc glimmix. I am conducting multilevel modeling of a complex probability survey using proc glimmix. The overall analysis works fine, using the weight options to specify weights at my two levels of interest. However, I can't figure out how to replicate the analysis for a particular subpopulation within my overall dataset. I know that I can't just subset my dataset. I also know that proc glimmix has a BY statement but no DOMAIN statement, and that using the BY statement is not considered technically 'proper' for this purpose. But I don't know WHY that is the case. Would you mind helping me understand why the BY statement is not appropriate for conducting domain analyses with proc glimmix? Is there anything else you'd recommend I try instead? Your advice, especially if accompanied by references I could consult, would be extremely helpful.
Thanks so much!
What about your project makes the SURVEY procs inappropriate?
BY-group processing in proc glimmix has the exact same problem that it has in the survey procs:
Note that using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid subpopulation or domain analysis, where the total number of units in the subpopulation is not known with certainty.
(my emphasis)
You should share your existing code attempts and indicate which variables represent which levels of your population.
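To make the quoted note concrete, here is a minimal sketch in Python (not SAS, and not what proc glimmix computes internally). It assumes a single stratum and uses the standard with-replacement "ultimate cluster" variance estimator for an estimated total. The PSU totals are made-up numbers, and the function name `wr_total_variance` is hypothetical. The point is only that dropping sampled PSUs that happen to contain no domain members, which is what subsetting or BY-group processing effectively does, changes the PSU count and hides the fact that the domain size is itself random.

```python
def wr_total_variance(psu_totals):
    """With-replacement variance estimate of an estimated total (one stratum).

    psu_totals: weighted totals of the analysis variable, one per sampled PSU.
    """
    n = len(psu_totals)
    mean = sum(psu_totals) / n
    return n / (n - 1) * sum((t - mean) ** 2 for t in psu_totals)


# Hypothetical weighted domain totals for 6 sampled PSUs.
# A 0.0 means the PSU was sampled but contains no domain members.
psu_domain_totals = [10.0, 12.0, 0.0, 0.0, 11.0, 0.0]

# Domain-style analysis: every sampled PSU stays in the calculation.
v_domain = wr_total_variance(psu_domain_totals)

# Subset / BY-style analysis: PSUs without domain members are dropped.
v_subset = wr_total_variance([t for t in psu_domain_totals if t > 0])

print("domain-style variance:", v_domain)
print("subset-style variance:", v_subset)
```

With these made-up numbers the subset-style estimate is far smaller than the domain-style one, because it treats the three PSUs that contain domain members as if they were the whole sample. How large the gap is in a real multilevel model depends on the design and on the estimator, so treat this as an illustration of the principle only.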
I found the following paper extremely useful when trying to understand the pitfalls of using a BY statement instead of the domain statement for subpopulation analyses of complex survey data. Posting here for others' benefit:
A closer examination of subpopulation analysis of complex-sample survey data:
https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800800404