08-04-2015 02:16 PM
I have a data set with people nested in cities, and the dependent variable is ordinal. The person variable is NQ and the city variable is CITY; there are about 4,000 people and 30 cities. There are both individual-level and city-level effects. I ran:
proc glimmix data=DSNAME;
  class nq city;
  model DV = indlevelvar1 indlevelvar2 .... citylevelvar1 citylevelvar2 ....
        / dist=mult link=clogit;
  random intercept / subject=nq group=city;
run;
That gave the message "model is too large to run in a reasonable time." Also, I am not completely clear on whether I should use SUBJECT=CITY or SUBJECT=NQ or something else (both ran).
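For reference, the two SUBJECT= choices being weighed could be sketched as below. DSNAME, DV, and the covariate names are placeholders from the post, and the ordering of the DV levels is left at the default:

```sas
/* (a) A single random city intercept: people nested in cities. */
proc glimmix data=DSNAME;
  class city;
  model DV = indlevelvar1 indlevelvar2 citylevelvar1 citylevelvar2
        / dist=mult link=clogit;
  random intercept / subject=city;
run;

/* (b) The specification above: GROUP=CITY gives a separate
   NQ variance component for each city.                      */
proc glimmix data=DSNAME;
  class nq city;
  model DV = indlevelvar1 indlevelvar2 citylevelvar1 citylevelvar2
        / dist=mult link=clogit;
  random intercept / subject=nq group=city;
run;
```

Version (b) estimates roughly 30 variance parameters instead of one, which is one reason it can blow up the model size.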
08-04-2015 03:55 PM
I'll try to work from the bottom up here. What you have now would give a separate variance component due to nq for each city. Is that a reasonable approach? I think it would be if you have multiple observations on each individual. Here I am not so sure. I would be inclined to view nq as the "error", and city as an additional variance component. I would try:
proc glimmix data=DSNAME;
  class nq city;
  model DV(ref='<put something in here that makes sense>') = <fixed effects vector>
        / dist=mult link=clogit;
  random intercept / subject=city;
run;
If you have lots of data relative to the number of variables being fit, I would also consider using METHOD=LAPLACE, which gives a conditional (subject-specific) model and puts you in a position to compare models on their information criteria. But with only about 130 people per city, and assuming the DV has 4 levels, you have only about 33 records per response level per city. You could run into stability and quasi-separation problems.
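A minimal sketch of the METHOD=LAPLACE variant suggested above, assuming placeholder names for the data set, reference level, and covariates:

```sas
/* Hypothetical sketch: Laplace estimation yields a true log likelihood,
   so AIC/BIC from competing random-effects structures are comparable.
   DSNAME, the ref='1' level, and the covariates are placeholders.     */
proc glimmix data=DSNAME method=laplace;
  class city;
  model DV(ref='1') = indlevelvar1 citylevelvar1
        / dist=mult link=clogit;
  random intercept / subject=city;
run;
```

The default pseudo-likelihood methods do not produce information criteria that are valid for comparing models, which is why the switch to Laplace matters here.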
08-04-2015 04:14 PM
Helpful as always. That ran. One thing that worries me is that the df is the same for the city-level effects and the person-level effects. Is that correct? I know estimating the df in these models is tricky and full of options.
08-05-2015 07:24 AM
I know you are expecting df around 30 (minus the number of fixed effects estimated) for the city-level effects, so I think the key here is DDFM=BW (between-within), even though this isn't necessarily a repeated-measures design. I think the default is DDFM=CONTAIN, but since you have observations at the NQ level, that may be why it ends up using the residual df for everything. One way to check would be to estimate the BLUPs for each city by adding the SOLUTION option to the RANDOM statement and looking at both specifications.
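Putting those two suggestions together, a sketch might look like this (again with placeholder names; the DDFM= option goes on the MODEL statement):

```sas
/* Hypothetical sketch: DDFM=BW splits df into between-city and
   within-city pools, so city-level effects should get roughly
   30 minus the number of city-level fixed effects. SOLUTION on
   the RANDOM statement prints the city BLUPs for inspection.   */
proc glimmix data=DSNAME;
  class city;
  model DV(ref='1') = indlevelvar1 citylevelvar1
        / dist=mult link=clogit ddfm=bw;
  random intercept / subject=city solution;
run;
```

Comparing the "Solution for Random Effects" table across the two specifications is a quick sanity check that the city effects are being estimated at the right level.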