Re: Fixed effects with random effects or clustered standard errors

yuchinher · Posted 11-04-2022 06:10 AM

Hi,

I have a 4-level data structure that looks like the following:

Level 1: Waves
Level 2: Individuals
Level 3: Family
Level 4: Neighbourhood

Level 1 is nested in level 2, level 2 is nested in level 3, and level 3 is nested in level 4.

I would like to have fixed effects at the neighbourhood level (using neighbourhood IDs) with random effects at the family and individual levels. I tried both GLIMMIX (RANDOM INTERCEPT with SUBJECT=) and GENMOD (REPEATED with SUBJECT=), but I kept running into errors and memory issues, apparently because I have many neighbourhood IDs and a large number of observations.

It was suggested to me that, alternatively, I could keep fixed effects at the neighbourhood level and use a SAS option that corrects the standard errors of the fixed parameters for clustering (i.e., at the family and individual levels). But I am not sure how to do this in SAS. In Stata, there is a vce(cluster clustvar) option for simple logistic regression. Is there anything similar in SAS?
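One SAS analogue of Stata's vce(cluster ...), not raised in the replies below, is PROC SURVEYLOGISTIC with a CLUSTER statement: it fits an ordinary (marginal) logistic regression and computes Taylor-linearization, cluster-robust standard errors. The sketch below is only an illustration under assumptions: it simulates a small hypothetical data set (sim) with one made-up covariate x standing in for dur, age, and the others, clusters on family (clustervar_l3), which also covers the correlation within individuals because individuals are nested in families, and enters neighbourhood (clustervar_l4) as fixed effects through CLASS. Whether it copes with the real number of neighbourhood levels is untested.

```sas
/* Hypothetical simulated person-wave data: 20 neighbourhoods x 10 families
   x 3 individuals x 4 waves. Names mimic the thread; x is made up. */
data sim;
  call streaminit(2022);
  do clustervar_l4 = 1 to 20;                       /* neighbourhood            */
    do f = 1 to 10;
      clustervar_l3 = (clustervar_l4 - 1)*10 + f;   /* unique family ID         */
      u3 = rand('normal', 0, 1);                    /* family effect            */
      do clustervar_l2 = 1 to 3;                    /* individual within family */
        u2 = rand('normal', 0, 0.7);                /* individual effect        */
        do wave = 1 to 4;
          x = rand('normal');
          event = rand('bernoulli', logistic(-1 + 0.5*x + u3 + u2));
          output;
        end;
      end;
    end;
  end;
  drop f u3 u2;
run;

/* Logistic model with neighbourhood fixed effects; standard errors are
   clustered on family (Taylor linearization is the default method). */
proc surveylogistic data=sim;
  cluster clustervar_l3;
  class clustervar_l4 / param=ref;
  model event(event='1') = clustervar_l4 x;
  ods output ParameterEstimates=pe;
run;
```

Unlike the GLIMMIX model, this estimates population-averaged rather than subject-specific effects, so the coefficients are not directly comparable.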

I also saw previous posts/comments suggesting the EMPIRICAL option in GLIMMIX. I tried it, but I don't think it is really useful in my case: it still needs the SUBJECT= specifications, which seemed too complex to handle together with the fixed effects here.

Any suggestions on how to model this? Thank you very much!

jiltao · Posted 11-04-2022 09:37 AM

First, what is your MEMSIZE setting? Please run the following code and send us the log:

proc options option=memsize value;
run;

Second, there are ways to write a more numerically efficient PROC GLIMMIX program. What is your current PROC GLIMMIX program?

Thanks,

Jill

yuchinher · Posted 11-04-2022 10:44 AM

Hi Jill,

The memory size is 40 GB.

This is the GLIMMIX code:

proc glimmix data=long method=laplace noclprint;
  class clustervar_l4 clustervar_l3 clustervar_l2;
  /* neighbourhood (clustervar_l4) enters as fixed effects */
  model event(event='1') = clustervar_l4 dur dur2 age sex income hhmember break
        / solution ddfm=bw link=logit dist=binary;
  /* random intercept for family */
  random intercept / sub=clustervar_l3;
  /* random intercept for individual within family */
  random intercept / sub=clustervar_l2(clustervar_l3);
  parms / lowerb=1e-4,.,1e-4 noiter;
  covtest / wald;
run;

Thanks for helping!

jiltao · Posted 11-04-2022 03:27 PM

A couple of suggestions and one question about your code:

Suggestion 1: renumber the cluster variables so they are truly nested.

For example, if these are your values:

clustervar_l2  clustervar_l3
1              1
2              1
3              1
4              2
5              2

change the values so they are now:

clustervar_l2  clustervar_l3
1              1
2              1
3              1
1              2
2              2
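The renumbering in suggestion 1 can be done with a DATA step: sort by family and individual, then restart a counter at each new family. A minimal sketch, using the five example rows above as a hypothetical input data set (demo) and a hypothetical new variable name (l2_within_l3). On the real data, one would sort and renumber long the same way and then use the new variable in CLASS and in SUBJECT=l2_within_l3(clustervar_l3).

```sas
/* The five example rows from the post (hypothetical data set name). */
data demo;
  input clustervar_l2 clustervar_l3;
  datalines;
1 1
2 1
3 1
4 2
5 2
;
run;

/* BY-group processing needs the data sorted by family, then individual. */
proc sort data=demo;
  by clustervar_l3 clustervar_l2;
run;

data demo_renum;
  set demo;
  by clustervar_l3 clustervar_l2;
  /* The sum statement below implicitly retains l2_within_l3. */
  if first.clustervar_l3 then l2_within_l3 = 0;   /* new family: reset      */
  if first.clustervar_l2 then l2_within_l3 + 1;   /* new individual: add 1  */
run;
```

With several waves per individual, only the first row of each individual increments the counter, so all waves of one person share the same within-family number.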

Suggestion 2: try the METHOD=QUAD(FASTQUAD) option in the PROC GLIMMIX statement. Also try DDFM=RESIDUAL in the MODEL statement.
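Applied to the original program, suggestion 2 changes only the METHOD= and DDFM= options. So that it runs on its own, the sketch below uses a small hypothetical simulated data set (sim) with a made-up covariate x standing in for dur, dur2, age, and the other covariates, individual IDs already numbered within family as in suggestion 1, and no PARMS statement. It illustrates the syntax; it is not a tuned model for the real data.

```sas
/* Hypothetical simulated data: nested families and individuals. */
data sim;
  call streaminit(2022);
  do clustervar_l4 = 1 to 20;                       /* neighbourhood            */
    do f = 1 to 10;
      clustervar_l3 = (clustervar_l4 - 1)*10 + f;   /* unique family ID         */
      u3 = rand('normal', 0, 1);                    /* family effect            */
      do clustervar_l2 = 1 to 3;                    /* individual within family */
        u2 = rand('normal', 0, 0.7);                /* individual effect        */
        do wave = 1 to 4;
          x = rand('normal');
          event = rand('bernoulli', logistic(-1 + 0.5*x + u3 + u2));
          output;
        end;
      end;
    end;
  end;
  drop f u3 u2;
run;

/* Multilevel adaptive quadrature for the nested random intercepts. */
proc glimmix data=sim method=quad(fastquad) noclprint;
  class clustervar_l4 clustervar_l3 clustervar_l2;
  model event(event='1') = clustervar_l4 x
        / solution ddfm=residual link=logit dist=binary;
  random intercept / subject=clustervar_l3;                 /* family                    */
  random intercept / subject=clustervar_l2(clustervar_l3);  /* individual within family  */
  ods output ConvergenceStatus=cs;
run;
```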

Question 1: I wonder why you have the NOITER option in the PARMS statement when no parameter values are specified there.

Good luck with these tries!

Jill

yuchinher · Posted 11-08-2022 06:09 AM

Hi Jill,

Thank you for your suggestions! My cluster variables are indeed coded as in your first example. May I ask why the second coding helps the model run (i.e., why you suggested changing the values)?

I tried the QUAD(FASTQUAD) option before, but it didn't seem to help much. I am now trying DDFM=RESIDUAL and will let you know!

To be honest, I am not sure how to use the PARMS statement and the NOITER option, but those specifications seemed to help the model run more efficiently.

Do you know whether PHREG can be used for the discrete-time event history analysis I am doing, or only for continuous time? I am also looking for other procedures that might help with this.

Many thanks!

Yu-Chin

jiltao · Posted 11-08-2022 10:15 AM

Changing the subject values so they are truly nested reduces the number of levels of clustervar_l2 and can therefore reduce the run time. This approach is illustrated in the following usage note:

http://support.sas.com/kb/37057

FASTQUAD is only helpful when you have hierarchical (nested) random effects, which is what you have here. The following usage note might help:

http://support.sas.com/kb/60666

The NOITER option is often used when you already know the covariance parameter estimates and therefore do not want PROC GLIMMIX to estimate them iteratively. You might want to take out your PARMS statement for now.
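To make the PARMS point concrete: PARMS takes one value per covariance parameter, in the order they appear in the Covariance Parameter Estimates table. Here there are two, the family and the individual-within-family variances, because the binary distribution has no scale parameter. The sketch below is hypothetical throughout: it uses made-up values 1.0 and 0.5 on a small simulated data set. It also swaps in HOLD=, a related PARMS option rather than the NOITER option discussed above, to keep both variances fixed at those values while the fixed effects are still estimated.

```sas
/* Hypothetical simulated data: nested families and individuals. */
data sim;
  call streaminit(2022);
  do clustervar_l4 = 1 to 20;                       /* neighbourhood            */
    do f = 1 to 10;
      clustervar_l3 = (clustervar_l4 - 1)*10 + f;   /* unique family ID         */
      u3 = rand('normal', 0, 1);                    /* family effect            */
      do clustervar_l2 = 1 to 3;                    /* individual within family */
        u2 = rand('normal', 0, 0.7);                /* individual effect        */
        do wave = 1 to 4;
          x = rand('normal');
          event = rand('bernoulli', logistic(-1 + 0.5*x + u3 + u2));
          output;
        end;
      end;
    end;
  end;
  drop f u3 u2;
run;

proc glimmix data=sim method=quad(fastquad) noclprint;
  class clustervar_l4 clustervar_l3 clustervar_l2;
  model event(event='1') = clustervar_l4 x / solution link=logit dist=binary;
  random intercept / subject=clustervar_l3;
  random intercept / subject=clustervar_l2(clustervar_l3);
  /* Hypothetical "known" variances: first = family, second = individual. */
  parms (1.0) (0.5) / hold=1,2;
  ods output CovParms=cp;
run;
```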

Hope this helps,

Jill
