Solved: How to have "Fixed Effects" and "Cluster Robust Standard Error" simultaneously in Proc Genmod or Proc Glimmix?

WLEE · Posted 05-07-2012 04:40 PM

Dear all,

I am running into a big problem trying to include administrative-region
fixed effects while also accounting for cluster-robust standard errors.
The dataset I am using was collected with multi-stage sampling.
Clusters (districts, communes, enumeration areas) are sampled at the
first stage, and households are then selected within each cluster. This
is a multi-stage stratified sampling design, not a simple random sample.
Ignoring the design would underestimate my SEs, so I would like robust
standard errors in the model to fix the problem.
The model I run:

proc genmod data=xlucky descending ;
class districtid(param=ref);
model (Binary Dependent Variable) = (explanatory variables)
/ dist=binary link =logit ;
repeated subject=districtid/type=cs corrw;
run;

This code gives me all the parameter estimates and robust standard
errors.
HOWEVER, when I run:

proc genmod data=xlucky descending ;
class districtid(param=ref);
model (Binary Dependent Variable) = (explanatory variables districtid)
/ dist=binary link =logit ;
repeated subject=districtid/type=cs corrw;
run;

to get both the fixed effects and the robust standard errors, these warnings pop up:
WARNING: The negative of the Hessian is not positive definite. The
convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit
is questionable.
WARNING: The specified model did not converge.

Any idea how to get this right?
The same problem occurs when I run PROC GLIMMIX.

Thank you for your help.

WL

SteveDenham · Posted 05-08-2012 07:35 AM

The GENMOD error arises, I think, from the use of GEEs to estimate the within-cluster variability while districtid is being used in two ways: as a fixed effect in the MODEL statement and as the SUBJECT= cluster in the REPEATED statement. Could you share the GLIMMIX code that gives the same error? I feel a lot more comfortable commenting on errors in GLIMMIX, as I use it a lot more than GENMOD.

Moving on, and based on some of the info, I may be answering the wrong question here, but have you considered PROC SURVEYLOGISTIC?

Would the following give anything like what you are looking for:

proc surveylogistic data=xlucky;
   class districtid(param=ref);
   model binary_dependent_variable (descending) = explanatory_variables districtid;
   cluster districtid;
   weight <NEED A VARIABLE HERE>;
run;

This would require some sort of weighting variable to reflect the proportions sampled.

This code could be modified to reflect the multiple levels of sampling.
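
As a rough sketch of what that modification might look like, assuming the survey file carries a first-stage stratum identifier, a primary sampling unit (PSU) identifier, and a sampling weight: the names stratumid, psuid, and sampwt below are hypothetical, and x1 and x2 stand in for the explanatory variables. As far as I know, the SURVEY procedures base their default (Taylor series) variance estimates on the first-stage strata and PSUs, so the later stages would not need statements of their own.

```sas
/* Sketch only: stratumid, psuid, sampwt, x1, and x2 are hypothetical names. */
proc surveylogistic data=xlucky;
   strata stratumid;               /* first-stage strata, if the design has them */
   cluster psuid;                  /* first-stage sampling units */
   class districtid(param=ref);    /* district fixed effects */
   model binary_dependent_variable (descending) = x1 x2 districtid;
   weight sampwt;                  /* sampling weight */
run;
```

Clustering on the PSU rather than on districtid keeps the district variable in a single role, as a fixed effect.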

Good luck with this.

Steve Denham

WLEE · Posted 05-08-2012 09:51 AM

Dear Steve Denham,

Thank you for your very helpful reply.

The Glimmix I fit was:

proc glimmix data=xlucky;
   class districtid;
   model binary_dependent_variable (descending) = explanatory_variables districtid
         / solution dist=binary link=logit;
   random intercept / subject=districtid;
   random _residual_;
run;

I am not sure whether I should include both G-side and R-side random effects in my model, but that is what I did anyway. (I am using a survey that uses multi-stage sampling; do I have a basis for assuming that both kinds of random effects are present?)

This code gives me the error:

NOTE: Did not converge.

and gives me no parameter estimates.

Maybe what I should do is to follow your suggestion and use proc surveylogistic to run my regression.

I also have questions about SURVEYLOGISTIC:

What would happen if I do not include the WEIGHT statement in the model?

What is the weight? Is it the population of each district divided by the total population?

Thank you again for your valuable insights on this.

WL

SteveDenham · Posted 05-08-2012 11:12 AM

Working from the bottom up:

It looks like there is more than just WEIGHT to consider. Multistage sampling means looking at the primary sampling rate and total number of primary sampling units. That gets explained fairly well in the documentation. For examples with a continuous response variable, check PROC SURVEYREG documentation. I think examples 90.4 and 90.5 can be converted to SURVEYLOGISTIC as a guide.
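
On the weight itself, one common convention (assumed here, not taken from your survey's documentation) is the inverse of each household's overall probability of selection. The sketch below uses hypothetical variables p_stage1 and p_stage2 for the selection probabilities at each stage, a hypothetical PSU identifier psuid, and a made-up count of 500 PSUs in the population for the TOTAL= option; x1 and x2 stand in for the explanatory variables.

```sas
/* Sketch only: p_stage1, p_stage2, psuid, x1, x2, and the value 500 are hypothetical. */
data xlucky_w;
   set xlucky;
   /* weight = inverse of the overall probability of selection */
   sampwt = 1 / (p_stage1 * p_stage2);
run;

/* TOTAL= gives the number of PSUs in the population; RATE= is the alternative */
proc surveylogistic data=xlucky_w total=500;
   cluster psuid;
   class districtid(param=ref);
   model binary_dependent_variable (descending) = x1 x2 districtid;
   weight sampwt;
run;
```

If the survey already ships with a weight variable, that variable would normally go straight into the WEIGHT statement and the DATA step would not be needed.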

On to GLIMMIX.

"Did not converge" can happen a lot of ways. With no other messages, it may be that you need more iterations or to slightly relax the convergence criteria. See the NLOPTIONS statement for guidance in these areas.
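
A minimal sketch of those adjustments, assuming a plain random-intercept model with hypothetical covariates x1 and x2; the iteration limits and tolerance are arbitrary illustrations, not recommendations:

```sas
/* Sketch only: x1, x2, and all numeric settings are hypothetical. */
proc glimmix data=xlucky maxopt=50;        /* allow more outer optimizations */
   class districtid;
   model binary_dependent_variable (descending) = x1 x2
         / solution dist=binary link=logit;
   random intercept / subject=districtid;
   nloptions maxiter=500 gconv=1e-6;       /* more iterations, looser gradient criterion */
run;
```

If the fit still fails with looser settings, the model specification is the more likely culprit than the optimizer.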

My opinion is that the R-side effects may not be needed. It might be better to accommodate the multi-stage sampling in G-side effects. The secondary sampling units would have to be specified as a class variable, but not included in the MODEL statement. Something like:

proc glimmix data=xlucky;
   class districtid secondid;
   model binary_dependent_variable (descending) = explanatory_variables districtid
         / solution dist=binary link=logit;
   random intercept districtid / subject=secondid solution;
run;

But the more I think about this, the more I believe that the SURVEY procs are where you need to be looking.

Steve Denham

WLEE · Posted 05-08-2012 03:24 PM

Thank you again for your comments. It looks like PROC SURVEYLOGISTIC is the way to go.

Thank you for the help.

WL
