This topic is solved and locked.
WLEE
Calcite | Level 5

Dear all,

I am running into a problem trying to include administrative-region fixed effects while also getting cluster-robust standard errors.
The dataset I am using for this research was collected with a multistage sampling design.
Clusters (districts, communes, enumeration areas) are sampled at the first stage, and households are then selected within each cluster, so this is a multistage stratified design rather than simple random sampling.
Ignoring the design would likely understate my standard errors, so I would like cluster-robust standard errors in the model.
The model I ran:

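proc genmod data=xlucky descending;
   class districtid(param=ref);
   /* binary_dependent_variable and explanatory_variables stand in for the real variable names */
   model binary_dependent_variable = explanatory_variables
         / dist=binary link=logit;
   /* GEE with exchangeable working correlation, clustered on district */
   repeated subject=districtid / type=cs corrw;
run;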

This code gives me all the parameter estimates and robust standard errors.
However, when I run:

proc genmod data=xlucky descending;
   class districtid(param=ref);
   /* districtid is now also a fixed effect in the MODEL statement */
   model binary_dependent_variable = explanatory_variables districtid
         / dist=binary link=logit;
   repeated subject=districtid / type=cs corrw;
run;

With this version, which should give both the fixed effects and the robust standard errors, the following warnings appear:
WARNING: The negative of the Hessian is not positive definite. The
convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit
is questionable.
WARNING: The specified model did not converge.

Any idea how to get this right?
The same problem happens when I run PROC GLIMMIX.

Thank you for your help.

WL

1 ACCEPTED SOLUTION: SteveDenham's second reply, reproduced in the thread below.

4 replies
SteveDenham
Jade | Level 19

The GENMOD error arises, I think, from using GEEs to estimate the within-cluster variability while districtid is used in two ways: as a fixed effect in the MODEL statement and as the cluster (SUBJECT=) in the REPEATED statement. Could you share the GLIMMIX code that gives the same error? I feel a lot more comfortable commenting on errors in GLIMMIX, as I use it a lot more than GENMOD.
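One way to stop using districtid in both roles is sketched below. It assumes the data contain a lower-level cluster identifier that is unique across the whole file; the name eaid (enumeration area) is hypothetical, as are the placeholder response and covariate names. District stays in MODEL as a fixed effect, the empirical (sandwich) standard errors are clustered on eaid, and TYPE=IND uses an independence working correlation, so no within-cluster correlation has to be estimated. Whether clustering below the district level is appropriate depends on the design, so treat this as an assumption to check rather than a fix.

```sas
/* Sketch only: eaid is a hypothetical enumeration-area ID, unique across */
/* the file. binary_dependent_variable and explanatory_variables are      */
/* placeholders for the real response and covariate names.                */
proc genmod data=xlucky descending;
   class districtid(param=ref) eaid;
   /* District enters only as a fixed effect */
   model binary_dependent_variable = explanatory_variables districtid
         / dist=binary link=logit;
   /* GEE clustered on eaid; TYPE=IND avoids estimating a within-cluster  */
   /* correlation, and GENMOD still reports empirical (robust) SEs.       */
   repeated subject=eaid / type=ind;
run;
```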

Moving on, and based on some of the info, I may be answering the wrong question here, but have you considered PROC SURVEYLOGISTIC?

Would the following give anything like what you are looking for?

proc surveylogistic data=xlucky;
   class districtid(param=ref);
   model binary_dependent_variable (descending) = explanatory_variables districtid;
   cluster districtid;
   weight <NEED A VARIABLE HERE>;
run;

This would require some sort of weighting variable to reflect the proportions sampled.

This code could be modified to reflect the multiple levels of sampling.
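To make the weighting idea concrete: in a multistage design the base sampling weight is usually the inverse of each household's overall selection probability, that is, the product of its selection probabilities at every stage, rather than a population share. A minimal two-stage sketch, assuming hypothetical variables prob_psu (probability that the household's PSU was selected) and prob_hh (probability that the household was selected within that PSU); the numbers are illustrative only.

```sas
/* Illustrative values only; prob_psu and prob_hh are hypothetical. */
data design;
   input eaid prob_psu prob_hh;
   datalines;
1 0.25 0.50
2 0.10 0.25
;
run;

/* Base weight = 1 / (P(PSU selected) * P(household selected | PSU)) */
data design_w;
   set design;
   hhweight = 1 / (prob_psu * prob_hh);
run;
```

The resulting hhweight would go on the WEIGHT statement. If the survey already distributes a weight variable (often adjusted for nonresponse), that variable would normally be used instead.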

Good luck with this.

Steve Denham

WLEE
Calcite | Level 5

Dear Steve Denham,

Thank you for your very helpful reply.

The GLIMMIX model I fit was:

proc glimmix data=xlucky;
   class districtid;
   model binary_dependent_variable (descending) = explanatory_variables districtid
         / solution dist=binary link=logit;
   random intercept / subject=districtid;
   random _residual_;
run;

I was not sure whether to include both G-side and R-side random effects in my model, but that is what I did anyway. (I am using a survey with multistage sampling; do I have grounds to assume that both kinds of random effects are present?)

This code gives me the message:

NOTE: Did not converge.

and gives me no parameter estimates.

Maybe I should follow your suggestion and use PROC SURVEYLOGISTIC to run my regression.

I also have questions about PROC SURVEYLOGISTIC:

What would happen if I do not include the WEIGHT statement?

What is the weight? The population of each district divided by the total population?

Thank you again for your valuable insights on this.

WL

SteveDenham
Jade | Level 19

Working from the bottom up:

It looks like there is more than just WEIGHT to consider. Multistage sampling means looking at the primary sampling rate and the total number of primary sampling units (PSUs). That gets explained fairly well in the documentation. For examples with a continuous response variable, check the PROC SURVEYREG documentation. I think examples 90.4 and 90.5 can be converted to SURVEYLOGISTIC as a guide.
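A minimal sketch of how the design information might be passed to SURVEYLOGISTIC, assuming hypothetical variables stratumid (first-stage stratum), eaid (PSU) and hhweight (household weight), and a data set psu_totals whose _TOTAL_ variable holds the number of PSUs in each stratum; the counts shown are placeholders. By default the procedure uses Taylor series linearization based on the first-stage strata and clusters, and TOTAL= (or RATE=) only adds a finite population correction, which is often omitted when the first-stage sampling fraction is small.

```sas
/* Placeholder PSU counts per first-stage stratum; replace with the real */
/* sampling-frame totals. _TOTAL_ is the variable name TOTAL= expects.   */
data psu_totals;
   input stratumid _total_;
   datalines;
1 120
2 85
;
run;

/* stratumid, eaid and hhweight are hypothetical design variables. */
proc surveylogistic data=xlucky total=psu_totals;
   strata stratumid;     /* first-stage strata             */
   cluster eaid;         /* primary sampling units (PSUs)  */
   weight hhweight;      /* household sampling weight      */
   class districtid(param=ref);
   model binary_dependent_variable(descending) = explanatory_variables districtid;
run;
```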

On to GLIMMIX.

"Did not converge" can happen a lot of ways.  With no other messages, it may be that you need more iterations or to slightly relax the convergence criteria.  See the NLOPTIONS statement for guidance in these areas.

My opinion is that the R-side effects may not be needed. It might be better to accommodate the multistage sampling with G-side effects. The secondary sampling units would have to be specified as a class variable, but not included in the MODEL statement. Something like:

proc glimmix data=xlucky;
   class districtid secondid;
   model binary_dependent_variable (descending) = explanatory_variables districtid
         / solution dist=binary link=logit;
   random intercept districtid / subject=secondid solution;
run;
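A variant of the sketch above, assuming secondid codes second-stage units nested within districts: the nesting is written explicitly with SUBJECT=secondid(districtid), the R-side _RESIDUAL_ term is left out, and the NLOPTIONS statement mentioned earlier is added. The MAXITER= and GCONV= values are arbitrary starting points, and the response and covariate names are placeholders; this is one possible specification, not a correction of the code above.

```sas
/* Variant sketch: assumes secondid codes units nested within districts. */
/* binary_dependent_variable and explanatory_variables are placeholders.  */
proc glimmix data=xlucky;
   class districtid secondid;
   model binary_dependent_variable(descending) = explanatory_variables districtid
         / solution dist=binary link=logit;
   /* District stays a fixed effect; the G-side random intercept varies  */
   /* by second-stage unit within district (no R-side _RESIDUAL_ term).  */
   random intercept / subject=secondid(districtid) solution;
   /* Raise the iteration cap and loosen the gradient criterion;         */
   /* arbitrary starting values, not recommendations.                    */
   nloptions maxiter=500 gconv=1e-6;
run;
```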

But the more I think about this, the more I believe that the SURVEY procs are where you need to be looking.

Steve Denham


WLEE
Calcite | Level 5

Thank you again for your comments. It looks like PROC SURVEYLOGISTIC is the way to go.

Thank you for the help.

WL


Discussion stats
  • 4 replies
  • 6524 views
  • 3 likes
  • 2 in conversation