jdserbon
Calcite | Level 5

I need to reproduce the parameter estimates from PROC LOGISTIC exactly, but with clustered or robust standard errors, and so far I have not been able to. The PROC LOGISTIC model below is the one I am trying to reproduce with other PROCs that can calculate a clustered variance. From the literature I have read, I have not found a way to produce clustered or robust standard errors with PROC LOGISTIC itself. I have been attempting this for some time and I need a fresh set of eyes. Any recommendations out there?!?


The model I need to reproduce with clustered or robust standard errors:
proc logistic data = regdatas;
  title 'Measure 5';
  class quarter /param = ref ref = first;
  model meas_5_num (event = '1')=  treated female nonwhite age_at_discharge quarter score_community;
  strata prov_name;
  where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
  Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
  'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
  by procedure_group;
run;

I have tried -

proc mixed data = regdatas method=REML empirical;
  title 'Measure 5';
  class quarter prov_name;
  model meas_5_num =  treated female nonwhite age_at_discharge quarter score_community / solution influence;
  random int / sub=prov_name g gcorr;
  where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
  Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
  'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
  by procedure_group;
run;


proc genmod data = x descending;
  class quarter (param=ref ref=first) prov_name;
  model meas_5_num = treated female nonwhite age_at_discharge quarter score_community / dist=binomial link=logit noint;
  repeated subject=prov_name / type=cs corrw;
  where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
  Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
  'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
  by procedure_group;
run;

proc glimmix data = x empirical=classical;
  class quarter prov_name;
  model meas_5_num = treated female nonwhite age_at_discharge quarter score_community / dist=binomial solution;
  random intercept / subject=prov_name;
  where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
  Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
  'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
  by procedure_group;
run;
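For reference, here is a minimal sketch of one common way to get identical point estimates plus cluster-robust standard errors. It rests on two assumptions not stated in the post: that x already holds only the rows passing the WHERE criteria, and that the target is the unconditional logistic model, i.e. PROC LOGISTIC without the STRATA statement (with STRATA, PROC LOGISTIC fits a conditional logistic regression, whose estimates a marginal model will not reproduce). With an independence working correlation (TYPE=IND), the GEE estimating equations in PROC GENMOD are the same as the logistic-regression score equations, so the point estimates should match PROC LOGISTIC up to convergence tolerance, while the "empirical" standard errors are sandwich estimates clustered on prov_name. By contrast, NOINT and TYPE=CS in the GENMOD attempt above each change the estimates themselves. The name x_sorted is hypothetical.

```sas
/* Sketch only. Assumes x contains just the rows that pass the WHERE criteria
   and that the target is PROC LOGISTIC WITHOUT the STRATA statement.       */

/* BY-group processing requires data sorted by the BY variable.
   x_sorted is a hypothetical output name.                                   */
proc sort data=x out=x_sorted;
  by procedure_group;
run;

/* Reference model: ordinary (unconditional) logistic regression. */
proc logistic data=x_sorted;
  by procedure_group;
  class quarter / param=ref ref=first;
  model meas_5_num (event='1') = treated female nonwhite age_at_discharge
                                 quarter score_community;
run;

/* GEE with an independence working correlation: point estimates should
   equal the logistic fit above; the "Empirical" standard errors in the
   GEE output are the cluster-robust (sandwich) estimates by prov_name.    */
proc genmod data=x_sorted descending;
  by procedure_group;
  class quarter (param=ref ref=first) prov_name;
  model meas_5_num = treated female nonwhite age_at_discharge
                     quarter score_community / dist=binomial link=logit;
  repeated subject=prov_name / type=ind;
run;
```

One caution: sandwich estimators are known to be unreliable when the number of clusters is small, as it is with only a handful of providers.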

Reeza
Super User

jdserbon wrote:

I need to reproduce identical parameter estimates with clustered or robust standard errors.  I have not been able to reproduce the results. 

What do you mean by clustered or robust standard errors? Does your data have clusters?

If so, try proc surveylogistic instead.

jdserbon
Calcite | Level 5

I have tried PROC SURVEYLOGISTIC, to no avail:

proc surveylogistic data = x;
  title 'Measure 5';
  class quarter / param=ref ref=first;
  model meas_5_num (event='1') = treated female nonwhite age_at_discharge quarter score_community / noint;
  strata prov_name;
  by procedure_group;
  cluster prov_name;
run;

The SEs should account for clustering by prov_name, which identifies the place of service. There are about 4 of them.

1zmm
Quartz | Level 8

Because strata differ from clusters in survey design and analysis, the statement,

   STRATA PROV_NAME;

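should be removed from the PROC SURVEYLOGISTIC code that you have provided. With PROV_NAME as both the stratum and the cluster variable, every stratum contains only one cluster, which leaves no between-cluster variation within strata for the variance estimator to use. Use only the CLUSTER statement instead.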

In your original PROC LOGISTIC code, you also included a WHERE statement to keep only the observations that meet its criteria, but you did not include that statement in your PROC SURVEYLOGISTIC code. For PROC SURVEYLOGISTIC, rather than using a WHERE statement, it is preferable to down-weight, in a prior DATA step, the observations that do NOT meet the WHERE criteria, so that the full sample design remains available for variance estimation. That is, create for each observation a weight variable whose value equals 1.00 if the observation meets the WHERE criteria and a very small value (say, 0.000000001) if it does not. Then include that variable in a WEIGHT statement in PROC SURVEYLOGISTIC.

Finally, PROC SURVEYLOGISTIC calculates subgroup estimates with a DOMAIN statement instead of a BY statement.  Thus, use the statement,

     DOMAIN procedure_group;

instead of the statement,

      BY procedure_group;

to obtain estimates for all procedure groups as well as for each separate procedure group.

jdserbon
Calcite | Level 5

As of now, PROC LOGISTIC works perfectly, whereas PROC SURVEYLOGISTIC writes warnings of quasi-complete separation to the log when I use

proc surveylogistic data = regdatas;
  title 'Measure 5';
  cluster prov_name;
  class quarter /param=ref ref=first;
  model meas_5_num (event= '1')=  treated female nonwhite age_at_discharge quarter score_community;
  where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
  Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
  'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
  domain procedure_group;
run;

or when I run the same code with DATA=x, where x is a data set created in a previous step by applying the criteria of the WHERE statement.

This is getting insane!  These clustered SEs are killing me!

1zmm
Quartz | Level 8

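The warning from PROC SURVEYLOGISTIC implies that some combination of values of your independent variables perfectly, or almost perfectly, predicts the outcome groups of your dependent variable. In that case the maximum likelihood estimates of some regression coefficients do not exist (they drift toward infinity), so the reported estimates and standard errors for those coefficients are unreliable.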

See if the following works:

     * Create a new respondent sampling weight variable so that those who meet the criteria of the former WHERE statement;
     *    have a value equalling 1.00 and those who do not meet these criteria have a much smaller value of 0.000000001.;
     data regdatas2;
          set regdatas;
          if ((prov_ace_crd = 1 or prov_tru_crd = 1) &
              (quarter ne ' ') &
              (prov_name ne ' ') &
              (meas_5_denom = 1) &
              (Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT',
                                   'CARDIAC PACEMAKER IMPLANT OR REVISION',
                                   'CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
                                   'CORONARY ARTERY BYPASS GRAFT',
                                   'PERCUTANEOUS CORONARY INTERVENTION')))
               then wtvar=1;
               else wtvar=0.000000001;
          output regdatas2;
     run;

     * Sort by the cluster;
     proc sort data=regdatas2;
          by prov_name;
     run;

     proc surveylogistic data=regdatas2;
          title "Measure 5";
          cluster prov_name;
          class quarter / param=ref ref=first;
          model meas_5_num (event='1') = treated female nonwhite age_at_discharge quarter score_community;
          domain procedure_group;
          weight wtvar;
     run;

jdserbon
Calcite | Level 5

I appreciate the help; however, nothing has worked up to this point, although I have learned a lot of new methods. The first issue is that the parameter estimate for treated is zero for the first few procedure groups, so I cannot even get past that point in estimating these clustered SEs. I may try to calculate some form of robust SEs for this issue instead.

1zmm
Quartz | Level 8

The implication is that these procedure groups have no event=1 for your dependent variable, MEAS_5_NUM.  Since the WHERE statement may exclude many observations, perhaps this is where the problem lies.  If you perform a simple sort by PROCEDURE_GROUP and a subsequent PROC FREQ to tabulate only the variables, TREATED vs. MEAS_5_NUM, by PROCEDURE_GROUP, do you find large numbers with values of MEAS_5_NUM = 1 [=the event of interest] for the PROCEDURE_GROUPs you are interested in? If not, then the WHERE statement criteria may be too restrictive.  Your only options then would be to loosen these restrictions or to get more data.
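The sort-and-tabulate check described above might look like the following minimal sketch. It assumes x is the already-filtered data set from the earlier posts; x_sorted is a hypothetical output name.

```sas
/* BY-group processing requires data sorted by the BY variable.
   x_sorted is a hypothetical output name. */
proc sort data=x out=x_sorted;
  by procedure_group;
run;

/* Cross-tabulate TREATED against MEAS_5_NUM within each procedure group.
   Empty or near-empty cells (for example, no MEAS_5_NUM = 1 among treated
   or untreated patients) may explain the separation warnings and the
   zero estimates for TREATED. MISSING also shows any missing values.    */
proc freq data=x_sorted;
  by procedure_group;
  tables treated*meas_5_num / norow nocol nopercent missing;
run;
```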


Discussion stats
  • 7 replies
  • 3745 views
  • 0 likes
  • 3 in conversation