PROC LOGISTIC: Need to reproduce results with clustered or robust standard devs

jdserbon · Posted 03-01-2013 01:56 PM

I need to reproduce identical parameter estimates with clustered or robust standard errors. I have not been able to reproduce the results. The PROC LOGISTIC model below is the one I am trying to match, using other procedures to calculate the clustered variance. From the literature I have read, I have not found a way to get clustered or robust standard errors out of PROC LOGISTIC itself. I have been at this for some time and need a fresh set of eyes. Any recommendations?

The model I need to reproduce with clustered or robust standard errors:
proc logistic data = regdatas;
title 'Measure 5';
class quarter /param = ref ref = first;
model meas_5_num (event = '1')= treated female nonwhite age_at_discharge quarter score_community;
strata prov_name;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
by procedure_group;
run;

So far I have tried:

proc mixed data = regdatas method=REML empirical;
title 'Measure 5';
class quarter prov_name;
model meas_5_num = treated female nonwhite age_at_discharge quarter score_community / solution influence;
random int / sub=prov_name g gcorr;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
by procedure_group;
run;

proc genmod data = x descending;
class quarter (param=ref ref=first) prov_name;
model meas_5_num = treated female nonwhite age_at_discharge quarter score_community / dist=binomial link=logit noint;
repeated subject=prov_name / type=cs corrw;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
by procedure_group;
run;

proc glimmix data = x empirical=classical;
class quarter prov_name;
model meas_5_num = treated female nonwhite age_at_discharge quarter score_community / dist=binomial solution;
random intercept / subject=prov_name;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
by procedure_group;
run;

Reeza · Posted 03-01-2013 02:14 PM

jdserbon wrote:

I need to reproduce identical parameter estimates with clustered or robust standard errors. I have not been able to reproduce the results.

What do you mean by clustered or robust standard errors? Does your data have clusters?

If so, try proc surveylogistic instead.

jdserbon · Posted 03-04-2013 07:14 AM

I have tried the surveylogistic procedure, to no avail:

proc surveylogistic data = x;
title 'Measure 5';
class quarter / param=ref ref=first;
model meas_5_num (event='1') = treated female nonwhite age_at_discharge quarter score_community / noint;
strata prov_name;
by procedure_group;
cluster prov_name;
run;

The SEs need to account for clustering by prov_name, which is the place of service. There are about 4 of them.

1zmm · Posted 03-04-2013 08:02 AM

Because strata differ from clusters in survey design and analysis, the statement,

STRATA PROV_NAME;

is unnecessary in the PROC SURVEYLOGISTIC program code that you have provided. Use only the CLUSTER statement instead.

In your original PROC LOGISTIC program code, you also included a WHERE statement to include only specific observations that meet the WHERE statement criteria, but you do not include this statement in your PROC SURVEYLOGISTIC code. For PROC SURVEYLOGISTIC, instead of using a WHERE statement, it is preferable to down-weight, in a prior DATA step, the observations that do NOT meet the WHERE criteria. That is, create a weight variable whose value equals 1.00 if the observation meets the criteria and a very small value (say, 0.000000001) if it does not. Then include that variable in a WEIGHT statement in PROC SURVEYLOGISTIC.

Finally, PROC SURVEYLOGISTIC calculates subgroup estimates with a DOMAIN statement instead of a BY statement. Thus, use the statement,

DOMAIN procedure_group;

instead of the statement,

BY procedure_group;

to obtain estimates for all procedure groups as well as for each separate procedure group.

jdserbon · Posted 03-04-2013 11:49 AM

As of now, PROC LOGISTIC works perfectly, whereas PROC SURVEYLOGISTIC writes warnings about quasi-separation to the log when using

proc surveylogistic data = regdatas;
title 'Measure 5';
cluster prov_name;
class quarter /param=ref ref=first;
model meas_5_num (event= '1')= treated female nonwhite age_at_discharge quarter score_community;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
domain procedure_group;
run;

or when using the same code with data = x, where x is a data set already subset in a previous step using the criteria in the WHERE statement.

This is getting insane! These clustered SEs are killing me!

1zmm · Posted 03-04-2013 03:25 PM

The warning from PROC SURVEYLOGISTIC implies that some configuration of your independent variables perfectly predicts your dependent variable groups, so that some of the regression coefficients for your independent variables may have zero or infinite estimates.

See if the following works:

* Create a new respondent sampling weight variable so that those who meet the criteria of the former WHERE statement;
* have a value equalling 1.00 and those who do not meet these criteria have a much smaller value of 0.000000001.;
data regdatas2;
set regdatas;
if ((prov_ace_crd = 1 or prov_tru_crd = 1) &
(quarter ne ' ') &
(prov_name ne ' ') &
(meas_5_denom =1) &
(Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT',
'CARDIAC PACEMAKER IMPLANT OR REVISION',
'CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT', 'PERCUTANEOUS CORONARY INTERVENTION')))
then wtvar=1;
else wtvar=0.000000001;
output regdatas2;
run;

* Sort by the cluster;
proc sort data=regdatas2;
by prov_name;
run;

proc surveylogistic data=regdatas2;
title "Measure 5";
cluster prov_name;
class quarter / param=ref ref=first;
model meas_5_num (event='1') = treated female nonwhite age_at_discharge quarter score_community;
domain procedure_group;
weight wtvar;
run;

jdserbon · Posted 03-05-2013 02:21 PM

I appreciate the help; however, nothing has worked up to this point. I have learned a lot of new methods, though. The first issue is that the parameter estimate for treated is zero for the first few procedure groups, and I cannot get past that point in estimating these clustered SEs. I may try to calculate some form of robust SEs for this instead.

1zmm · Posted 03-05-2013 03:25 PM

The implication is that these procedure groups have no event=1 for your dependent variable, MEAS_5_NUM. Since the WHERE statement may exclude many observations, perhaps this is where the problem lies. If you perform a simple sort by PROCEDURE_GROUP and a subsequent PROC FREQ to tabulate only the variables, TREATED vs. MEAS_5_NUM, by PROCEDURE_GROUP, do you find large numbers with values of MEAS_5_NUM = 1 [=the event of interest] for the PROCEDURE_GROUPs you are interested in? If not, then the WHERE statement criteria may be too restrictive. Your only options then would be to loosen these restrictions or to get more data.
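
The check described above might look something like the following. This is only a sketch: it assumes the data set and variable names used earlier in the thread, and the output data set name regdatas_sorted is made up for illustration.

```sas
* Sort a copy of the data by procedure group so that BY processing works.;
* regdatas_sorted is a hypothetical name, not one from the original code.;
proc sort data=regdatas out=regdatas_sorted;
by procedure_group;
run;

* Cross-tabulate TREATED against MEAS_5_NUM within each procedure group,;
* using the same filter as the original WHERE statement.;
proc freq data=regdatas_sorted;
by procedure_group;
tables treated*meas_5_num / missing norow nocol nopercent;
where (prov_ace_crd = 1 or prov_tru_crd = 1) & quarter ne '' & prov_name ne '' & meas_5_denom = 1 &
Procedure_group in ('CARDIAC DEFIBRILLATOR IMPLANT','CARDIAC PACEMAKER IMPLANT OR REVISION','CARDIAC VALVE AND OTHER MAJOR CARDIOTHORACIC',
'CORONARY ARTERY BYPASS GRAFT','PERCUTANEOUS CORONARY INTERVENTION');
run;
```

Empty or near-empty cells in the MEAS_5_NUM = 1 column for a procedure group would point to the separation problem. Running the same PROC FREQ without the WHERE statement shows how many events the filter is removing.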
