I have used proc logistic and surveylogistic to calculate the AUC.
proc logistic data=input_t2e rocoptions(weighted);
model &target_col.(event="1")=&vars.;
weight weight;
roc;
ods output ROCassociation=_auc_l;
run;
proc surveylogistic data=input_t2e;
model &target_col.(event="1")=&vars.;
weight weight;
ods output Association=_auc_sl;
run;
The two procedures report different AUC values. Why is that?
As I understand it, both procedures return the same coefficient estimates (although with different standard errors). The predicted probabilities should therefore be the same, so the AUC should be equal as well. However, that was not the case when I ran both procedures.
Also, I'm thinking about calculating AUC manually and comparing it to both AUCs. How would you do that?
Thanks for the help!
I can't help with a manual AUC calculation. Any time you compare results from one of the survey procedures with its non-survey counterpart, you are likely to run into this, because the internal algorithms of the survey procedures are designed for complex weighted survey data and do not change just because your data do not come from a complex design. Any calculation in a survey procedure that uses the variability of the data takes a different approach to estimation.
That is one reason why the survey PROC statement has options that let you describe the sampling frame (NOMCAR, RATE=, TOTAL=) and why the procedure supports statements such as CLUSTER and STRATA.
Survey weights are used differently than non-survey weights.
Calling @Rick_SAS
The question in these calculations is whether to include weights when estimating the association statistics, such as the percentages of concordant, discordant, and tied pairs. In the past, PROC LOGISTIC did not use weights at all for these statistics. At some point, the ROCOPTIONS(WEIGHTED) option was added, which forces PROC LOGISTIC to use the WEIGHT variable to estimate these statistics. The formulas with and without weights are given in the doc.
It looks like PROC SURVEYLOGISTIC does not use the weight variable to estimate those stats. I do not know whether it was an intentional decision (that is, perhaps it is not appropriate to use survey weights) or whether the procedure just doesn't support that option.
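For reference, the unweighted association statistics can be written as follows. This is a summary of the standard definitions rather than a quote from the doc: $t$ is the number of pairs of observations with different responses, $n_c$ the number of concordant pairs, and $n_d$ the number of discordant pairs. My reading of the weighted variant, which should be checked against the doc, is that each pair contributes the product of its two weights instead of 1.

```latex
% c statistic (AUC) and Somers' D from pair counts;
% tied pairs (t - n_c - n_d) count as half-concordant
c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t},
\qquad
\text{Somers' } D = \frac{n_c - n_d}{t}
```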
If you omit the ROCOPTIONS(WEIGHTED) option on the PROC LOGISTIC statement, then both procedures give the same estimates:
data Have;
set Sashelp.Class;
One = 1;
W = _N_ / 19;
run;
%let WtVar = Weight;
proc logistic data=Have /*rocoptions(weighted)*/;
model Sex = Height Age;
weight &WtVar;
ods select Association;
run;
proc surveylogistic data=Have;
model Sex = Height Age;
weight &WtVar;
ods select Association;
run;
So it looks to me like if you want to incorporate weights into the AUC, you can use PROC LOGISTIC. However, I am not an expert on survey procedures, so I don't know whether it makes sense for survey weights. Of course, the standard errors and CIs for the AUC from PROC LOGISTIC should not be used if you are using survey weights, as you have already stated.
Thanks. However, for bigger data sets, PROC LOGISTIC without ROCOPTIONS(WEIGHTED) does not return the same concordance statistics as PROC SURVEYLOGISTIC. For other data sets the discrepancy is even greater.
data have;
set sashelp.margarin;
weight = _n_/20430;
run;
proc logistic data=Have /*rocoptions(weighted)*/;
model Choice=LogPrice LogInc FamSize;
weight weight;
ods select Association;
run;
proc surveylogistic data=Have;
model Choice=LogPrice LogInc FamSize;
weight weight;
ods select Association;
run;
That's confusing, because both procedures compute the same predicted probabilities.
Moreover, I have noticed that PROC SURVEYLOGISTIC returns different concordance statistics when there is no WEIGHT statement, although the number of pairs is the same.
data have;
set sashelp.margarin;
weight = _n_/20430;
run;
proc surveylogistic data=Have;
model Choice=LogPrice LogInc FamSize;
weight weight;
ods select Association;
run;
proc surveylogistic data=Have;
model Choice=LogPrice LogInc FamSize;
/* weight weight;*/
ods select Association;
run;
The PROC LOGISTIC doc states that PROC LOGISTIC uses BINWIDTH=0 (no binning of the predicted probabilities) for computing the association statistics. The PROC SURVEYLOGISTIC doc states that PROC SURVEYLOGISTIC uses BINWIDTH=0.002 (=1/500) for a binary response. So if you want PROC LOGISTIC to agree with PROC SURVEYLOGISTIC, you should use the BINWIDTH=0.002 option on the MODEL statement:
model Choice=LogPrice LogInc FamSize / binwidth=0.002;
The links above show the formulas for computing the association statistics, which you can use if you want to compute the statistics "manually," as you said in an earlier post.
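A minimal sketch of such a manual check follows. It is not the procedures' internal algorithm. It assumes the Have data set built from Sashelp.Margarin earlier in the thread; the names Scored, phat, bin, ManualAUC, and the BinWidth macro variable are made up for illustration. Pairs are counted without weights, and the binning rule (truncating each predicted probability to a bin index) is my assumption about how the binning works, so small differences from the procedure output are possible.

```sas
%let BinWidth = 0.002; /* hypothetical macro variable; set to 0 for no binning */

/* Save the predicted probabilities. By default PROC LOGISTIC models
   the lower ordered response value, so phat estimates P(Choice=0). */
proc logistic data=Have noprint;
model Choice=LogPrice LogInc FamSize;
weight weight;
output out=Scored predicted=phat;
run;

/* Optionally bin the probabilities (assumed rule: truncate to a bin index) */
data Scored;
set Scored;
if &BinWidth > 0 then bin = floor(phat / &BinWidth);
else bin = phat;
run;

/* Compare every event (Choice=0) with every nonevent (Choice=1).
   A comparison such as e.bin > n.bin evaluates to 1 or 0 in PROC SQL. */
proc sql;
create table ManualAUC as
select count(*) as nPairs,
sum(e.bin > n.bin) as nConcordant,
sum(e.bin < n.bin) as nDiscordant,
sum(e.bin = n.bin) as nTied
from Scored(where=(Choice=0 and not missing(phat))) as e,
Scored(where=(Choice=1 and not missing(phat))) as n;
quit;

/* c = (concordant + half of the tied pairs) / all pairs */
data ManualAUC;
set ManualAUC;
c = (nConcordant + 0.5*nTied) / nPairs;
run;

proc print data=ManualAUC;
run;
```

The SQL step forms a full Cartesian product of events and nonevents, so it can be slow on large data sets. Running it once with BinWidth=0.002 and once with BinWidth=0 should show whether binning accounts for the difference between the two procedures.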