Jedrek369
Fluorite | Level 6

I have used proc logistic and surveylogistic to calculate the AUC.

proc logistic data=input_t2e rocoptions(weighted);
	model &target_col.(event="1")=&vars.;
	weight weight;
	roc;
	ods output ROCassociation=_auc_l;
run;

proc surveylogistic data=input_t2e;
	model &target_col.(event="1")=&vars.;
	weight weight;
	ods output Association=_auc_sl;
run;

The two procedures reported different AUC values. Why is that?

 

As I understand it, both procedures return the same coefficient estimates (although with different standard errors). The predicted scores should therefore be the same, and so should the AUC. However, that was not the case when I ran both procedures.

 

Also, I'm thinking about calculating AUC manually and comparing it to both AUCs. How would you do that?

 

Thanks for the help!

6 REPLIES 6
ballardw
Super User

Can't help on a manual AUC. Any time you compare results from one of the Survey procs with the non-survey version, you are likely to run into this: the internal algorithms of the Survey procs are designed to deal with complex weighted data, and they don't change just because your data isn't complex. Any calculation in a Survey proc that uses the variability of the data takes a different approach to the estimates.

 

That is one reason why the survey proc statement has options that let you describe the sampling frame (NOMCAR, RATE=, TOTAL=), along with statements like CLUSTER and STRATA.

PaigeMiller
Diamond | Level 26

Survey weights are used differently than non-survey weights.

--
Paige Miller
Rick_SAS
SAS Super FREQ

The question in these calculations is whether to include weights when estimating the association statistics such as concordant, discordant, and tied pairs.  In the past, PROC LOGISTIC did not use weights at all for these statistics. At some point, the ROCOPTIONS(WEIGHTED) option was added, which forces PROC LOGISTIC to use the WEIGHT variable to estimate these statistics. The formula with and without weights is given in the doc.

 

It looks like PROC SURVEYLOGISTIC does not use the weight variable to estimate those stats. I do not know whether it was an intentional decision (that is, perhaps it is not appropriate to use survey weights) or whether the procedure just doesn't support that option.  

 

If you omit the ROCOPTIONS(WEIGHTED) option on the PROC LOGISTIC statement, then both procedures give the same estimates:

data Have;
set Sashelp.Class;
One = 1;
W = _N_ / 19;
run;

%let WtVar = Weight;
proc logistic data=Have /*rocoptions(weighted)*/;
	model Sex = Height Age;
	weight &WtVar;
   ods select Association;
run;

proc surveylogistic data=Have;
	model Sex = Height Age;
	weight &WtVar;
   ods select Association;
run;

So it looks to me like if you want to incorporate weights into the AUC, you can use PROC LOGISTIC. However, I am not an expert on survey procedures, so I don't know whether it makes sense for survey weights. Of course, the standard errors and CIs for the AUC from PROC LOGISTIC should not be used if you are using survey weights, as you have already stated.

Jedrek369
Fluorite | Level 6

Thanks. However, for bigger datasets proc logistic without rocoptions(weighted) does not return the same concordant stats as proc surveylogistic. For other datasets the discrepancy is greater.

 

 

data have; 
set sashelp.margarin;
weight = _n_/20430;
run;

proc logistic data=Have /*rocoptions(weighted)*/;
	model Choice=LogPrice LogInc FamSize;
	weight weight;
	ods select Association;
run;

proc surveylogistic data=Have;
	model Choice=LogPrice LogInc FamSize;
	weight weight;
	ods select Association;
run;

That's confusing because both procedures calculate the same score.
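One way to confirm that the scores really are identical is to write the predicted probabilities from both procedures to data sets and compare them row by row. This is only a sketch: the data set names `_p_l`, `_p_sl`, `_diff` and the variables `p_l`, `p_sl`, `absdiff` are made up for illustration, and the event level is set explicitly so both procedures model the same outcome.

```sas
data have;
	set sashelp.margarin;
	weight = _n_/20430;
run;

/* Predicted probabilities from PROC LOGISTIC */
proc logistic data=have noprint;
	model Choice(event='1')=LogPrice LogInc FamSize;
	weight weight;
	output out=_p_l predicted=p_l;
run;

/* Predicted probabilities from PROC SURVEYLOGISTIC (printed output suppressed via ODS) */
ods exclude all;
proc surveylogistic data=have;
	model Choice(event='1')=LogPrice LogInc FamSize;
	weight weight;
	output out=_p_sl predicted=p_sl;
run;
ods exclude none;

/* One-to-one merge: assumes both output data sets keep the input row order */
data _diff;
	merge _p_l _p_sl;
	absdiff = abs(p_l - p_sl);
run;

/* A maximum near zero means the two procedures produce the same scores */
proc means data=_diff max;
	var absdiff;
run;
```

If the maximum difference is essentially zero, any gap in the association statistics has to come from how the pairs are counted, not from the fitted model.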

 

Moreover, I have noticed that the surveylogistic procedure returns different concordant stats when there is no weight statement, even though the number of pairs is the same.

 

data have; 
set sashelp.margarin;
weight = _n_/20430;
run;

proc surveylogistic data=Have;
	model Choice=LogPrice LogInc FamSize;
	weight weight;
	ods select Association;
run;

proc surveylogistic data=Have;
	model Choice=LogPrice LogInc FamSize;
	/* weight weight;*/
	ods select Association;
run;

 

 

Rick_SAS
SAS Super FREQ

The PROC LOGISTIC doc states that PROC LOGISTIC uses BINWIDTH=0 for computing the association statistics. The PROC SURVEYLOGISTIC doc states that PROC SURVEYLOGISTIC uses BINWIDTH=0.002 (=1/500) for a binary response.  So if you want PROC LOGISTIC to agree with SURVEYLOGISTIC, you should use the BINWIDTH=0.002 option on the MODEL statement:

model Choice=LogPrice LogInc FamSize / binwidth=0.002;
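Put together with the Sashelp.Margarin example from the earlier post, the full comparison might look like the sketch below. The output data set names `_assoc_l` and `_assoc_sl` are hypothetical, and whether the two tables then agree exactly is something to check on your own data.

```sas
data have;
	set sashelp.margarin;
	weight = _n_/20430;
run;

/* PROC LOGISTIC with the same bin width that SURVEYLOGISTIC is documented to use */
proc logistic data=have;
	model Choice=LogPrice LogInc FamSize / binwidth=0.002;
	weight weight;
	ods output Association=_assoc_l;
run;

proc surveylogistic data=have;
	model Choice=LogPrice LogInc FamSize;
	weight weight;
	ods output Association=_assoc_sl;
run;

/* Print both association tables for a side-by-side check */
proc print data=_assoc_l noobs; run;
proc print data=_assoc_sl noobs; run;
```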

 

The links above show the formulas for computing the association statistics, which you can use if you want to compute the statistics "manually," as you said in an earlier post.
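As a rough sketch of such a manual computation, the AUC (the c statistic) can be obtained by forming every event/nonevent pair and counting concordant and tied pairs: c = (concordant + 0.5*tied) / pairs. The example uses the small Sashelp.Class data from the earlier post; the names `Scored`, `phat`, `AUC_unweighted`, and `AUC_weighted` are made up here. The weighted version, in which each pair counts with the product of its two weights, is an assumption about what a weighted AUC means and should be checked against the documented formula before relying on it.

```sas
/* Fit the model and save the predicted probability of the event (Sex='F') */
proc logistic data=Sashelp.Class noprint;
	model Sex(event='F') = Height Age;
	weight Weight;
	output out=Scored predicted=phat;
run;

/* Cross join every event row (e) with every nonevent row (n) and count pairs.
   No binning is applied here, i.e. this corresponds to BINWIDTH=0. */
proc sql;
	select count(*)                as nPairs,
	       sum(e.phat > n.phat)    as nConcordant,
	       sum(e.phat < n.phat)    as nDiscordant,
	       sum(e.phat = n.phat)    as nTied,
	       (sum(e.phat > n.phat) + 0.5*sum(e.phat = n.phat)) / count(*)
	                               as AUC_unweighted,
	       /* Assumption: a pair's weight is the product of the two observation weights */
	       sum(e.Weight*n.Weight*((e.phat > n.phat) + 0.5*(e.phat = n.phat)))
	           / sum(e.Weight*n.Weight)
	                               as AUC_weighted
	from Scored(where=(Sex='F')) as e,
	     Scored(where=(Sex='M')) as n;
quit;
```

The cross join creates one row per event/nonevent pair, so it is practical only for small or moderate data sets. To imitate a nonzero BINWIDTH, one could compare binned scores such as floor(phat/0.002) instead of phat, although the exact bin boundaries the procedures use are not confirmed here.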
