Rashu
Calcite | Level 5

Can anyone help me plot a calibration curve/line for a binary outcome? When I plot the predicted probability vs. the actual outcome, I get straight lines through 0 and 1 because the outcome is binary. I can't figure out how to do it any other way.

Thanks

Rick_SAS
SAS Super FREQ

I assume that you have one or more continuous explanatory variables?

I usually use PROC LOGISTIC to model the data and use the PLOTS=EFFECT option or the newer EFFECTPLOT statement to graph the results:

http://support.sas.com/documentation/cdl/en/statug/63962/HTML/default/viewer.htm#statug_logistic_sec...

If you prefer to do it "by hand," plot the predicted probabilities as Y and the explanatory variable as X.

Rashu
Calcite | Level 5

Rick,

Thank you for your comment. I do have explanatory variables. However, I am not trying to plot the predicted probability against an explanatory variable, which is what EFFECTPLOT gives. What I am trying to do is plot the predicted probability versus the actual outcome. I already have a model (formula) that I can calculate the predicted probability with, but my outcome is binary. So when I plot the predicted versus actual outcome, I get two lines through 0 and 1. I think I need to divide the predicted probabilities into deciles and plot them against the observed frequency, but I can't figure out how. Please let me know if this makes sense. Thanks for the help.

Rick_SAS
SAS Super FREQ

Unfortunately I am not understanding what you are trying to do. Can you link to a web page that has a picture of what you are trying to achieve? 

Rashu
Calcite | Level 5

Here is the link of the article. It is similar to what I am trying to do. Figure one is what I am trying to get. Thank you.

Rick_SAS
SAS Super FREQ

The link didn't post.

Rick_SAS
SAS Super FREQ

OK. That's clearer. This isn't really a "graph" question, it's a "how do I compute the quantities needed for a graph" question.

Here's what you need:
1) Compute the deciles of the predicted probabilities

2) For each decile, compute the mean and upper/lower 95% confidence interval for the observed outcome. The mean is also the "percentage of observed values that are 1."

A "short answer" is that you can do it like this:

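/* set deciles of predicted risk */
data Deciles;
set Pred;  /* data that includes variable PredProb for predicted probabilities */
Decile = int(10*PredProb)/10;  /* bins of width 0.1: a quick approximation to deciles */
run;

/* plot the mean of the binary response y (and its CLM) for each bin */
proc sgplot data=Deciles;
dot Decile / response=y stat=mean limitstat=clm;
run;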

This will get you in the ballpark, and would be sufficient for "internal" plots that you intend for yourself or your group.

Unfortunately there are three problems with this approach if you are trying to EXACTLY reproduce the figure in the paper:

1) The DOT statement displays a graph with the deciles on the vertical axis, which is opposite from the graph in the paper.

2) The LIMITSTAT= option computes confidence limits by using the standard formula for normally distributed data. These data are binary, and therefore you should really use CIs for binomial proportions (not a big deal if you have lots of data, but still...)

3) The plot in the paper also overlays a curve which I assume is a nonparametric smoother (for example, a loess curve) through the (Y, PredProb) points.

All of these problems can be surmounted: call PROC FREQ to get the stats and then overlay the SCATTERPLOT / YUPPERLIMIT= YLOWERLIMIT= statement with a LOESS curve.
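One possible sketch of that approach follows. It assumes the Deciles data set from the short answer above (variables Decile, PredProb, and a 0/1 response y) and that every bin contains at least one observation with y=1, since the BINOMIAL option needs that level to exist in each BY group. The data set names PredMeans, ObsRisk, and CalibPlot are made up for illustration. The variable names _BIN_, XL_BIN, and XU_BIN are the ones I understand the PROC FREQ OUTPUT statement writes for the BINOMIAL option; it is worth confirming them with PROC CONTENTS on your release. Note that in PROC SGPLOT the statement is SCATTER and the limit options are spelled YERRORLOWER= and YERRORUPPER=.

```sas
/* Assumes Deciles has variables Decile, PredProb, and binary y (0/1) */
proc sort data=Deciles;
   by Decile;
run;

/* Mean predicted probability per bin: the X coordinate of each marker */
proc means data=Deciles noprint;
   by Decile;
   var PredProb;
   output out=PredMeans(keep=Decile MeanPred) mean=MeanPred;
run;

/* Observed proportion of y=1 per bin, with binomial confidence limits */
proc freq data=Deciles noprint;
   by Decile;
   tables y / binomial(level='1');
   output out=ObsRisk binomial;   /* _BIN_ = proportion; XL_BIN, XU_BIN = exact limits */
run;

data CalibPlot;
   merge PredMeans ObsRisk;
   by Decile;
run;

/* Predicted risk on the horizontal axis, observed risk on the vertical axis */
proc sgplot data=CalibPlot noautolegend;
   scatter x=MeanPred y=_BIN_ / yerrorlower=XL_BIN yerrorupper=XU_BIN;
   loess x=MeanPred y=_BIN_ / nomarkers;
   xaxis label="Predicted probability";
   yaxis label="Observed proportion";
run;
```

Using SCATTER instead of DOT puts the predicted risk on the horizontal axis, and taking the limits from PROC FREQ replaces the normal-theory limits that LIMITSTAT=CLM produces.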

Rick_SAS
SAS Super FREQ

Oh, and properly you should use PROC RANK to get the deciles instead of the quick approximation that I used here.
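A minimal sketch of the PROC RANK step, assuming the same Pred data set and PredProb variable as in the short answer; the output names Deciles and Decile are chosen only to match that code.

```sas
/* Assumes data set Pred contains the predicted probabilities in PredProb */
proc rank data=Pred out=Deciles groups=10;
   var PredProb;
   ranks Decile;   /* Decile takes the values 0-9; 0 is the lowest tenth */
run;
```

Unlike int(10*PredProb)/10, which cuts at the fixed values 0.1, 0.2, and so on, GROUPS=10 puts roughly the same number of observations in each group (ties can make the groups slightly unequal).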

bioning
Calcite | Level 5

Dear Rick:

Sorry to bother you! Would you please clarify a couple of points regarding the three differences, as you mentioned, between the graph of interest (Figure 1 in the attachment) and the graph you drew?

1) How would you fix the problem of the deciles being on the vertical axis instead of the horizontal axis?

2) How exactly would you implement the "overlay the SCATTERPLOT / YUPPERLIMIT= YLOWERLIMIT= statement with a LOESS curve"?

I understand that this code may be very easy for you, but it is the bottleneck for me on this question. I searched online for a day and found nothing. Thank you!

Rick_SAS
SAS Super FREQ

If you post sample data, we can make concrete suggestions. But it sounds like you want something like the following. Here I am using PROC SGPLOT, which has a simple syntax: 

 

data Have;
input decile y low hi;
datalines;
1 1 0 2
2 3 1 2
3 4 2 5
4 6 4 7
5 5.5 5 6
6 5 4 6
7 4.5 4 7
8 3 1 5
9 2 1 3
;
proc sgplot data=Have;
scatter x=decile y=Y / YErrorLower=low YErrorUpper=hi;
loess x=decile y=y;
run; 

 

bioning
Calcite | Level 5

Dear Rick:

Thank you so much for the instructions!

Now I'm just one step away from the figure in that article. Below is the code I learned from you and applied to this question. The dataset is in the attachment. The variable phat_mean is the predicted risk by group, and ob_risk is the observed risk by group. The 10 groups were formed from the deciles of the predicted risk. My final question is: how do I reproduce the dashed diagonal line, which appears to be a reference line, in Figure 1 of that article?

 

proc sgplot data=ning;
scatter x=phat_mean y=ob_risk / YErrorLower=Lower_CI YErrorUpper=Upper_CI;
loess x=phat_mean y=ob_risk;
run;

 

Looking forward to your further instruction!

Many thanks!

Ning

 

Rick_SAS
SAS Super FREQ

Glad you are making progress.  I suggest you also add the NOMARKERS option to the LOESS statement. Then use the LINEPARM statement, like this:

proc sgplot data=ning;
scatter x=phat_mean y=ob_risk / YErrorLower=Lower_CI YErrorUpper=Upper_CI;
loess x=phat_mean y=ob_risk / nomarkers;
lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);  /* diagonal reference line: predicted = observed */
run;

 

bioning
Calcite | Level 5

Dear Rick:

Thank you for your tremendous help!

This "calibration plot" is a question in an assignment for a master's-level university course. To the best of my knowledge, your solution is the only, and THE BEST, resource that is available online as of now. In my homework, I cited your instructions and wrote: "The solution and the codes were developed under the guidance of, and virtually by, Mr. Rick from the SAS Institute. Web link listed below. https://communities.sas.com/t5/forums/replypage/board-id/sas_graph/message-id/11641"

The final graph is enclosed in the attachment for other SAS users' reference. This calibration plot is widely used to illustrate the performance of a "risk prediction model" in the medical field. It compares the predicted risk with the observed risk by level of the predicted risk. Statistically, it is also a visualization of the "Hosmer-Lemeshow test," which examines how well the predicted values produced by the statistical model match the observed values.

Many thanks again!

Ning 

Rick_SAS
SAS Super FREQ

For future reference, here are two blog posts about calibration curves in SAS:

Calibration plots in SAS  [using loess curves]

Decile calibration plots in SAS

 

The second article includes a comparison of the two methods. It ends with the following recommendation:

"Many leading researchers in logistic regression do not recommend the Hosmer-Lemeshow test for these reasons. The decile-based calibration curve shares the same drawbacks. Since SAS can easily create the loess-based calibration curve (see the previous article), there seems to be little reason to prefer the decile-based version."
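For comparison, a loess-based calibration curve might be sketched as follows. This is not the code from the blog posts; it assumes a data set Pred that holds the binary outcome y (0/1) and the predicted probability PredProb, the names used earlier in this thread.

```sas
/* Assumes Pred has binary y (0/1) and predicted probability PredProb */
proc sgplot data=Pred noautolegend;
   loess x=PredProb y=y / nomarkers;                      /* smoothed observed proportion */
   lineparm x=0 y=0 slope=1 / lineattrs=(pattern=dash);   /* perfect calibration */
   xaxis label="Predicted probability";
   yaxis label="Observed proportion";
run;
```

Because the smoother is fit to the individual 0/1 outcomes, no binning into deciles is needed; a well-calibrated model gives a curve close to the dashed diagonal.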
