Proc Logistic Score using other than 0.5 as the cut point


Posted 11-07-2019 03:21 PM
(1995 views)

I have been searching for this and cannot find an answer so it may not be possible.

Using PC SAS 9.4

I know that if you run PROC LOGISTIC with the CTABLE and PPROB=() options, you can generate a classification table across several probabilities to use as the cutoff between a 0 and a 1 outcome.

I am trying to figure out how to get the SCORE statement to score new data using a particular probability as the cut point. It appears to me that SCORE always uses 0.5 as the cut point.

Is there an option on the SCORE statement for this, or do I need to run PROC LOGISTIC with PPROB= set to a single value so that the cutoff is incorporated into the model output used for scoring?

I am teaching a class and this came up as a question. I have been racking my brain on this one.

1 ACCEPTED SOLUTION


The SCORE statement always produces the predicted classification (in the I_*response* variable) by selecting the level with the maximum predicted probability. For a binary response, this is equivalent to using 0.5 as the cutoff. If you want the predicted classification to use a different cutoff, simply follow the PROC LOGISTIC step with a DATA step and compute it as desired. For example, if the SCORE statement in your PROC LOGISTIC step produces a scored data set named MyOut, then the following DATA step computes the predicted classifications (in the variable Pred) using 0.6 as the cutoff. It assumes that the predicted probability of the event is stored in a variable named P_Y. The SCORE statement names these variables P_*level*, so substitute the actual name from MyOut, such as P_1:

```
data MyOut;
   set MyOut;
   /* Pred = 1 if the predicted event probability meets the cutoff, else 0 */
   Pred = (P_Y >= 0.6);
run;
```
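For readers who want to try the whole flow end to end, here is a minimal sketch on simulated data. The data sets SimTrain and SimNew, the predictor x, and the coefficients are all made up for illustration. The sketch also assumes that, for a response Y with levels 0 and 1, the SCORE statement writes the event probability to P_1 and the default classification to I_Y; run PROC CONTENTS on the scored data set to confirm the names in your own session.

```sas
/* Hypothetical training data: binary response Y, one predictor x */
data SimTrain;
   call streaminit(12345);
   do i = 1 to 200;
      x = rand('Normal');
      p = 1 / (1 + exp(-(-0.5 + 1.2*x)));   /* assumed true model */
      Y = rand('Bernoulli', p);
      output;
   end;
   drop i p;
run;

/* Hypothetical new observations to score (no response variable) */
data SimNew;
   do x = -2 to 2 by 0.5;
      output;
   end;
run;

/* Fit the model and score the new data; I_Y uses the default rule */
proc logistic data=SimTrain;
   model Y(event='1') = x;
   score data=SimNew out=MyOut;
run;

/* Apply a custom cutoff of 0.6 to the predicted event probability */
data MyOut;
   set MyOut;
   Pred = (P_1 >= 0.6);
run;

/* Compare the default classification (I_Y) with the custom one (Pred) */
proc print data=MyOut;
   var x P_1 I_Y Pred;
run;
```

Observations whose P_1 falls between 0.5 and 0.6 are the ones where I_Y and Pred should disagree.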

7 REPLIES


pprob option?

From the docs: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...

...

```
proc logistic data=Screen;
   freq Count;
   model Disease(event='Present')=Test
         / pevent=.5 .01 ctable pprob=.5;
run;
```

...

-unison



I actually have the CTABLE and PPROB= options working and can generate a classification table for any particular PPROB= value.

I am looking for a way to push new data through the SCORE statement using a cutoff other than 0.5, and PPROB= does not seem to do that for me.



This is interesting. Your statement:

"...always produces the predicted classification ... by selecting the level with the maximum predicted probability."

would be the answer. We have only fit binary models, and 0.5 has been the best cutoff, but if it turned out that 0.7 gave a higher maximum predicted probability, SCORE would predict using that cut point.

We have already used the F_ and I_ variables and generated our own predictions using the P_ variable.

Thank you.


I assume you are using the SCORE statement in PROC LOGISTIC?

IMHO, you might consider teaching your students to use PROC PLM. There are many reasons to prefer PROC PLM over PROC SCORE.

Regardless, suppose you use either the SCORE statement, or the STORE statement followed by PROC PLM, to create a scored data set. The data set has a variable that contains the predicted probability of the event. The name will be something like P_*Event* for the SCORE statement and Predicted for PROC PLM output. You can write a DATA step that creates a binary variable containing the predicted class, based on the predicted probability.

For example, the following code uses the Neuralgia data in the PROC LOGISTIC documentation:

```
title 'Logistic Model on Neuralgia';
proc logistic data=Neuralgia;
   class Sex Treatment;
   model Pain(Event='Yes')= Sex Age Duration Treatment;
   score data=NewPatients out=LogiScore;
   store PainModel / label='Neuralgia Study'; /* or use mylib.PainModel for permanent storage */
run;

proc plm restore=PainModel;
   score data=NewPatients out=NewScore predicted / ilink; /* ILINK gives probabilities */
run;

proc print data=NewScore;
run;

/* Create the Pred_Pain variable, which has the value 'Yes' or 'No' depending
   on whether the predicted probability of 'Yes' is greater than the cutoff value */
data ScoreCutpt;
   cutpoint = 0.5;
   set NewScore;
   if Predicted > cutpoint then
      Pred_Pain = 'Yes';
   else Pred_Pain = 'No ';
run;
```
```
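The same pattern works for a cutoff other than 0.5. Below is a minimal, self-contained sketch that does not need the Neuralgia data. The data sets SimPain and SimPatients, the single predictor Age, the coefficients, and the 0.6 cutoff are all assumptions made for illustration. The cutoff is kept in a macro variable so it can be changed in one place.

```sas
/* Hypothetical training data: response Pain ('Yes'/'No'), predictor Age */
data SimPain;
   call streaminit(2019);
   length Pain $3;
   do i = 1 to 200;
      Age = 60 + round(10*rand('Normal'));
      p = 1 / (1 + exp(-(-6 + 0.1*Age)));   /* assumed true model */
      if rand('Bernoulli', p) then Pain = 'Yes';
      else Pain = 'No';
      output;
   end;
   drop i p;
run;

/* Hypothetical new patients to score */
data SimPatients;
   do Age = 50 to 80 by 5;
      output;
   end;
run;

/* Fit the model and save it in an item store */
proc logistic data=SimPain;
   model Pain(Event='Yes') = Age;
   store SimPainModel;
run;

/* Score the new data; ILINK puts predictions on the probability scale */
proc plm restore=SimPainModel;
   score data=SimPatients out=SimScore predicted / ilink;
run;

%let cutpoint = 0.6;   /* assumed cutoff; change as needed */

/* Classify as 'Yes' when the predicted probability exceeds the cutoff */
data SimCutpt;
   set SimScore;
   length Pred_Pain $3;
   if Predicted > &cutpoint then Pred_Pain = 'Yes';
   else Pred_Pain = 'No';
run;

proc print data=SimCutpt;
   var Age Predicted Pred_Pain;
run;
```

Declaring Pred_Pain with a LENGTH statement avoids having to pad 'No' with a trailing blank, as the original example does.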


Thank you. I had never heard of Proc PLM. I will research and see if I can add this next semester.


It has been around since SAS/STAT 9.22, which was released in 2010, so it is almost 10 years old. You can read the documentation or Google

"proc plm" site:blogs.sas.com/content/iml/

for more information.
