Solved: Proc Logistic Score using other than 0.5 as the cut point.

slaurett · Posted 11-07-2019 03:21 PM

I have been searching for this and cannot find an answer so it may not be possible.

Using PC SAS 9.4

I know that running Proc Logistic with the CTABLE and PPROB=() options generates a classification table across several probabilities, each used as the cut point between a 0 and a 1 outcome.

I am trying to figure out how to get the SCORE statement to score the model using a particular probability as the cut point. It appears to me that SCORE always uses 0.5 as the cut point.

Is this a parameter into the score portion or do I need to run proc logistic with a pprob set to a single value and that is incorporated into the model output for scoring?

I am teaching a class, this came up as a question and I have been racking my brain on this one.

StatDave · Posted 11-08-2019 09:53 AM

The SCORE statement always produces the predicted classification (in the F_response variable) by selecting the level with the maximum predicted probability. For a binary response, this is equivalent to using 0.5 as the cutoff. If you want the predicted classification to use a different cutoff, then simply follow the PROC LOGISTIC step with a DATA step and compute it as desired. For example, if the SCORE statement in your PROC LOGISTIC step produces a scored data set named MyOut, then this DATA step will compute the predicted classifications (in variable Pred) using 0.6 as the cutoff assuming that your response variable is named Y:

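data MyOut;
  set MyOut;
  /* SCORE names the probability variables P_<response level>, so replace
     P_Y with the event-level probability in your data (for example, P_1) */
  Pred = (P_Y >= 0.6);
run;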
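
For a fuller picture, here is a minimal sketch of the whole sequence: fit the model, score new data, then apply a custom cutoff. The data set names Train and NewData, the predictors X1 and X2, and the 0/1 coding of Y are assumptions made for illustration; they do not come from the original question. With that coding, the SCORE statement should write the event probability to a variable named P_1.

```sas
/* Assumed for illustration: Train and NewData exist, Y is coded 0/1,
   and X1 and X2 are numeric predictors. */
proc logistic data=Train;
   model Y(event='1') = X1 X2;
   score data=NewData out=MyOut;   /* writes I_Y, P_0 and P_1 */
run;

data MyOutCut;
   set MyOut;
   /* A missing probability compares as less than any number in SAS,
      so guard against silently classifying it as 0. */
   if not missing(P_1) then Pred = (P_1 >= 0.6);
run;

/* Compare the default 0.5 classification (I_Y) with the custom one */
proc freq data=MyOutCut;
   tables I_Y*Pred / missing;
run;
```

Writing to a new data set (MyOutCut) rather than overwriting MyOut keeps the original scores available if you want to try another cutoff.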

unison · Posted 11-07-2019 03:33 PM

pprob option?

From the docs: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...

...

proc logistic data=Screen;
   freq Count;
   model Disease(event='Present')=Test 
         / pevent=.5 .01 ctable pprob=.5;
run;

...

-unison

slaurett · Posted 11-11-2019 01:15 PM

I actually have the ctable and pprob options working and can generate a classification table for any particular pprob.

I am looking for a way to push new data through the SCORE statement using a cut point other than 0.5, and PPROB does not seem to do that for me.

slaurett · Posted 11-11-2019 01:20 PM

This is interesting. Your statement:

"...always produces the predicted classification (in the F_response variable) by selecting the level with the maximum predicted probability."

Would be the answer. We have only done binary and 0.5 has been the best but if it turned out that 0.7 gave a higher maximum predicted probability score would predict that cut point.

We have already used the F_ and I_ variables and generated our own predictions using the P_ variable.

Thank you.

Rick_SAS · Posted 11-08-2019 02:10 PM

I assume you are using the SCORE statement in PROC LOGISTIC?

IMHO, you might consider teaching your students to use PROC PLM. There are many reasons to prefer PROC PLM over PROC SCORE.

Regardless, suppose you use the SCORE statement or the STORE statement and PROC PLM to create a score data set. The data set has a variable for the predicted probability of the event. The name will be something like P_Event for the SCORE statement and will be Predicted for PROC PLM output. You can write a data step that creates a binary variable that contains the predicted class, based on the predicted probability.

For example, the following code uses the Neuralgia data in the PROC LOGISTIC documentation:

title 'Logistic Model on Neuralgia';
proc logistic data=Neuralgia;
   class Sex Treatment;
   model Pain(Event='Yes')= Sex Age Duration Treatment;
   score data=NewPatients out=LogiScore ;
   store PainModel / label='Neuralgia Study';  /* or use mylib.PainModel for permanent storage */
run;

proc plm restore=PainModel;
   score data=NewPatients out=NewScore predicted / ilink; /* ILINK gives probabilities */
run;
 
proc print data=NewScore;
run;

/* Create the Pred_Pain variable, which has the value 'Yes' or 'No' depending
   on whether the predicted probability of 'Yes' is greater than the cutoff value */
data ScoreCutpt;
   cutpoint = 0.5;
   set NewScore;
   if Predicted > cutpoint then
      Pred_Pain = 'Yes';
   else Pred_Pain = 'No ';
run;
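
To compare several cutoffs at once, the same idea can be extended as in the sketch below. It reuses the NewScore data set and the Predicted variable from the PROC PLM step above; the output data set name ScoreCuts and the cutoff values 0.3, 0.5, and 0.7 are arbitrary choices for illustration, not recommendations.

```sas
/* Write one row per observation per cutoff so the predicted classes
   can be tabulated side by side. */
data ScoreCuts;
   set NewScore;
   length Pred_Pain $3;
   do cutpoint = 0.3, 0.5, 0.7;      /* illustrative cutoffs only */
      if missing(Predicted) then Pred_Pain = ' ';   /* leave unscored rows blank */
      else if Predicted > cutpoint then Pred_Pain = 'Yes';
      else Pred_Pain = 'No';
      output;
   end;
run;

/* Count the predicted classes under each cutoff */
proc freq data=ScoreCuts;
   tables cutpoint*Pred_Pain / missing norow nocol nopercent;
run;
```

Raising the cutoff can only move observations from 'Yes' to 'No', so the 'Yes' count should never increase from one cutoff row to the next.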

slaurett · Posted 11-11-2019 01:22 PM

Thank you. I had never heard of Proc PLM. I will research and see if I can add this next semester.

Rick_SAS · Posted 11-11-2019 01:37 PM

It has been around since SAS/STAT 9.22, which was released in 2010, so it is almost 10 years old. You can read the documentation or Google

"proc plm" site:blogs.sas.com/content/iml/

for more information.
