slaurett
Fluorite | Level 6

I have been searching for this and cannot find an answer so it may not be possible.

Using PC SAS 9.4

I know that if you run PROC LOGISTIC with the CTABLE and PPROB=() options, you can generate a classification table across several probabilities, each used as the cutoff between a 0 and a 1 outcome.

 

I am trying to figure out how to get the SCORE statement to score the model using a particular probability as the cut point. It appears to me that SCORE always uses 0.5 as the cut point.

 

Is this a parameter of the SCORE statement, or do I need to run PROC LOGISTIC with PPROB set to a single value so that it is incorporated into the model output for scoring?

 

I am teaching a class and this came up as a question. I have been racking my brain on this one.

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

The SCORE statement always produces the predicted classification (in the I_response variable; F_response holds the observed level) by selecting the level with the maximum predicted probability. For a binary response, this is equivalent to using 0.5 as the cutoff. If you want the predicted classification to use a different cutoff, then simply follow the PROC LOGISTIC step with a DATA step and compute it as desired. For example, if the SCORE statement in your PROC LOGISTIC step produces a scored data set named MyOut, then this DATA step will compute the predicted classifications (in variable Pred) using 0.6 as the cutoff, assuming that the predicted probability of the event is in the variable P_Y (the SCORE statement names these variables P_ followed by the response level):

 

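/* P_Y: predicted probability of the event (SCORE names these P_<level>) */
data MyOut;
  set MyOut;
  Pred = (P_Y >= 0.6);  /* 1 if the probability meets the 0.6 cutoff, else 0 */
run;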
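
When several candidate cutoffs are of interest, the same idea can be extended to classify each scored observation at every cutoff in one pass. The following is only a sketch under stated assumptions: the response is coded 0/1 with 1 as the event, so the event probability is taken to be in P_1, and the data set name CutoffGrid and the list of cutoffs are hypothetical.

```sas
/* Sketch: classify each scored observation at several cutoffs.
   Assumes MyOut came from a SCORE statement and that the event
   probability is in P_1 (hypothetical; use your own P_<level> name). */
data CutoffGrid;
   set MyOut;
   /* An explicit list avoids floating-point drift from a BY increment */
   do Cutoff = 0.3, 0.4, 0.5, 0.6, 0.7;
      if missing(P_1) then Pred = .;      /* do not classify missing probabilities */
      else Pred = (P_1 >= Cutoff);        /* 1 = event, 0 = nonevent */
      output;                             /* one row per observation per cutoff */
   end;
run;

/* Count how many observations are classified as events at each cutoff */
proc freq data=CutoffGrid;
   tables Cutoff*Pred / nocol nopercent;
run;
```

The guard on missing values matters because a bare comparison such as (P_1 >= 0.6) evaluates to 0 when P_1 is missing, which would silently classify unscored observations as nonevents.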


7 REPLIES
unison
Lapis Lazuli | Level 10

pprob option?

 

From the docs: https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_logistic_se...

...

proc logistic data=Screen;
   freq Count;
   model Disease(event='Present')=Test 
         / pevent=.5 .01 ctable pprob=.5;
run;

 ...

-unison

slaurett
Fluorite | Level 6

I actually have the CTABLE and PPROB options working and can generate a classification table for any particular PPROB value.

What I am looking for is a way to push new data through the SCORE statement using a cutoff other than 0.5, and PPROB does not seem to do that for me.

slaurett
Fluorite | Level 6

This is interesting. Your statement

"...always produces the predicted classification ... by selecting the level with the maximum predicted probability."

would be the answer. We have only done binary models, and 0.5 has been the best cutoff, but if it turned out that 0.7 gave a higher maximum predicted probability, SCORE would predict using that cut point.

We have already used the F_ and I_ variables and generated our own predictions using the P_ variable.

 

Thank you.

 

Rick_SAS
SAS Super FREQ

I assume you are using the SCORE statement in PROC LOGISTIC?

 

IMHO, you might consider teaching your students to use PROC PLM. There are many reasons to prefer PROC PLM over PROC SCORE.

 

Regardless, suppose you use either the SCORE statement or the STORE statement with PROC PLM to create a scored data set. The data set has a variable for the predicted probability of the event. The name will be something like P_Event for the SCORE statement and will be Predicted for PROC PLM output. You can write a DATA step that creates a binary variable containing the predicted class, based on the predicted probability.

 

For example, the following code uses the Neuralgia data in the PROC LOGISTIC documentation:

 

title 'Logistic Model on Neuralgia';
proc logistic data=Neuralgia;
   class Sex Treatment;
   model Pain(Event='Yes')= Sex Age Duration Treatment;
   score data=NewPatients out=LogiScore ;
   store PainModel / label='Neuralgia Study';  /* or use mylib.PainModel for permanent storage */
run;

proc plm restore=PainModel;
   score data=NewPatients out=NewScore predicted / ilink; /* ILINK gives probabilities */
run;
 
proc print data=NewScore;
run;

/* Create the Pred_Pain variable, which has the value 'Yes' or 'No' depending
   on whether the predicted probability of 'Yes' is greater than the cutoff value */
data ScoreCutpt;
   cutpoint = 0.5;
   set NewScore;
   if Predicted > cutpoint then
      Pred_Pain = 'Yes';
   else Pred_Pain = 'No ';
run;
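
The same cutoff logic can be applied to the LogiScore data set written by the SCORE statement in the PROC LOGISTIC step above. This is a minimal sketch, assuming that the event probability in LogiScore is named P_Yes (following the P_ plus response level naming) and that the default classification is in I_Pain. The 0.7 cutoff and the data set name LogiCutpt are hypothetical.

```sas
/* Sketch: apply a custom cutoff to the SCORE statement output.
   Assumes LogiScore contains P_Yes and I_Pain (names follow from the
   response variable Pain and its level 'Yes'; verify with PROC CONTENTS). */
data LogiCutpt;
   set LogiScore;
   length Pred_Pain $3;
   cutpoint = 0.7;                                 /* hypothetical cutoff */
   if missing(P_Yes) then Pred_Pain = ' ';         /* leave unscored rows blank */
   else if P_Yes > cutpoint then Pred_Pain = 'Yes';
   else Pred_Pain = 'No';
run;

/* Compare the default classification (I_Pain) with the custom one */
proc freq data=LogiCutpt;
   tables I_Pain*Pred_Pain / norow nocol nopercent;
run;
```

The cross-tabulation shows how many observations change class when the cutoff moves away from the default, which can be a useful check before relying on the new variable.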
slaurett
Fluorite | Level 6

Thank you.  I had never heard of Proc PLM.   I will research and see if I can add this next semester.  

 

Rick_SAS
SAS Super FREQ

It has been around since SAS/STAT 9.22, which was released in 2010, so it is almost 10 years old. You can read the documentation or Google

   "proc plm" site:blogs.sas.com/content/iml/

for more information.
