Hi,
I ran the following logistic regression for binary classification:
ods graphics on;
proc logistic
    data=work.mdl_base_train_trnf
    outmodel=work.mymodel
    outest=work.mdl_betas
    descending namelen=32;
  class &class_var. / param=ref;
  model responder = &class_var. &num_var. / lackfit ctable pprob=0.5;
  score data=work.mdl_base_train_trnf fitstat out=work.trainpred outroc=work.troc;
  score data=work.mdl_base_validate_trnf fitstat out=work.validpred outroc=work.vroc;
  score data=work.mdl_base_osd_trnf fitstat out=work.osdpred outroc=work.oroc;
  roc;
run;
ods graphics off;
The output data set work.trainpred contains the following fields (other fields omitted):
ID  Responder  F_Responder  I_Responder  P_1          P_0
1   1          1            1            0.665672289  0.334327711
2   1          1            1            0.997408099  0.002591901
3   1          1            1            0.855185865  0.144814135
4   0          0            0            0.000237562  0.999762438
5   0          0            0            0.000191220  0.999808780
6   1          1            1            0.857405743  0.142594257
7   1          1            1            0.987851783  0.012148217
What do the fields F_Responder, I_Responder, P_1 and P_0 mean? If I want to know what the model predicted (1 or 0) and with what probability, which fields should I use?
Thanks,
Lobbie
Just expanding on the answer a little more:
I_target-name and F_target-name are classification variables that are created automatically. I_ stands for "Into", which means that, given the default classification cutoff of 0.5, this column contains the predicted class level of the target variable.
Your target is RESPONDER, so I_RESPONDER is the level of RESPONDER that the observation is classified into, based on the model.
I know from your output that RESPONDER is coded as 0 and 1, and I am guessing that 1 is the event level. Look at your P_1 values: that is the predicted probability that an observation is a 1. P_0 is the predicted probability that it is a 0, so the two sum to 1 in each row. All the cases where P_1 > 0.5 have I_RESPONDER=1.
F_ stands for "From", and when RESPONDER is coded the way you have it, F_RESPONDER matches the actual values of RESPONDER.
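If it helps to see the link between P_1 and I_RESPONDER, here is a minimal sketch that rebuilds the predicted class from P_1 and compares it with I_RESPONDER. The macro variable cutoff, the data set work.trainpred_cut and the variable pred_class are names made up for this illustration; only work.trainpred, P_1 and I_Responder come from your output.

```sas
/* Hypothetical cutoff; 0.5 is the default that I_Responder reflects. */
%let cutoff = 0.5;

/* work.trainpred_cut and pred_class are illustrative names, not PROC LOGISTIC output. */
data work.trainpred_cut;
    set work.trainpred;
    /* P_1 is the predicted probability that Responder = 1.          */
    /* The comparison evaluates to 1 when it is true and 0 otherwise. */
    pred_class = (P_1 > &cutoff.);
run;

/* Compare the rebuilt class with the one PROC LOGISTIC assigned. */
proc freq data=work.trainpred_cut;
    tables I_Responder * pred_class / norow nocol nopercent;
run;
```

With the cutoff at 0.5, I would expect all the counts to fall on the diagonal, apart from any observation with P_1 exactly equal to 0.5. Changing the macro variable lets you see how the classification would shift at a different cutoff.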
I hope this helps!
Cat
@CatTruxillo and @Ksharp, thank you both very much for your answers. Cat's answer is the most comprehensive.
I checked earlier and found that F_Responder holds the actual values and I_Responder holds the predicted values, so running PROC FREQ on F_Responder * I_Responder gives me the confusion matrix.
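For anyone who finds this later, a minimal sketch of that check is below. It uses the work.trainpred data set from my SCORE statement; the norow, nocol and nopercent options are just my choice to keep only the cell counts.

```sas
/* Confusion matrix for the scored training data:   */
/* rows are actual classes (F_), columns are        */
/* predicted classes (I_).                          */
proc freq data=work.trainpred;
    tables F_Responder * I_Responder / norow nocol nopercent;
run;
```

The same step should work for work.validpred and work.osdpred by changing the data= option.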
All good and great stuff!