Hi,
I ran the following logistic regression for binary classification:
```sas
ods graphics on;

/* DESCENDING makes the higher response level (1) the event being modeled */
proc logistic data=work.mdl_base_train_trnf
              outmodel=work.mymodel
              outest=work.mdl_betas
              descending namelen=32;
   class &class_var. / param=ref;
   model responder = &class_var. &num_var. / lackfit ctable pprob=0.5;
   /* Score the training, validation and OSD data sets;           */
   /* each OUT= data set receives the F_, I_ and P_ variables      */
   score data=work.mdl_base_train_trnf    fitstat out=work.trainpred outroc=work.troc;
   score data=work.mdl_base_validate_trnf fitstat out=work.validpred outroc=work.vroc;
   score data=work.mdl_base_osd_trnf      fitstat out=work.osdpred   outroc=work.oroc;
   roc;
run;

ods graphics off;
```
The output data set work.trainpred contains the following fields (other fields omitted):
```
ID  Responder  F_Responder  I_Responder  P_1          P_0
1   1          1            1            0.665672289  0.334327711
2   1          1            1            0.997408099  0.002591901
3   1          1            1            0.855185865  0.144814135
4   0          0            0            0.000237562  0.999762438
5   0          0            0            0.000191220  0.999808780
6   1          1            1            0.857405743  0.142594257
7   1          1            1            0.987851783  0.012148217
```
What do the fields F_Responder, I_Responder, P_1 and P_0 mean? If I want to know what the model predicted (i.e. 1 or 0) and the associated probabilities, which fields should I use?
Thanks,
Lobbie
Just expanding on the answer a little more:
I_target-name and F_target-name are classification variables that are created automatically. I_ stands for "Into": given the default classification cutoff of 0.5, this column contains the predicted class level of the target variable.
Your target is RESPONDER, so I_RESPONDER is the level of RESPONDER that the observation is classified into, based on the model.
I know from your output that RESPONDER is coded as 0 and 1, and I am guessing that 1 is the event level. Look at your P_1 values: P_1 is the predicted probability that an observation is a 1, and P_0 (which equals 1 - P_1) is the predicted probability that it is a 0. All the cases where P_1 > 0.5 have I_RESPONDER=1.
F_ stands for "From", i.e. the observed level of the response. With RESPONDER coded the way you have it, F_RESPONDER matches the actual values of RESPONDER.
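If you want to confirm this on your own data, here is a minimal sketch that rebuilds the classification from P_1 and counts the rows that disagree with I_RESPONDER. It assumes work.trainpred came from your SCORE statement, RESPONDER is coded 0/1 with 1 as the event level, and the 0.5 cutoff applies; the names work.trainpred_check, my_pred and mismatch are hypothetical, made up for this check.

```sas
/* Sketch only: assumes work.trainpred came from the SCORE statement,  */
/* RESPONDER is coded 0/1, and 1 is the event level (DESCENDING).      */
/* trainpred_check, my_pred and mismatch are hypothetical names.       */
data work.trainpred_check;
   set work.trainpred;
   /* Rows with missing predictors get no prediction, so skip them */
   if not missing(P_1) then do;
      /* Classify into 1 when P_1 exceeds the 0.5 cutoff, else 0 */
      my_pred = (P_1 > 0.5);
      /* CATS strips blanks and accepts character or numeric input;   */
      /* mismatch = 1 when the manual class disagrees with I_Responder */
      mismatch = (cats(my_pred) ne cats(I_Responder));
   end;
run;

/* If I_Responder follows the 0.5 cutoff, the SUM of mismatch is 0 */
proc means data=work.trainpred_check n sum;
   var mismatch;
run;
```

The same check works on work.validpred or work.osdpred by changing the SET statement.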
I hope this helps!
Cat
@CatTruxillo and @Ksharp, thank you both very much for your answers; Cat's answer is the most comprehensive.
I checked earlier and found that F_Responder holds the actual values and I_Responder holds the predictions, so running PROC FREQ on F_Responder * I_Responder gives me the confusion matrix.
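For anyone finding this thread later, a minimal sketch of that confusion matrix, assuming the scored validation data (work.validpred) still contain the actual response so that F_Responder is populated; the same code works on work.trainpred. The output data set work.valid_cm and the accuracy query are illustrative additions, not part of the original post.

```sas
/* Sketch only: rows = actual class (F_), columns = predicted class (I_). */
/* work.valid_cm is a hypothetical output data set name.                 */
proc freq data=work.validpred;
   tables F_Responder * I_Responder / norow nocol nopercent out=work.valid_cm;
run;

/* Overall accuracy = counts on the diagonal / all counts */
proc sql;
   select sum(case when F_Responder = I_Responder then COUNT else 0 end)
          / sum(COUNT) as accuracy format=percent8.2
   from work.valid_cm;
quit;
```

By default PROC FREQ leaves rows with a missing F_Responder or I_Responder out of the table, so scored data without the actual response will not produce a meaningful matrix.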
All good and great stuff!