Solved: Re: How to interpret fields in the prediction data set?

Lobbie · Posted 08-20-2020 04:04 AM

Hi,

I ran the following logistic regression for binary classification,

ods graphics on;
proc logistic
    data=work.mdl_base_train_trnf
    outmodel=work.mymodel
    outest=work.mdl_betas
    descending namelen=32;
    class &class_var. / param=ref;
    model responder = &class_var. &num_var. / lackfit ctable pprob=0.5;
    score data=work.mdl_base_train_trnf fitstat out=work.trainpred outroc=work.troc;
    score data=work.mdl_base_validate_trnf fitstat out=work.validpred outroc=work.vroc;
    score data=work.mdl_base_osd_trnf fitstat out=work.osdpred outroc=work.oroc;
    roc;
run;
ods graphics off;

In the output work.trainpred, there are the following fields (I have excluded other fields),

ID	Responder	F_Responder	I_Responder	P_1	P_0
1	1	1	1	0.665672289	0.334327711
2	1	1	1	0.997408099	0.002591901
3	1	1	1	0.855185865	0.144814135
4	0	0	0	0.000237562	0.999762438
5	0	0	0	0.000191220	0.999808780
6	1	1	1	0.857405743	0.142594257
7	1	1	1	0.987851783	0.012148217

What do the fields F_Responder, I_Responder, P_1 and P_0 mean? If I want to know what the model predicted (i.e. 1 or 0) and the associated probabilities, which fields should I use?

Thanks,

Lobbie

CatTruxillo · Posted 08-20-2020 09:35 AM

Just expanding the answer a little more--

I_target-name and F_target-name are automatically created classification variables. I_ stands for "Into": this column contains the predicted class level of the target variable, which for a binary target corresponds to a default classification cutoff of .5.

Your target is RESPONDER, so I_RESPONDER is the level of RESPONDER that the observation is classified into, based on the model.

I know from your output that RESPONDER is coded as 0 and 1, and I am guessing that 1 is the event level. Look at your P_1 values: that is the predicted probability that an observation is a 1. All the cases where P_1 > .5 have I_RESPONDER=1.
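To make the link between P_1 and I_RESPONDER concrete, here is a minimal sketch that re-derives the predicted class from P_1 and compares it with I_Responder. It assumes the scored data set work.trainpred from the original post; the macro variable cutoff, the output data set work.trainpred_chk and the variable pred_class are hypothetical names introduced only for this illustration.

```sas
/* Sketch only: cutoff, trainpred_chk and pred_class are illustrative names, */
/* not something PROC LOGISTIC creates.                                      */
%let cutoff = 0.5;

data work.trainpred_chk;
    set work.trainpred;
    /* 1 when the predicted event probability exceeds the cutoff, else 0 */
    pred_class = (P_1 > &cutoff.);
run;

/* Cross-tabulate the SAS-generated class against the hand-derived one */
proc freq data=work.trainpred_chk;
    tables I_Responder * pred_class / norow nocol nopercent;
run;
```

With the cutoff left at 0.5, every observation should fall on the diagonal of this table (apart from possible ties at exactly 0.5). Changing the cutoff value is one way to see how a different threshold would reclassify observations without refitting the model.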

F_ stands for "From" and when RESPONDER is coded the way you have it, F_RESPONDER matches the actual variable values of RESPONDER.

I hope this helps!

Cat

Ksharp · Posted 08-20-2020 07:24 AM

"What do the fields F_Responder, I_Responder, P_1 and P_0 mean? "

F_Responder and I_Responder are classification variables created by the SCORE statement; they are not among the CLASS variables in "&class_var.".
P_1 stands for the predicted probability of "responder=1". Therefore, P_1 + P_0 = 1.

Lobbie · Posted 08-20-2020 09:46 AM

@CatTruxillo and @Ksharp, thank you both very much for your answers; Cat's answer is the most comprehensive.

I did a check earlier and found that F_Responder holds the actuals and I_Responder holds the predictions. Running PROC FREQ on F_Responder * I_Responder therefore gives me the confusion matrix.
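For reference, the check described above can be sketched as follows. It assumes the scored data set work.trainpred from the original post; the NOROW, NOCOL and NOPERCENT options are only there to reduce the table to raw counts and can be dropped.

```sas
/* Confusion matrix: rows = actual level (F_), columns = predicted level (I_) */
proc freq data=work.trainpred;
    tables F_Responder * I_Responder / norow nocol nopercent;
run;
```

The same step can be repeated on work.validpred and work.osdpred to compare classification performance across the training, validation and out-of-sample data.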

All good and great stuff!
