Fluorite | Level 6

## Manually calculating logistic regression probabilities

Hello,

I am using proc logistic (binary logit model). I would like to see whether I can reproduce the predicted probability IP_1 values that proc logistic provides by doing the calculation manually with the regression equation.

I tried Probability = 1 / [1 + exp(-(B0 + B1X))] and plugged in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the value in the IP_1 column.

I tried using reference coding instead of effect coding to get the estimates, and then calculated the probability, but again it was not the same as the IP_1 value.

How can I get to IP_1 = 0.000406?

Here is an example with the values of the variables, and coefficients.

| var1 | var2 | var3 | var4 | IP_1 | manual calculation |
|---|---|---|---|---|---|
| 21 | 1897 | 0 | 0 | 0.000406 | 0.000228 |

Estimates with effect coding

| Parameter | | DF | Estimate |
|---|---|---|---|
| Intercept | | 1 | -6.4624 |
| var3 | 0 | 1 | 0.6089 |
| var4 | 0 | 1 | -0.038 |
| var2 | | 1 | -0.00089 |
| var1 | | 1 | -0.0111 |

Estimates with reference coding

| Parameter | | DF | Estimate |
|---|---|---|---|
| Intercept | | 1 | -7.0333 |
| var3 | 0 | 1 | 1.2178 |
| var4 | 0 | 1 | -0.076 |
| var2 | | 1 | -0.00089 |
| var1 | | 1 | -0.0111 |

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;

Thank you.

Zeynep


## Re: Manually calculating logistic regression probabilities

Hello @zeynep,

@zeynep wrote:

I tried Probability = 1 / [1 + exp(-(B0 + B1X))] and plugged in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the value in the IP_1 column.

I tried using reference coding instead of effect coding to get the estimates, and then calculated the probability, but again it was not the same as the IP_1 value.

Your formula is correct. There is the LOGISTIC function to simplify the code a bit. But either way I get a result that is close to IP_1 (the remaining rounding error could almost certainly be avoided by using more precise parameter values from an output dataset), regardless of the parameterization of VAR3 and VAR4:

data _null_;
p=logistic(-7.0333-0.0111*21-0.00089*1897+1.2178-0.076);
put p;
p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038);
put p;
run;

0.0004043077
0.0004043077
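Written out for the example row, the first calculation above can be followed term by term. The symbols $\eta$ (linear predictor) and $d_3$, $d_4$ (design variable values for VAR3 and VAR4) are notation introduced here for illustration; with reference coding and ref='1', both design variables equal 1 when VAR3 = VAR4 = 0.

```latex
\begin{aligned}
\eta &= \beta_0 + \beta_{\text{var1}} \cdot 21 + \beta_{\text{var2}} \cdot 1897
        + \beta_{\text{var3}}\, d_3 + \beta_{\text{var4}}\, d_4 \\
     &= -7.0333 - 0.0111(21) - 0.00089(1897) + 1.2178(1) - 0.076(1) \\
     &= -7.0333 - 0.2331 - 1.68833 + 1.2178 - 0.076 = -7.81293 \\[4pt]
p    &= \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{7.81293}} \approx 0.000404
\end{aligned}
```

The effect-coding estimates give the same linear predictor, $-6.4624 - 0.2331 - 1.68833 + 0.6089 - 0.038 = -7.81293$, which is why both calls print the same probability.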

Fluorite | Level 6

## Re: Manually calculating logistic regression probabilities

var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?

## Re: Manually calculating logistic regression probabilities

@zeynep wrote:
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?

Remember that character variables on a nominal scale can also be used as categorical predictors. For those it is obvious that in the regression equation their values (such as "Caucasian", "Yes", "Divorced") must be replaced by the numeric values of the corresponding design variables (whose creation is triggered by the CLASS statement). If a predictor is listed in the CLASS statement, its design variables are always used. No exceptions are made for predictors whose original values happen to be numeric.

So, in the case of effect coding of VAR3 and VAR4 with the ref='1' option you would look at the design matrix (cf. table "Class Level Information" in your PROC LOGISTIC output or the generic example in the documentation: "Other Parameterizations", first table). There you would find out that if the values of VAR3 and VAR4 were 1 (i.e., equal to their respective reference levels), the corresponding design variable values in the regression equation would be −1, leading to the calculation p=logistic(-6.4624-0.0111*21-0.00089*1897-0.6089+0.038).
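The effect-coding rule described above can be sketched as a small DATA step. This is only an illustration: the names d3, d4 and eta are made up here, the estimates are the rounded effect-coding values from the question, and the mapping (level 0 to +1, level 1 to −1) assumes ref='1' as in this thread.

```sas
/* Sketch: manual scoring under effect coding with ref='1'.          */
/* d3, d4 and eta are illustrative names, not PROC LOGISTIC output.  */
data _null_;
  var1 = 21;
  var2 = 1897;
  do var3 = 0 to 1;
    do var4 = 0 to 1;
      /* Effect coding: non-reference level 0 -> +1, reference level 1 -> -1 */
      d3 = ifn(var3 = 0, 1, -1);
      d4 = ifn(var4 = 0, 1, -1);
      /* Linear predictor built from the rounded effect-coding estimates */
      eta = -6.4624 - 0.0111*var1 - 0.00089*var2 + 0.6089*d3 - 0.038*d4;
      p = logistic(eta);
      put var3= var4= d3= d4= eta= p=;
    end;
  end;
run;
```

The row with var3=0 and var4=0 should reproduce the value of about 0.000404 computed earlier; the other rows show how the sign of each term flips at the reference level.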

That said, a special feature of binary variables with values 0 and 1 is that they may be treated as continuous variables (by removing them from the CLASS statement), in which case handling them "just like" VAR1 and VAR2 would be correct. The results will be exactly the same as if they are used in the CLASS statement with a parameterization assigning 0 to 0 and 1 to 1 in the design variables. This is not surprising because then the design variable and the original predictor are mathematically equal.
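A minimal sketch of that alternative, reusing the data set and variable names from the question (mydata, target_var, var1 to var4) and assuming var3 and var4 are numeric with values 0 and 1 only. The output data set name preds_numeric is hypothetical, and the plotting and classification-table options of the original call are left out for brevity.

```sas
/* Sketch: var3 and var4 are NOT listed in a CLASS statement,       */
/* so they enter the model as plain 0/1 numeric predictors.          */
proc logistic data=mydata desc;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  /* preds_numeric is a hypothetical name for the scored output */
  output out=preds_numeric predprobs=individual;
run;
```

With this specification the var3 and var4 estimates multiply the raw data values, so plugging 0 or 1 directly into the regression equation is the right manual calculation.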

Super User

## Re: Manually calculating logistic regression probabilities

PROC LOGISTIC now has the CODE statement so you can have the scoring code created for you.

Comparing your code to the generated code is one good way to check this process.
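As a rough sketch of how that might look with the model from the question: the file name score_code.sas and the data set name scored are hypothetical, the path may need adjusting to a writable location, and the other model options are omitted for brevity.

```sas
/* Sketch: let PROC LOGISTIC write DATA step scoring code to a file. */
proc logistic data=mydata desc;
  class var3 var4 / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  code file='score_code.sas';   /* hypothetical file name */
run;

/* Apply the generated scoring code to the same data. */
data scored;                    /* hypothetical data set name */
  set mydata;
  %include 'score_code.sas';
run;
```

Opening the generated file shows exactly how the design variables for var3 and var4 are built before the coefficients are applied, which can then be compared line by line with a manual formula.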

Here's a post that illustrates an end-to-end check of the calculations. The raw data needed to run the code (the neuralgia data set) is here:

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm

And the post showing the example:

https://communities.sas.com/t5/Statistical-Procedures/How-to-determine-logistic-regression-formula-f...

Fluorite | Level 6

## Re: Manually calculating logistic regression probabilities

Thank you @Reeza.
I have just tried the code in the post you shared and got the same value as my manual calculation (0.000129594), which is not close to IP_1. The values are below. The code inputs 0 for var3 and var4, just as I do in my manual calculation, but the predicted probability gets close to IP_1 only if we input 1, as @FreelanceReinhard did. I am not clear on why we would multiply by 1 instead of 0 when the values of the variables are 0. Any thoughts?

| IP_1 | myformula |
|---|---|
| 0.000405819 | 0.000129594 |

Super User

## Re: Manually calculating logistic regression probabilities

Post the Effect Coding matrix please.

Fluorite | Level 6

## Re: Manually calculating logistic regression probabilities

Here are the class levels. Does the last column mean that if the value is 0 in the data, it is input as 1? If that's the case, how can I flip them?

Class Level Information

| Class | Value | Design Variables |
|---|---|---|
| var3 | 0 | 1 |
| | 1 | 0 |
| var4 | 0 | 1 |
| | 1 | 0 |

Super User

## Re: Manually calculating logistic regression probabilities

Note that your value and design code are flipped... 0 maps to 1 and 1 maps to 0.
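
To make the flip concrete, here is a small sketch that scores the example row twice with the rounded reference-coding estimates from the question: once with the design variables from the Class Level Information table, and once with the raw values plugged in directly. The names d3, d4, p_design and p_raw are illustrative only.

```sas
/* Sketch: design-variable scoring vs. plugging in raw 0/1 values. */
data _null_;
  var1 = 21; var2 = 1897; var3 = 0; var4 = 0;

  /* Design variables as in the table: value 0 -> 1, value 1 -> 0 */
  d3 = (var3 = 0);
  d4 = (var4 = 0);
  p_design = logistic(-7.0333 - 0.0111*var1 - 0.00089*var2
                      + 1.2178*d3 - 0.076*d4);

  /* Raw values used directly, as in the manual calculation */
  p_raw = logistic(-7.0333 - 0.0111*var1 - 0.00089*var2
                   + 1.2178*var3 - 0.076*var4);

  put p_design= p_raw=;
run;
```

With these rounded estimates, p_design should come out near the IP_1 value (about 0.000404), while p_raw should land near the 0.00013 reported for the manual formula.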
Super User

## Re: Manually calculating logistic regression probabilities

(Accepted solution)

You've specified 1 as the reference level, which means that level 1 is coded as 0 in the design variable.

It sounds like you want the opposite. Removing the REF= option alone is not enough, because the default reference level is the last one (here 1), so set the reference level to 0 instead:

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='0') var4 (ref='0') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;
Fluorite | Level 6

## Re: Manually calculating logistic regression probabilities

Actually, I wasn't using the reference specification in my original code and 0 and 1 were still flipped, but if I use (ref='0'), my manual calculation gives the same probabilities as IP_1. Thank you for your help!
