Hello,
I am using proc logistic (binary logit model). I would like to see whether I can reproduce the predicted probability IP_1 values that proc logistic provides by doing the calculation manually with the regression equation.
I tried Probability = 1 / [1 + exp(-(b0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability did not match the IP_1 column.
I then used reference coding instead of effect coding to get the estimates and recalculated the probability, but again it did not match the IP_1 value.
How can I get to IP_1 = 0.000406?
Here is an example with the values of the variables, and coefficients.
var1 | var2 | var3 | var4 | IP_1 | manual calculation |
21 | 1897 | 0 | 0 | 0.000406 | 0.000228 |
Estimates with effect coding
Parameter | Level | DF | Estimate |
Intercept | | 1 | -6.4624 |
var3 | 0 | 1 | 0.6089 |
var4 | 0 | 1 | -0.038 |
var2 | | 1 | -0.00089 |
var1 | | 1 | -0.0111 |
Estimates with reference coding
Parameter | Level | DF | Estimate |
Intercept | | 1 | -7.0333 |
var3 | 0 | 1 | 1.2178 |
var4 | 0 | 1 | -0.076 |
var2 | | 1 | -0.00089 |
var1 | | 1 | -0.0111 |
proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;
Thank you.
Zeynep
Hello @zeynep,
@zeynep wrote:
I tried Probability = 1 / [1 + exp(-(b0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability did not match the IP_1 column.
I then used reference coding instead of effect coding to get the estimates and recalculated the probability, but again it did not match the IP_1 value.
Your formula is correct. There is the LOGISTIC function to simplify the code a bit. But either way I get a result that is close to IP_1 (the remaining rounding error could almost certainly be avoided by using more precise parameter values from an output dataset), regardless of the parameterization of VAR3 and VAR4:
data _null_;
  p=logistic(-7.0333-0.0111*21-0.00089*1897+1.2178-0.076);
  put p;
  p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038);
  put p;
run;

0.0004043077
0.0004043077
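One way to get parameter values with more digits than the printed table shows is to capture the estimates in a data set through ODS. This is a minimal sketch, assuming the same mydata and target_var as in the question. The data set name pe is a placeholder, and the plotting, LACKFIT and CTABLE options of the original call are left out because they do not change the estimates.

```sas
/* Capture the ParameterEstimates table in a data set (PE is a placeholder name) */
ods output ParameterEstimates=pe;

proc logistic data=mydata desc;
  class var3 (ref='1') var4 (ref='1') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  output out=preds predprobs=individual;
run;

/* Print the stored estimates with more digits than the default listing */
proc print data=pe;
  format Estimate best16.;
run;
```

Plugging these unrounded estimates into the LOGISTIC function should bring the manual result closer to IP_1 than the four-digit values do.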
@zeynep wrote:
var3 and var4 are binary variables and in this example, their values are 0. That's why I multiplied their coefficients with 0 in the equation just like multiplying var1 and var2 coefficients with their values. Can you explain why it's incorrect? What if the values of var3 and var4 were 1, would you still calculate as p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?
Remember that character variables on a nominal scale can also be used as categorical predictors. For those it is obvious that in the regression equation their values (such as "Caucasian", "Yes", "Divorced") must be replaced by the numeric values of the corresponding design variables (whose creation is triggered by the CLASS statement). If a predictor is listed in the CLASS statement, its design variables are always used. No exceptions are made for predictors whose original values happen to be numeric.
So, in the case of effect coding of VAR3 and VAR4 with the ref='1' option you would look at the design matrix (cf. table "Class Level Information" in your PROC LOGISTIC output or the generic example in the documentation: "Other Parameterizations", first table). There you would find out that if the values of VAR3 and VAR4 were 1 (i.e., equal to their respective reference levels), the corresponding design variable values in the regression equation would be −1, leading to the calculation p=logistic(-6.4624-0.0111*21-0.00089*1897-0.6089+0.038).
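The design-variable substitution can be sketched in a DATA step. The example below uses the rounded estimates posted in the question and the design values implied by ref='1' (level 0 maps to 1; level 1 maps to 0 under reference coding and to -1 under effect coding). The variable names b0_ref, d3_eff and so on are made up for this illustration.

```sas
data _null_;
  /* Rounded estimates as posted in the thread, so results are approximate */
  b0_ref = -7.0333; b3_ref = 1.2178; b4_ref = -0.076;   /* reference coding */
  b0_eff = -6.4624; b3_eff = 0.6089; b4_eff = -0.038;   /* effect coding    */
  b1 = -0.0111;     b2 = -0.00089;                      /* var1 and var2    */
  var1 = 21; var2 = 1897;

  do var3 = 0 to 1;
    do var4 = 0 to 1;
      /* Reference coding with ref='1': level 0 -> 1, level 1 -> 0 */
      d3_ref = (var3 = 0);
      d4_ref = (var4 = 0);
      /* Effect coding with ref='1': level 0 -> 1, level 1 -> -1 */
      d3_eff = ifn(var3 = 0, 1, -1);
      d4_eff = ifn(var4 = 0, 1, -1);

      /* The design values, not the raw 0/1 values, enter the linear predictor */
      p_ref = logistic(b0_ref + b1*var1 + b2*var2 + b3_ref*d3_ref + b4_ref*d4_ref);
      p_eff = logistic(b0_eff + b1*var1 + b2*var2 + b3_eff*d3_eff + b4_eff*d4_eff);
      put var3= var4= p_ref= p_eff=;
    end;
  end;
run;
```

For each combination of var3 and var4 the two parameterizations should give the same probability, since they describe the same fitted model.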
That said, a special feature of binary variables with values 0 and 1 is that they may be treated as continuous variables (by removing them from the CLASS statement), in which case handling them "just like" VAR1 and VAR2 would be correct. The results will be exactly the same as if they are used in the CLASS statement with a parameterization assigning 0 to 0 and 1 to 1 in the design variables. This is not surprising because then the design variable and the original predictor are mathematically equal.
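To illustrate the equivalence, the posted reference-coding estimates can be rewritten algebraically for a coding in which 0 maps to 0 and 1 maps to 1. The coefficients c0, c3 and c4 below are derived by hand from the posted values; they are not PROC LOGISTIC output, and the names are hypothetical.

```sas
data _null_;
  /* Posted reference-coding estimates (ref='1': level 0 -> 1, level 1 -> 0) */
  b0 = -7.0333; b3 = 1.2178; b4 = -0.076;
  b1 = -0.0111; b2 = -0.00089;

  /* Since the design value is (1 - var), we have
     b0 + b3*(1 - var3) + b4*(1 - var4) = (b0 + b3 + b4) - b3*var3 - b4*var4 */
  c0 = b0 + b3 + b4;
  c3 = -b3;
  c4 = -b4;

  var1 = 21; var2 = 1897; var3 = 0; var4 = 0;

  /* With this coding the raw 0/1 values can be multiplied in directly */
  p = logistic(c0 + b1*var1 + b2*var2 + c3*var3 + c4*var4);
  put c0= c3= c4= p=;
run;
```

With var3 = var4 = 0 this gives roughly 0.000404, the same value as the calculation with the design variables.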
PROC LOGISTIC now has the CODE statement so you can have the scoring code created for you.
Comparing your code to the generated code is a good way to check the calculation.
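A minimal sketch of that approach, assuming the data set and variable names from the question. The file path and the output data set name scored are placeholders, and the other options from the original call are omitted.

```sas
proc logistic data=mydata desc;
  class var3 (ref='1') var4 (ref='1') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  /* Write DATA step scoring code to a file (placeholder path) */
  code file='/tmp/logistic_score.sas';
run;

/* Apply the generated scoring code to the data */
data scored;
  set mydata;
  %include '/tmp/logistic_score.sas';
run;
```

The generated file is plain DATA step code, so it can be opened to see how the CLASS variables are turned into design values. The names of the predicted probability variables are assigned by the procedure, so check the file for them.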
And here's a post that illustrates an end to end check for the calculations:
Raw data to run the code is here: (neuralgia data set)
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm
And post showing example:
Post the Effect Coding matrix please.
Here are the class levels. Does the last column mean that a value of 0 in the data is entered as 1 in the equation? If so, how can I flip them?
Class Level Information
Class | Value | Design Variables |
var3 | 0 | 1 |
 | 1 | 0 |
var4 | 0 | 1 |
 | 1 | 0 |
You've specified the reference level is 1, which means that is coded as 0.
Sounds like you want the opposite, you can just remove the reference specification.
proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;
Actually, I wasn't using the reference specification in my original code, and 0 and 1 were still flipped. But if I specify (ref='0'), the manual calculation gives the same probabilities as IP_1. Thank you for your help!
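For reference, the fix described above comes down to one change in the CLASS statement. This sketch is based on the original call, with the plotting and classification-table options left out.

```sas
proc logistic data=mydata desc;
  /* ref='0' makes level 0 the reference, so the design variable is 1 for level 1 */
  class var3 (ref='0') var4 (ref='0') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  output out=preds predprobs=individual;
run;
```

With this coding the design variable equals the raw 0/1 value, so multiplying each estimate by the value of var3 or var4 in the regression equation is consistent with what the procedure does.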