Hello,
I am using proc logistic (binary logit model). I would like to see whether I can reproduce the predicted probabilities (IP_1) that proc logistic provides by doing the calculation manually with the regression equation.
I tried Probability = 1 / [1 + exp(-(B0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the one in the IP_1 column.
I also tried reference coding instead of effect coding to get the estimates and then calculated the probability, but again it did not match the IP_1 value.
How can I get to IP_1 = 0.000406?
Here is an example with the values of the variables and the coefficients.

| var1 | var2 | var3 | var4 | IP_1 | manual calculation |
| --- | --- | --- | --- | --- | --- |
| 21 | 1897 | 0 | 0 | 0.000406 | 0.000228 |
Estimates with effect coding

| Parameter | Level | DF | Estimate |
| --- | --- | --- | --- |
| Intercept | | 1 | -6.4624 |
| var3 | 0 | 1 | 0.6089 |
| var4 | 0 | 1 | -0.038 |
| var2 | | 1 | -0.00089 |
| var1 | | 1 | -0.0111 |
Estimates with reference coding

| Parameter | Level | DF | Estimate |
| --- | --- | --- | --- |
| Intercept | | 1 | -7.0333 |
| var3 | 0 | 1 | 1.2178 |
| var4 | 0 | 1 | -0.076 |
| var2 | | 1 | -0.00089 |
| var1 | | 1 | -0.0111 |
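The discrepancy can be reproduced outside SAS. Below is a minimal Python sketch. It assumes the rounded estimates from the two tables above and the example row, and it multiplies the raw values of var3 and var4 (both 0) straight into their coefficients, as in the manual calculation. The helper `logistic` is only a stand-in for SAS's LOGISTIC function. With these rounded inputs it gives about 0.000228 for effect coding and about 0.000129 for reference coding, matching the manual results rather than IP_1.

```python
import math


def logistic(eta: float) -> float:
    """Inverse logit, 1 / (1 + exp(-eta)); stand-in for SAS's LOGISTIC function."""
    return 1.0 / (1.0 + math.exp(-eta))


# Example row from the question
var1, var2, var3, var4 = 21, 1897, 0, 0

# Rounded estimates copied from the two tables above
effect = {"Intercept": -6.4624, "var3": 0.6089, "var4": -0.038,
          "var2": -0.00089, "var1": -0.0111}
ref = {"Intercept": -7.0333, "var3": 1.2178, "var4": -0.076,
       "var2": -0.00089, "var1": -0.0111}


def naive_prob(b: dict) -> float:
    # Raw 0/1 values of the CLASS variables are used directly here,
    # ignoring the design variables that the CLASS statement creates.
    eta = (b["Intercept"] + b["var1"] * var1 + b["var2"] * var2
           + b["var3"] * var3 + b["var4"] * var4)
    return logistic(eta)


p_effect = naive_prob(effect)
p_ref = naive_prob(ref)
print(f"effect coding, raw values:    {p_effect:.6f}")
print(f"reference coding, raw values: {p_ref:.6f}")
```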
proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;
Thank you.
Zeynep
Accepted Solutions
You've specified that the reference level is 1, which means that level is coded as 0.
It sounds like you want the opposite, so you can just remove the reference specification.
proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
   class var3 (ref='1') var4 (ref='1') / param = ref ;
   model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
   output out=preds predprobs=individual;
run;
Hello @zeynep,
@zeynep wrote:
I tried Probability = 1 / [1 + exp(-(B0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the one in the IP_1 column.
I also tried reference coding instead of effect coding to get the estimates and then calculated the probability, but again it did not match the IP_1 value.
Your formula is correct. SAS also has the LOGISTIC function, which simplifies the code a bit. Either way, I get a result that is close to IP_1 regardless of the parameterization of VAR3 and VAR4 (the remaining rounding error could almost certainly be avoided by using more precise parameter values from an output dataset):
data _null_;
   p=logistic(-7.0333-0.0111*21-0.00089*1897+1.2178-0.076);
   put p;
   p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038);
   put p;
run;

Log output:
0.0004043077
0.0004043077
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just as I multiplied the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?
@zeynep wrote:
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just as I multiplied the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?
Remember that character variables on a nominal scale can also be used as categorical predictors. For those it is obvious that in the regression equation their values (such as "Caucasian", "Yes", "Divorced") must be replaced by the numeric values of the corresponding design variables (whose creation is triggered by the CLASS statement). If a predictor is listed in the CLASS statement, its design variables are always used. No exception is made for predictors whose original values happen to be numeric.
So, in the case of effect coding of VAR3 and VAR4 with the ref='1' option you would look at the design matrix (cf. table "Class Level Information" in your PROC LOGISTIC output or the generic example in the documentation: "Other Parameterizations", first table). There you would find out that if the values of VAR3 and VAR4 were 1 (i.e., equal to their respective reference levels), the corresponding design variable values in the regression equation would be −1, leading to the calculation p=logistic(-6.4624-0.0111*21-0.00089*1897-0.6089+0.038).
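The design-variable substitution can be sketched in Python. This assumes the two codings described in the thread for ref='1' (effect coding: 0 -> +1, 1 -> -1; reference coding: 0 -> 1, 1 -> 0) and the rounded estimates posted earlier; the dictionary and function names are illustrative, not SAS output. Scoring all four combinations of var3 and var4 shows that both parameterizations give the same probability once each raw value is replaced by its design value. Incidentally, var3 = var4 = 1 gives about 0.000129, the value the raw-value reference-coding calculation produced.

```python
import math
from itertools import product


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


var1, var2 = 21, 1897

# Design-variable values for a 0/1 CLASS variable with ref='1'
EFFECT_DESIGN = {0: 1, 1: -1}   # param=effect
REF_DESIGN = {0: 1, 1: 0}       # param=ref

effect = {"Intercept": -6.4624, "var3": 0.6089, "var4": -0.038,
          "var2": -0.00089, "var1": -0.0111}
ref = {"Intercept": -7.0333, "var3": 1.2178, "var4": -0.076,
       "var2": -0.00089, "var1": -0.0111}


def score(b, design, v3, v4):
    # CLASS variables enter through their design values, never their raw values
    eta = (b["Intercept"] + b["var1"] * var1 + b["var2"] * var2
           + b["var3"] * design[v3] + b["var4"] * design[v4])
    return logistic(eta)


results = {}
for v3, v4 in product((0, 1), repeat=2):
    p_eff = score(effect, EFFECT_DESIGN, v3, v4)
    p_ref = score(ref, REF_DESIGN, v3, v4)
    results[(v3, v4)] = (p_eff, p_ref)
    print(f"var3={v3} var4={v4}: effect={p_eff:.7f} ref={p_ref:.7f}")
```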
That said, binary variables with values 0 and 1 have the special feature that they may be treated as continuous variables (by removing them from the CLASS statement), in which case handling them "just like" VAR1 and VAR2 would be correct. The results will be exactly the same as if they were used in the CLASS statement with a parameterization that assigns 0 to 0 and 1 to 1 in the design variables. This is not surprising, because then the design variable and the original predictor are mathematically equal.
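A sketch of that equivalence in Python, assuming the rounded reference-coding estimates from the ref='1' fit. Making 0 the reference level (or treating var3 and var4 as continuous) only reparameterizes the same model, so the `ref0` coefficients below are derived by hand rather than taken from PROC LOGISTIC output: each class coefficient changes sign and the intercept absorbs the old ref='1' coefficients. With this coding the design value equals the raw 0/1 value, so multiplying var3 and var4 by their raw values gives the same probabilities as the ref='1' design.

```python
import math
from itertools import product


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


var1, var2 = 21, 1897
common = -0.0111 * var1 - 0.00089 * var2  # continuous part, identical in both codings

# Reference coding with ref='1' (rounded estimates from the thread): 0 -> 1, 1 -> 0
ref1 = {"Intercept": -7.0333, "var3": 1.2178, "var4": -0.076}

# Derived by hand (not SAS output): same model with design value == raw value.
ref0 = {
    "Intercept": ref1["Intercept"] + ref1["var3"] + ref1["var4"],
    "var3": -ref1["var3"],
    "var4": -ref1["var4"],
}


def eta_raw(b, v3, v4):
    # var3/var4 multiplied by their raw 0/1 values, "just like" var1 and var2
    return b["Intercept"] + common + b["var3"] * v3 + b["var4"] * v4


def eta_ref1(v3, v4):
    # ref='1' design: value 0 -> 1, value 1 -> 0
    return (ref1["Intercept"] + common
            + ref1["var3"] * (1 - v3) + ref1["var4"] * (1 - v4))


pairs = []
for v3, v4 in product((0, 1), repeat=2):
    p_raw = logistic(eta_raw(ref0, v3, v4))
    p_ref1 = logistic(eta_ref1(v3, v4))
    pairs.append((p_raw, p_ref1))
    print(f"var3={v3} var4={v4}: raw 0/1={p_raw:.7f} ref='1'={p_ref1:.7f}")
```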
PROC LOGISTIC now has the CODE statement so you can have the scoring code created for you.
Comparing your code to that code is one good way to check this process.
And here's a post that illustrates an end-to-end check of the calculations:
The raw data to run the code (the Neuralgia data set) is here:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm
And the post showing the example:
I have just tried the code in the post you shared, and I got the same value as my manual calculation (0.000129594); neither is close to IP_1. The values are below. The code inputs 0 for var3 and var4, just as I do in my manual calculation, but the predicted probability gets close to IP_1 only if we input 1, as @FreelanceReinhard did. I am not clear on why we would multiply by 1 instead of 0 when the values of the variables are 0. Any thoughts?

| IP_1 | myformula |
| --- | --- |
| 0.000405819 | 0.000129594 |
Post the Effect Coding matrix please.
Here are the class levels. Does the last column mean that if the value is 0 in the data, I should input it as 1? If so, how can I flip them?

Class Level Information

| Class | Value | Design Variables |
| --- | --- | --- |
| var3 | 0 | 1 |
| | 1 | 0 |
| var4 | 0 | 1 |
| | 1 | 0 |
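One way to avoid the mix-up is to read the design values straight from the Class Level Information table instead of using the raw data values. Below is a minimal Python sketch. It assumes the table above (param=ref, ref='1') and the rounded reference-coding estimates, and the dictionary names are hypothetical. For var3 = var4 = 0 the design values are 1, so the coefficients are added, which puts the result near IP_1 rather than near 0.000129594. With no REF= option, PROC LOGISTIC uses REF=LAST by default, which for a 0/1 variable is also level 1, so the same table would appear.

```python
import math

# Class Level Information (param=ref, ref='1'): raw value -> design variable
CLASS_LEVEL_INFO = {
    "var3": {0: 1, 1: 0},
    "var4": {0: 1, 1: 0},
}

# Rounded reference-coding estimates; class effects compare level 0 with reference level 1
ESTIMATES = {"Intercept": -7.0333, "var1": -0.0111, "var2": -0.00089,
             "var3": 1.2178, "var4": -0.076}


def predict(row: dict) -> float:
    eta = ESTIMATES["Intercept"]
    for name, value in row.items():
        if name in CLASS_LEVEL_INFO:
            value = CLASS_LEVEL_INFO[name][value]  # swap raw value for design value
        eta += ESTIMATES[name] * value
    return 1.0 / (1.0 + math.exp(-eta))


p = predict({"var1": 21, "var2": 1897, "var3": 0, "var4": 0})
print(f"{p:.7f}")
```

The small remaining gap to IP_1 comes from the rounded estimates, as noted earlier in the thread.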
Actually, I wasn't using the reference specification in my original code, and 0 and 1 were still flipped, but specifying (ref='0') gives the same probabilities as IP_1. Thank you for your help!