zeynep
Fluorite | Level 6

Hello,

I am using PROC LOGISTIC (binary logit model). I would like to see whether I can reproduce the predicted probabilities (IP_1) that PROC LOGISTIC outputs by doing the calculation manually from the regression equation.

I tried Probability = 1 / [1 + exp(-(b0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the one in the IP_1 column.

I tried using reference coding instead of effect coding to get the estimates and then calculated the probability, but again it did not match the IP_1 value.

How can I get to IP_1 = 0.000406?

Here is an example with the values of the variables, and coefficients.

 

var1  var2  var3  var4  IP_1      manual calculation
21    1897  0     0     0.000406  0.000228

 

Estimates with effect coding

Parameter        DF  Estimate
Intercept        1   -6.4624
var3        0    1   0.6089
var4        0    1   -0.038
var2             1   -0.00089
var1             1   -0.0111

 

Estimates with reference coding

Parameter        DF  Estimate
Intercept        1   -7.0333
var3        0    1   1.2178
var4        0    1   -0.076
var2             1   -0.00089
var1             1   -0.0111

 

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;

 

Thank you.

 

Zeynep

FreelanceReinh
Jade | Level 19

Hello @zeynep,


@zeynep wrote:

I tried Probability = 1 / [1 + exp(-(b0 + b1X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability was not the same as the one in the IP_1 column.

I tried using reference coding instead of effect coding to get the estimates and then calculated the probability, but again it did not match the IP_1 value.


Your formula is correct. There is the LOGISTIC function to simplify the code a bit. But either way I get a result that is close to IP_1 (the remaining rounding error could almost certainly be avoided by using more precise parameter values from an output dataset), regardless of the parameterization of VAR3 and VAR4:

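data _null_;
  /* reference coding: the design variables for var3=0 and var4=0 are 1 */
  p=logistic(-7.0333-0.0111*21-0.00089*1897+1.2178-0.076);
  put p;
  /* effect coding: the design variables for var3=0 and var4=0 are also 1 */
  p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038);
  put p;
run;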

0.0004043077
0.0004043077
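
To get estimates with more digits than the printed table shows, one option is to capture the parameter estimates table in a dataset. The following is a minimal sketch, not the poster's exact call: it assumes the ODS table name ParameterEstimates, reuses the dataset and variable names from the question, and uses an illustrative dataset name (pe) and a trimmed-down option list.

```sas
/* Capture the parameter estimates table in a dataset named pe (illustrative name) */
ods output ParameterEstimates=pe;

proc logistic data=mydata desc;
  class var3 (ref='1') var4 (ref='1') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
run;

/* Print the estimates with as many significant digits as fit in 16 columns */
proc print data=pe noobs;
  format Estimate best16.;
run;
```

Plugging the full-precision estimates into the LOGISTIC function should bring the manual result closer to IP_1 than the rounded values do.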

 

zeynep
Fluorite | Level 6
Thank you @FreelanceReinhard for your reply.
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that's incorrect? If the values of var3 and var4 were 1, would you still calculate it as p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?
FreelanceReinh
Jade | Level 19

@zeynep wrote:
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that's incorrect? If the values of var3 and var4 were 1, would you still calculate it as p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?

Remember that also character variables on a nominal scale can be used as categorical predictors. For those it is obvious that in the regression equation their values (such as "Caucasian", "Yes", "Divorced") must be replaced by the numeric values of the corresponding design variables (whose creation is triggered by the CLASS statement). If a predictor is listed in the CLASS statement, its design variables are always used. No exceptions are made for predictors whose original values happen to be numeric.

 

So, in the case of effect coding of VAR3 and VAR4 with the ref='1' option you would look at the design matrix (cf. table "Class Level Information" in your PROC LOGISTIC output or the generic example in the documentation: "Other Parameterizations", first table). There you would find out that if the values of VAR3 and VAR4 were 1 (i.e., equal to their respective reference levels), the corresponding design variable values in the regression equation would be −1, leading to the calculation p=logistic(-6.4624-0.0111*21-0.00089*1897-0.6089+0.038).
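
The effect-coding rule described here can be sketched as a small DATA step. It uses the rounded effect-coding estimates from the question and assumes, for illustration only, that var3 and var4 take the same raw value, so one design value d serves both. The names raw and d are hypothetical helpers, not PROC LOGISTIC output.

```sas
data _null_;
  /* Effect coding with reference level 1: raw value 0 -> +1, raw value 1 -> -1 */
  do raw = 0, 1;
    d = ifn(raw = 0, 1, -1);  /* design value used for both var3 and var4 here */
    p = logistic(-6.4624 - 0.0111*21 - 0.00089*1897 + 0.6089*d - 0.038*d);
    put raw= d= p=;
  end;
run;
```

For raw=0 this is the same expression as the effect-coding line in the earlier reply (about 0.000404). For raw=1 both class terms change sign, which gives a noticeably smaller probability.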

 

That said, a special feature of binary variables with values 0 and 1 is that they may be treated as continuous variables (by removing them from the CLASS statement), in which case handling them "just like" VAR1 and VAR2 would be correct. The results will be exactly the same as if they are used in the CLASS statement with a parameterization assigning 0 to 0 and 1 to 1 in the design variables. This is not surprising because then the design variable and the original predictor are mathematically equal.
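
A minimal sketch of that alternative, assuming var3 and var4 really contain only 0 and 1: leave them out of the CLASS statement so they enter the model as ordinary numeric predictors. The output dataset name preds_cont is illustrative, and the option list is trimmed compared with the original call.

```sas
/* No CLASS statement: var3 and var4 are treated as numeric 0/1 predictors */
proc logistic data=mydata desc;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  output out=preds_cont predprobs=individual;
run;
```

With this fit, the manual calculation multiplies each estimate by the raw value of its variable, so a row with var3=0 and var4=0 contributes nothing from those two terms.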

Reeza
Super User

PROC LOGISTIC now has the CODE statement, so you can have the scoring code generated for you.

Comparing your code to the generated code is a good way to check this process.
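
A rough sketch of how the CODE statement might be used for this check. The file path is a placeholder, the dataset name scored is illustrative, and whether every option from the original call can be combined with CODE is not verified here.

```sas
proc logistic data=mydata desc;
  class var3 (ref='1') var4 (ref='1') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  /* Write DATA step scoring code to a file (placeholder path) */
  code file='/path/to/score_code.sas';
run;

/* Apply the generated scoring code to the data */
data scored;
  set mydata;
  %include '/path/to/score_code.sas';
run;
```

Opening the generated file shows how the class variables are turned into design values before the linear predictor is built, which is the step a manual calculation has to mirror.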

 

And here's a post that illustrates an end to end check for the calculations:

Raw data to run the code is here: (neuralgia data set)

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm

And post showing example:

https://communities.sas.com/t5/Statistical-Procedures/How-to-determine-logistic-regression-formula-f...

zeynep
Fluorite | Level 6
Thank you @Reeza.
I have just tried the code in the post you shared and got the same value as my manual calculation (0.000129594); neither is close to IP_1. The values are below. The code inputs 0 for var3 and var4, just as I do in my manual calculation, but the predicted probability gets close to IP_1 only if we input 1, as @FreelanceReinhard did. I am not clear on why we would multiply by 1 instead of 0 when the values of the variables are 0. Any thoughts?

IP_1 myformula
0.000405819 0.000129594
Reeza
Super User

Post the Effect Coding matrix please.

zeynep
Fluorite | Level 6

Here are the class levels. Does the last column mean that if the value is 0 in the data, I should input it as 1? If that's the case, how can I flip them?

 

Class Level Information

Class  Value  Design Variables
var3   0      1
       1      0
var4   0      1
       1      0
Reeza
Super User
Note that your value and design code are flipped... 0 maps to 1 and 1 maps to 0.
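
To make the flipped mapping concrete, here is a small sketch that applies the Class Level Information table to the example row, using the rounded reference-coding estimates from the question. The names d3 and d4 are hypothetical helpers standing in for the design variables.

```sas
data _null_;
  var1 = 21; var2 = 1897; var3 = 0; var4 = 0;
  /* Reference coding with reference level 1: raw 0 -> design 1, raw 1 -> design 0 */
  d3 = (var3 = 0);
  d4 = (var4 = 0);
  p = logistic(-7.0333 - 0.0111*var1 - 0.00089*var2 + 1.2178*d3 - 0.076*d4);
  put d3= d4= p=;
run;
```

With d3 and d4 both equal to 1, this is the same expression as the reference-coding line in the earlier reply, which gave about 0.000404.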
Reeza
Super User

You've specified the reference level is 1, which means that is coded as 0.

It sounds like you want the opposite; you can just remove the reference specification.

 

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;
zeynep
Fluorite | Level 6

Actually, I wasn't using the reference specification in my original code, and 0 and 1 were still flipped. But if I use (ref='0'), the manual calculation gives the same probabilities as IP_1. Thank you for your help!
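
For readers landing here later, the fix described above can be sketched as follows. The option list is trimmed for brevity; the dataset and variable names are those from the question.

```sas
/* Make 0 the reference level so the design variable equals the raw 0/1 value */
proc logistic data=mydata desc;
  class var3 (ref='0') var4 (ref='0') / param=ref;
  model target_var = var1 var2 var3 var4 / firth maxiter=100;
  output out=preds predprobs=individual;
run;
```

When no REF= is given, the CLASS statement uses the last ordered level as the reference (here 1), which is why dropping the specification still left 0 and 1 flipped. With ref='0' and reference coding, the estimates can be multiplied directly by the raw values of var3 and var4.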
