SAS Programming

zeynep · Posted 06-24-2022 02:25 PM

Hello,

I am using PROC LOGISTIC (binary logit model). I would like to reproduce the predicted probabilities in the IP_1 column by doing the calculation manually from the regression equation.

I tried Probability = 1 / [1 + exp(-(b0 + b1*X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability did not match the IP_1 column.

I also tried using reference coding instead of effect coding to get the estimates and then calculated the probability, but again the result did not match IP_1.

How can I get to IP_1 = 0.000406?

Here is an example with the variable values and the coefficients.

var1	var2	var3	var4	IP_1	manual calculation
21	1897	0	0	0.000406	0.000228

Estimates with effect coding

Parameter		DF	Estimate
Intercept		1	-6.4624
var3	0	1	0.6089
var4	0	1	-0.038
var2		1	-0.00089
var1		1	-0.0111

Estimates with reference coding

Parameter		DF	Estimate
Intercept		1	-7.0333
var3	0	1	1.2178
var4	0	1	-0.076
var2		1	-0.00089
var1		1	-0.0111

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;

Thank you.

Zeynep

FreelanceReinh · Posted 06-24-2022 02:48 PM

Hello @zeynep,

@zeynep wrote:

I tried Probability = 1 / [1 + exp(-(b0 + b1*X))], plugging in the values from the “Estimate” column and the values of my variables, but the resulting probability did not match the IP_1 column.

I also tried using reference coding instead of effect coding to get the estimates and then calculated the probability, but again the result did not match IP_1.

Your formula is correct. There is the LOGISTIC function to simplify the code a bit. But either way I get a result that is close to IP_1 (the remaining rounding error could almost certainly be avoided by using more precise parameter values from an output dataset), regardless of the parameterization of VAR3 and VAR4:

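data _null_;
/* reference coding: the design values for var3=0 and var4=0 are both 1 */
p=logistic(-7.0333-0.0111*21-0.00089*1897+1.2178-0.076);
put p;
/* effect coding: the design values for var3=0 and var4=0 are both 1 */
p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038);
put p;
run;

0.0004043077
0.0004043077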

zeynep · Posted 06-24-2022 03:00 PM

Thank you @FreelanceReinhard for your reply.
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?

FreelanceReinh · Posted 06-25-2022 04:52 AM

@zeynep wrote:
var3 and var4 are binary variables, and in this example their values are 0. That's why I multiplied their coefficients by 0 in the equation, just like multiplying the var1 and var2 coefficients by their values. Can you explain why that is incorrect? If the values of var3 and var4 were 1, would you still calculate p=logistic(-6.4624-0.0111*21-0.00089*1897+0.6089-0.038)?

Remember that character variables on a nominal scale can also be used as categorical predictors. For those it is obvious that in the regression equation their values (such as "Caucasian", "Yes", "Divorced") must be replaced by the numeric values of the corresponding design variables (whose creation is triggered by the CLASS statement). If a predictor is listed in the CLASS statement, its design variables are always used. No exceptions are made for predictors whose original values happen to be numeric.

So, in the case of effect coding of VAR3 and VAR4 with the ref='1' option you would look at the design matrix (cf. table "Class Level Information" in your PROC LOGISTIC output or the generic example in the documentation: "Other Parameterizations", first table). There you would find out that if the values of VAR3 and VAR4 were 1 (i.e., equal to their respective reference levels), the corresponding design variable values in the regression equation would be −1, leading to the calculation p=logistic(-6.4624-0.0111*21-0.00089*1897-0.6089+0.038).
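As a rough illustration of the effect-coding case described above, the DATA step below evaluates the regression equation for both levels of VAR3 and VAR4, using the rounded estimates posted earlier in this thread. The names level, d3, d4, and eta are made up for this sketch, and the +1/−1 design values assume effect coding with ref='1' for both variables.

```sas
/* Sketch only: rounded effect-coding estimates copied from the thread. */
data _null_;
  var1 = 21;
  var2 = 1897;
  do level = 0 to 1;
    /* Effect coding with ref='1': value 0 -> +1, value 1 (reference) -> -1 */
    d3 = ifn(level = 0, 1, -1);
    d4 = ifn(level = 0, 1, -1);
    /* Linear predictor built from design values, not raw values */
    eta = -6.4624 - 0.0111*var1 - 0.00089*var2 + 0.6089*d3 - 0.038*d4;
    p = logistic(eta);
    put level= d3= d4= eta= p=;
  end;
run;
```

The level=0 row should reproduce the 0.0004043077 shown earlier; the level=1 row corresponds to the calculation with -0.6089 and +0.038.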

That said, a special feature of binary variables with values 0 and 1 is that they may be treated as continuous variables (by removing them from the CLASS statement), in which case handling them "just like" VAR1 and VAR2 would be correct. The results will be exactly the same as if they are used in the CLASS statement with a parameterization assigning 0 to 0 and 1 to 1 in the design variables. This is not surprising because then the design variable and the original predictor are mathematically equal.
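The equivalence described above can be checked on simulated data. The sketch below does not use the original poster's data: the dataset sim and the variables x, b, and y are invented for illustration. It fits the same model twice, once with the 0/1 predictor b as a continuous variable and once with b in the CLASS statement using reference coding and ref='0', so that the two parameter tables can be compared.

```sas
/* Simulated data: all names here are hypothetical. */
data sim;
  call streaminit(2022);
  do i = 1 to 1000;
    x = rand('normal');
    b = rand('bernoulli', 0.4);   /* binary 0/1 predictor */
    y = rand('bernoulli', logistic(-1 + 0.5*x + 0.8*b));
    output;
  end;
run;

/* Fit 1: b treated as a continuous predictor (no CLASS statement) */
proc logistic data=sim;
  model y(event='1') = x b;
  ods output ParameterEstimates=pe_cont;
run;

/* Fit 2: b as a CLASS variable; reference coding with ref='0'
   assigns design value 0 to b=0 and 1 to b=1 */
proc logistic data=sim;
  class b (ref='0') / param=ref;
  model y(event='1') = x b;
  ods output ParameterEstimates=pe_class;
run;

proc print data=pe_cont noobs; run;
proc print data=pe_class noobs; run;
```

If the reasoning above holds, the Estimate columns of the two printed tables should agree for the intercept, x, and b.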

Reeza · Posted 06-24-2022 03:13 PM

PROC LOGISTIC now has the CODE statement so you can have the scoring code created for you.

Comparing your code to the generated scoring code is one good way to check this process.

And here's a post that illustrates an end-to-end check of the calculations:

Raw data to run the code is here (neuralgia data set):

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm

And the post showing the example:

https://communities.sas.com/t5/Statistical-Procedures/How-to-determine-logistic-regression-formula-f...
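A minimal sketch of the CODE statement approach mentioned above, on simulated data rather than the original poster's data. The dataset sim, the variables x, b, y, and phat, and the macro variable scorefile are all invented for this example, and the scoring file is written to the WORK directory only as a convenient writable location.

```sas
/* Simulated data: all names here are hypothetical. */
data sim;
  call streaminit(2022);
  do i = 1 to 1000;
    x = rand('normal');
    b = rand('bernoulli', 0.4);
    y = rand('bernoulli', logistic(-1 + 0.5*x + 0.8*b));
    output;
  end;
run;

/* Location for the generated scoring code (inside the WORK directory) */
%let scorefile = %sysfunc(pathname(work))/score.sas;

proc logistic data=sim;
  class b (ref='1') / param=ref;
  model y(event='1') = x b;
  code file="&scorefile";          /* write DATA step scoring code */
  output out=preds p=phat;         /* predictions from the procedure */
run;

/* Apply the generated scoring code to the same rows */
data scored;
  set preds;
  %include "&scorefile";
run;

proc print data=scored(obs=5) noobs; run;
```

In the printed rows, phat can be compared with the predicted-probability variables that the generated code adds (these typically carry a P_ prefix). Opening score.sas also shows how the generated code handles the CLASS variable b, which is the part to compare against a manual calculation.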

zeynep · Posted 06-24-2022 03:51 PM

Thank you @Reeza.
I have just tried the code in the post you shared and got the same value as my manual calculation (0.000129594); neither is close to IP_1. The values are below. The code inputs 0 for var3 and var4, just as I do in my manual calculation, but the predicted probability gets close to IP_1 only if we input 1, as @FreelanceReinhard did. I am not clear on why we would multiply by 1 instead of 0 when the values of the variables are 0. Any thoughts?

IP_1 myformula
0.000405819 0.000129594

Reeza · Posted 06-24-2022 03:53 PM

Post the Effect Coding matrix please.

zeynep · Posted 06-24-2022 03:58 PM

Here are the class levels. Does the last column mean that if the value is 0 in the data, it is input as 1? If that's the case, how can I flip them?

Class Level Information
Class	Value	Design Variables
var3	0	1
	1	0
var4	0	1
	1	0

Reeza · Posted 06-24-2022 04:02 PM

Note that your value and design code are flipped... 0 maps to 1 and 1 maps to 0.
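To make the flip concrete, here is a small sketch using the rounded reference-coding estimates posted earlier in the thread. It contrasts plugging in the raw values of var3 and var4 with plugging in the design values from the Class Level Information table. The names d3, d4, eta_raw, eta_design, p_raw, and p_design are made up for this illustration.

```sas
/* Sketch only: rounded reference-coding estimates copied from the thread. */
data _null_;
  var1 = 21; var2 = 1897; var3 = 0; var4 = 0;

  /* Raw values plugged in directly (not what PROC LOGISTIC does here) */
  eta_raw = -7.0333 - 0.0111*var1 - 0.00089*var2 + 1.2178*var3 - 0.076*var4;

  /* Design values from the Class Level Information table: 0 -> 1, 1 -> 0 */
  d3 = (var3 = 0);
  d4 = (var4 = 0);
  eta_design = -7.0333 - 0.0111*var1 - 0.00089*var2 + 1.2178*d3 - 0.076*d4;

  p_raw = logistic(eta_raw);
  p_design = logistic(eta_design);
  put p_raw= p_design=;
run;
```

With these rounded estimates, p_design should match the 0.0004043077 computed earlier in the thread, while p_raw should land near 0.00013, in line with the 0.000129594 from the manual calculation.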

Reeza · Posted 06-24-2022 04:07 PM

You've specified 1 as the reference level, which means that 1 is coded as 0.

It sounds like you want the opposite; you can just remove the reference specification.

proc logistic data=mydata desc outmodel= model plots(only)=roc PLOTS(MAXPOINTS=NONE) plots=effect;
class var3 (ref='1') var4 (ref='1') / param = ref ;
model target_var = var1 var2 var3 var4 / lackfit firth ctable pprob=(0.0010, 0.0015, 0.002, 0.0025, 0.003, 0.004) maxiter=100;
output out=preds predprobs=individual;
run;

zeynep · Posted 06-24-2022 04:21 PM

Actually, I wasn't using the reference specification in my original code and 0 and 1 were still flipped, but if I use (ref='0'), the manual calculation gives the same probabilities as IP_1. Thank you for your help!
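That the flip occurs even without a ref= option is consistent with the CLASS statement's default of taking the last ordered level (here 1) as the reference. The sketch below illustrates why ref='0' makes the raw-value calculation work. The ref='0' coefficients in it are not fitted output; they are derived from the posted ref='1' estimates by shifting the intercept and flipping the signs, on the assumption that both parameterizations describe the same fitted model. The names b0, b3, b4 and their _new counterparts are invented for this example.

```sas
/* Sketch only: ref='0' coefficients are derived here, not taken from output. */
data _null_;
  var1 = 21; var2 = 1897; var3 = 0; var4 = 0;

  /* Rounded estimates posted for reference coding with ref='1' */
  b0 = -7.0333; b3 = 1.2178; b4 = -0.076;

  /* Equivalent coefficients under ref='0': shift the intercept, flip signs */
  b0_new = b0 + b3 + b4;
  b3_new = -b3;
  b4_new = -b4;

  /* With ref='0' the design values equal the raw 0/1 values */
  p = logistic(b0_new - 0.0111*var1 - 0.00089*var2 + b3_new*var3 + b4_new*var4);
  put b0_new= b3_new= b4_new= p=;
run;
```

Here multiplying the coefficients by the raw values of var3 and var4 gives the same linear predictor as the design-value calculation under ref='1', so p should again come out at about 0.000404.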

Manually calculating logistic regression probabilities
