Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- How to manually compute via excel the predicted probability in binary ...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 08-12-2018 09:01 AM
(1991 views)

Hi Guys!

Hope someone could help me. Am performing binary logistic regression in SAS and am trying to figure out if it is possible to manually compute (via Excel) for the predicted probabilities (individual, cross validated, and cumulative) generated by SAS? If yes, what is the formula?

I am using the formula exp(bo+(b1*x1)+...+(bn*xn))/(1+exp(bo+(b1*x1)+...+(bn*xn))) but the values don't match with the predicted probabilities.

I've used sample from the SAS documentation and added the predicted probabilities (predprobs) in output.

Data Neuralgia;

input Treatment $ Sex $ Age Duration Pain $ @@;

datalines;

P F 68 1 No B M 74 16 No P F 67 30 No

P M 66 26 Yes B F 67 28 No B F 77 16 No

A F 71 12 No B F 72 50 No B F 76 9 Yes

A M 71 17 Yes A F 63 27 No A F 69 18 Yes

B F 66 12 No A M 62 42 No P F 64 1 Yes

A F 64 17 No P M 74 4 No A F 72 25 No

P M 70 1 Yes B M 66 19 No B M 59 29 No

A F 64 30 No A M 70 28 No A M 69 1 No

B F 78 1 No P M 83 1 Yes B F 69 42 No

B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes

A M 70 12 No A F 69 12 No B F 65 14 No

B M 70 1 No B M 67 23 No A M 76 25 Yes

P M 78 12 Yes B M 77 1 Yes B F 69 24 No

P M 66 4 Yes P F 65 29 No P M 60 26 Yes

A M 78 15 Yes B M 75 21 Yes A F 67 11 No

P F 72 27 No P F 70 13 Yes A M 75 6 Yes

B F 65 7 No P F 68 27 Yes P M 68 11 Yes

P M 67 17 Yes B M 70 22 No A M 65 15 No

P F 67 1 Yes A M 67 10 No P F 72 11 Yes

A F 74 1 No B M 80 21 Yes A F 69 3 No

;

run;

proc logistic data=Neuralgia;

class Treatment Sex;

model Pain= Treatment Sex Treatment*Sex Age Duration / selection=backward expb;

run;

proc logistic data=Neuralgia;

class Treatment Sex;

model Pain= Treatment Sex Age/expb ctable;

output predprobs=(INDIVIDUAL CUMULATIVE CROSSVALIDATE);

run;

proc print data=data1;

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @Cindy789 and welcome to the SAS Support Communities!

I haven't opened your Excel table (I don't have Excel installed on my SAS workstation), only the preview. But it seems that you calculated the predicted probabilities as if *reference cell coding* had been used as the parameterization method for the classification variables. However, the default is *effect coding* (see documentation of the CLASS statement, option PARAM=). I think the easiest correction will be to switch to reference cell coding:

class Treatment Sex / param=ref;

This means in your case, more explicitly:

`class Treatment(ref='P') Sex(ref='M') / param=ref;`

If you rerun the PROC LOGISTIC step with this modification (either of the above two versions), you will notice the changes in sections "Class Level Information" and "Analysis of Maximum Likelihood Estimates" of the output. Then you can leave the Excel formulas unchanged and just use the new estimates for Intercept etc. (no change for Age, of course). Otherwise you'd have to take into account the design variable values of −1 for Treatment='P' and Sex='M' (see table "Class Level Information") in your Excel formulas.

The remaining discrepancies (such as 0.0001, 0.0006 etc. for obs. 6, 7, ...) will be due to rounding errors. (Not surprisingly, the only substantial differences had occurred where Treatment='P' or Sex='M'.) You can reduce the rounding errors by using values with more decimals for the estimates. Just request an ODS output dataset:

`ods output ParameterEstimates=est;`

(before or in the PROC LOGISTIC step) and then print it with a suitable format in order to obtain those more precise values:

```
proc print data=est;
format estimate best18.;
run;
```

As to the "cumulative" probabilities: These are redundant for *binary* logistic regression (given the individual probabilities): As you've surely noticed, CP_No=IP_No and CP_Yes=1 in your dataset DATA1.

I don't think there is an easy way to calculate the cross-validated predicted probabilities in Excel.

4 REPLIES 4

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @Cindy789 and welcome to the SAS Support Communities!

I haven't opened your Excel table (I don't have Excel installed on my SAS workstation), only the preview. But it seems that you calculated the predicted probabilities as if *reference cell coding* had been used as the parameterization method for the classification variables. However, the default is *effect coding* (see documentation of the CLASS statement, option PARAM=). I think the easiest correction will be to switch to reference cell coding:

class Treatment Sex / param=ref;

This means in your case, more explicitly:

`class Treatment(ref='P') Sex(ref='M') / param=ref;`

If you rerun the PROC LOGISTIC step with this modification (either of the above two versions), you will notice the changes in sections "Class Level Information" and "Analysis of Maximum Likelihood Estimates" of the output. Then you can leave the Excel formulas unchanged and just use the new estimates for Intercept etc. (no change for Age, of course). Otherwise you'd have to take into account the design variable values of −1 for Treatment='P' and Sex='M' (see table "Class Level Information") in your Excel formulas.

The remaining discrepancies (such as 0.0001, 0.0006 etc. for obs. 6, 7, ...) will be due to rounding errors. (Not surprisingly, the only substantial differences had occurred where Treatment='P' or Sex='M'.) You can reduce the rounding errors by using values with more decimals for the estimates. Just request an ODS output dataset:

`ods output ParameterEstimates=est;`

(before or in the PROC LOGISTIC step) and then print it with a suitable format in order to obtain those more precise values:

```
proc print data=est;
format estimate best18.;
run;
```

As to the "cumulative" probabilities: These are redundant for *binary* logistic regression (given the individual probabilities): As you've surely noticed, CP_No=IP_No and CP_Yes=1 in your dataset DATA1.

I don't think there is an easy way to calculate the cross-validated predicted probabilities in Excel.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You forgot the PARAM=REF portion on the CLASS statement. Check the design matrix in the output, it's using a GLM method instead of referential coding. Once thats fixed it matches pretty closely (0.0000X) usually.

proc logistic data=Neuralgia; class Treatment Sex /PARAM=REF; model Pain= Treatment Sex Age/expb ctable; output predprobs=(INDIVIDUAL CUMULATIVE CROSSVALIDATE); run;

But, a blast from the past (5 years ago) I did work this out with someone else, step by step so hopefully this thread can help you out if you have further issues.

PS. Thanks for posting a fully worked example, makes it much easier to verify/answer your question.

EDIT: Didn't see @FreelanceReinh solution for some reason before I posted this, his response is more detailed than mine and should be marked correct.

@Cindy789 wrote:

Hi Guys!

Hope someone could help me. Am performing binary logistic regression in SAS and am trying to figure out if it is possible to manually compute (via Excel) for the predicted probabilities (individual, cross validated, and cumulative) generated by SAS? If yes, what is the formula?

I am using the formula exp(bo+(b1*x1)+...+(bn*xn))/(1+exp(bo+(b1*x1)+...+(bn*xn))) but the values don't match with the predicted probabilities.

I've used sample from the SAS documentation and added the predicted probabilities (predprobs) in output.

Data Neuralgia;

@input Treatment $ Sex $ Age Duration Pain $ @@;

datalines;

P F 68 1 No B M 74 16 No P F 67 30 No

P M 66 26 Yes B F 67 28 No B F 77 16 No

A F 71 12 No B F 72 50 No B F 76 9 Yes

A M 71 17 Yes A F 63 27 No A F 69 18 Yes

B F 66 12 No A M 62 42 No P F 64 1 Yes

A F 64 17 No P M 74 4 No A F 72 25 No

P M 70 1 Yes B M 66 19 No B M 59 29 No

A F 64 30 No A M 70 28 No A M 69 1 No

B F 78 1 No P M 83 1 Yes B F 69 42 No

B M 75 30 Yes P M 77 29 Yes P F 79 20 Yes

A M 70 12 No A F 69 12 No B F 65 14 No

B M 70 1 No B M 67 23 No A M 76 25 Yes

P M 78 12 Yes B M 77 1 Yes B F 69 24 No

P M 66 4 Yes P F 65 29 No P M 60 26 Yes

A M 78 15 Yes B M 75 21 Yes A F 67 11 No

P F 72 27 No P F 70 13 Yes A M 75 6 Yes

B F 65 7 No P F 68 27 Yes P M 68 11 Yes

P M 67 17 Yes B M 70 22 No A M 65 15 No

P F 67 1 Yes A M 67 10 No P F 72 11 Yes

A F 74 1 No B M 80 21 Yes A F 69 3 No

;

run;proc logistic data=Neuralgia;

class Treatment Sex;

model Pain= Treatment Sex Treatment*Sex Age Duration / selection=backward expb;

run;proc logistic data=Neuralgia;

class Treatment Sex;

model Pain= Treatment Sex Age/expb ctable;

output predprobs=(INDIVIDUAL CUMULATIVE CROSSVALIDATE);

run;proc print data=data1;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.