Hi,
Can anyone tell me how SAS calculates the probability with the SCORE statement in PROC LOGISTIC? Why is my manual probability calculation so different from SAS's? Does SAS use all the variables in the calculation, or only the significant ones?
My formula is P = exp(F)/(1+exp(F)) = 1/(1+exp(-F)), where F is the linear predictor. Here are my parameter estimates:
Parameter | Level | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
Intercept | | 1 | -0.9724 | 0.2331 | 17.4036 | <.0001
classA | 1 | 1 | 0.0150 | 0.0827 | 0.0328 | 0.8562
classA | 2 | 1 | -0.0694 | 0.0380 | 3.3468 | 0.0673
classA | 3 | 1 | -0.1681 | 0.0467 | 12.9276 | 0.0003
classA | 4 | 1 | 0.1498 | 0.0706 | 4.5050 | 0.0338
classB | 1 | 1 | -0.1760 | 0.0547 | 10.3536 | 0.0013
classC | 1 | 1 | 1.4962 | 0.0280 | 2863.0571 | <.0001
classD | 1 | 1 | -0.5437 | 0.0445 | 149.1918 | <.0001
classE | 1 | 1 | 0.1358 | 0.1794 | 0.5727 | 0.4492
classE | 2 | 1 | 0.1191 | 0.3248 | 0.1345 | 0.7138
classE | 3 | 1 | -0.3680 | 0.4482 | 0.6740 | 0.4117
Numeric1 | | 1 | -0.2314 | 0.0197 | 138.3852 | <.0001
Numeric2 | | 1 | 0.000259 | 0.000040 | 42.9420 | <.0001
Numeric3 | | 1 | 2.1190 | 0.1641 | 166.7216 | <.0001
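One possible cause, assuming the model has a CLASS statement that uses PROC LOGISTIC's default PARAM=EFFECT coding: the omitted (last) level of each CLASS variable is coded -1 in every design column for that variable, not 0. Levels shown in the table contribute their own estimate times 1 under both effect and 0/1 coding, but if a manual calculation uses 0/1 dummies, every observation in an omitted level gets the wrong F. Using the rounded estimates above, the omitted-level contributions to F work out as follows (x_k is either a numeric predictor or a design column):

$$
F = \beta_0 + \sum_{k} \beta_k x_k, \qquad P = \frac{1}{1 + e^{-F}}
$$

$$
\begin{aligned}
\text{classA, omitted level:} &\quad -(0.0150 - 0.0694 - 0.1681 + 0.1498) = 0.0727\\
\text{classB, omitted level:} &\quad -(-0.1760) = 0.1760\\
\text{classC, omitted level:} &\quad -(1.4962) = -1.4962\\
\text{classD, omitted level:} &\quad -(-0.5437) = 0.5437\\
\text{classE, omitted level:} &\quad -(0.1358 + 0.1191 - 0.3680) = 0.1131
\end{aligned}
$$

With 0/1 dummies, each of these terms would be 0 instead. If the CLASS statement already specifies PARAM=REF, then 0/1 dummies are correct and this is not the source of the difference.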
Thank you!!!
Fan
Without seeing your code or data, it is difficult to guess your problem.
Do you have missing values? If so, you need to handle them correctly. Are you using the full precision of the parameter estimates, or just the 4 decimal places shown in the table?
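A minimal sketch for checking both points on the sashelp.class model used in the example that follows: ODS OUTPUT saves the ParameterEstimates table to a data set, so the estimates can be printed with more digits than the default display, and PROC MEANS counts missing values in the predictors. The data set name PE is arbitrary.

```sas
/* Save the parameter estimates to a data set instead of copying
   the 4-decimal values from the printed table */
proc logistic data=sashelp.class;
   model sex = height weight age;
   ods output ParameterEstimates=PE;   /* ODS table of estimates */
run;

/* Print the estimates with all significant digits */
proc print data=PE noobs;
   format Estimate best16.;
run;

/* Count missing predictor values: PROC LOGISTIC excludes such
   observations from the fit, and a manual F would be missing */
proc means data=sashelp.class nmiss n;
   var height weight age;
run;
```

Paste the full-precision Estimate values into the manual formula rather than the rounded ones from the displayed table.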
The following is a brief example that uses the DATA step to score. In practice, you would use the CODE statement, the SCORE statement, or PROC PLM, as explained in the article that Reeza links to.
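```sas
/* Fit the model and save the predicted probability of the event (Sex='F') */
proc logistic data=sashelp.class;
   model sex = height weight age;
   output out=LogiPred(keep=P) pred=P;
run;

/* Score manually with the full-precision estimates */
data DataPred;
   set sashelp.class;
   F = -1.19262669802445 +
       -0.15172772458382 * Height +
       -0.13262689324423 * Weight +
        1.76465589368491 * Age;      /* linear predictor */
   P = 1 / (1 + exp(-F));            /* inverse logit */
   keep P;
run;

/* Check that the two sets of probabilities agree to within 1e-12 */
proc compare base=LogiPred compare=DataPred method=absolute criterion=1e-12;
run;
```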
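The SCORE statement and PROC PLM mentioned above can replace the hand-written DATA step. Here is a sketch on the same sashelp.class model; the names ScorePred, PLMPred, and LogiModel are arbitrary. Because the modeled event is Sex='F', P_F in ScorePred and Pred in PLMPred should both match P from the OUTPUT statement.

```sas
proc logistic data=sashelp.class;
   model sex = height weight age;
   score data=sashelp.class out=ScorePred;   /* adds P_F and P_M */
   store out=LogiModel;                      /* item store for PROC PLM */
run;

/* Score later from the stored model; ILINK applies the inverse
   logit so that Pred is a probability rather than F */
proc plm restore=LogiModel;
   score data=sashelp.class out=PLMPred predicted=Pred / ilink;
run;
```

The item store approach is convenient when the scoring data arrive after the model is fit, because the model does not need to be refit.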
It uses the variables listed in the ParameterEstimates table.
If you're using a selection method, the last ParameterEstimates table (from the final step) contains the selected variables.
If you want code that replicates the scoring code, look at the CODE statement in PROC LOGISTIC.
See the various methods for scoring data here:
http://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
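A sketch of the CODE statement approach, again on sashelp.class. The file name LogiScore.sas is hypothetical; point FILE= at any writable location. The generated statements should compute F with full-precision estimates and the model's own CLASS coding, which removes the usual sources of manual-calculation error.

```sas
/* Write DATA step scoring statements to a file */
proc logistic data=sashelp.class;
   model sex = height weight age;
   code file='LogiScore.sas';
run;

/* Run the generated statements on the data to be scored;
   they create predicted-probability variables whose names begin with P_ */
data CodePred;
   set sashelp.class;
   %include 'LogiScore.sas';
run;
```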
Here's a fully worked example to demonstrate the issue.
I appreciate your answer. Please correct me if I am wrong: SAS's calculation uses all the variables in the parameter estimates report, that is, all the predictor variables. Even so, as in your answers, there's still a difference. I would really like to know what causes it.
Thanks!
99% of the time the difference will be due to an error in the manual calculation.