I use the following code to get the regression coefficients.
Step 1
proc logistic data=d1 outmodel=outp_d1;
where first=1;
class examlevel sex/param=ref;
model pass (event="1")=capte sex time passrate/expb;
run;
Then I applied the regression coefficients from step 1 to the equation to calculate the probability for each examinee, as in the code below.
Step 2
data pre12;
set pre11;
if first=1 then logit=-3.9438+0.3341*examlevel-0.1650*sex-0.00713*time+7.3763*passrate;
run;
data pre12;
set pre12;
odds=exp(logit);
run;
data pre12;
set pre12;
prob_pred=odds/(1+odds);
run;
The other way to get the probability is to use the results from step 1, as in the code below:
proc logistic inmodel=outp_d1;
score clm data=pre11 out=prob_first;
where first=1 ;
run;
At the examinee level, plugging the coefficients into the equation gives a slightly higher probability than using the proc logistic inmodel= statement. When I aggregate the probabilities at the group level over thousands of examinees, the difference between the two methods is close to 5%, which is too big to be ignored. Any help would be much appreciated. Thanks!
Thank you very much for the help.
I tried prob_pred=logistic(logit) as you suggested to calculate the probability, and I got the same probability for each examinee as with the calculation I used (step 2 in my original post).
My step 1 is for the purpose of getting the regression coefficients. I then want to apply those coefficients to the dataset "pre11". I manually plugged in the coefficients and generated the dataset "pre12" (as in step 2). Another method of applying the coefficients is the "proc logistic inmodel=" procedure (the bottom part of my original post). For each examinee, I got a higher probability by manually plugging in the coefficients than by using the proc logistic inmodel= procedure. The difference in probability at the group level is close to 5%. I have to find out where the difference between the two methods comes from. Do you have any other suggestions for me to try? Thanks again!
The values you see in SAS tables are rounded. So in Step 2, the value you typed in as -0.1650 might actually be -0.16495 or -0.165049. That is the most likely cause of your 5% difference.
You can use the statement
ODS OUTPUT ParameterEstimates=OutP;
to write the parameter estimates to a SAS data set. You can then use
proc print data=outp;
format Estimate bestd16.;
run;
to see more precision for the estimates.
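To gauge how much rounding of a single coefficient can move a predicted probability, one option is to evaluate the Step 2 equation with the displayed value and with the two hypothetical unrounded values mentioned above. This is only a sketch: the examinee values (examlevel=1, sex=1, time=100, passrate=0.5) and the data set name work.rounding are made up for illustration, and the other coefficients are copied from the original post.

```sas
/* Sketch: compare predicted probabilities for a rounded and two
   hypothetical unrounded values of the sex coefficient.
   The examinee values below are invented for illustration only. */
data work.rounding;
   examlevel = 1;
   sex       = 1;
   time      = 100;
   passrate  = 0.5;
   do b_sex = -0.1650, -0.16495, -0.165049;
      logit = -3.9438 + 0.3341*examlevel + b_sex*sex
              - 0.00713*time + 7.3763*passrate;
      prob  = logistic(logit);   /* same as exp(logit)/(1+exp(logit)) */
      output;
   end;
run;

proc print data=work.rounding;
   format prob 12.10;            /* show enough decimals to see the change */
run;
```

Comparing the three printed probabilities shows how large an effect this kind of rounding has for that one examinee.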
Thanks much for your suggestion!
I tried the statements you suggested, but I can't see more decimal places in the output after running proc print. My "outp" file looks like the one below, so I plugged the highlighted numbers into the equation. For a few randomly selected examinees, I noticed that the probability calculated by plugging in the coefficients is higher than the one from "proc logistic inmodel=".
Any other suggestions for me to try?
The picture you posted does not show the contents of the OUTP data set. It looks like the outp_d1 data set.
Copy/paste the values of the ParameterEstimates table as text (not as an image). It will look something like this:
Obs | Variable | ClassVal0 | DF | Estimate | StdErr | WaldChiSq | ProbChiSq | ExpEst | _ESTTYPE_ |
---|---|---|---|---|---|---|---|---|---|
1 | Intercept | | 1 | 0.847258 | 0.6901 | 1.5075 | 0.2195 | 2.333 | MLE |
2 | Sex | F | 1 | -0.624121 | 0.9624 | 0.4206 | 0.5167 | 0.536 | MLE |
Thank you for replying to my question!
In my original post, as in the picture below (sorry for having to block the variable names due to confidentiality), I copied and pasted the "Estimates" into the equation to calculate the probability for each examinee, but got a higher probability for each examinee than with the "proc logistic inmodel=" procedure. I hope this makes my question clearer.
I'll try one more time. Please do the following:
1. Run the following code:
proc logistic data=d1 outmodel=outp_d1;
where first=1;
class examlevel sex/param=ref;
model pass (event="1")=capte sex time passrate/expb;
ods output ParameterEstimates=OutP;
run;
proc print data=outp;
format Estimate bestd16.;
run;
2. Select the results of PROC PRINT. Copy them into the buffer. Paste them into your response. We are not interested in a screenshot image.
I tried as you suggested. The probabilities from copying and pasting the coefficients are still a little higher than those from the proc logistic inmodel= procedure. Is there anything we can do about the proc logistic inmodel= procedure? Or is there anything wrong with my proc logistic inmodel= code below? Thank you!!
proc logistic inmodel=outp_d1;
score clm data=pre11 out=prob_first;
where first=1 ;
run;
Good luck solving your problem.
Here's a way to show that the estimates are the same. Any differences are typically due to numerical precision issues.
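A minimal sketch of such a check, assuming the sashelp.heart sample data set (not the poster's data) and the hypothetical data set names work.m, work.pe, work.scored and work.compare: fit a model, score it with inmodel=, recompute the probability by hand from the stored estimates, and look at the largest absolute difference between the two.

```sas
/* Fit the model and keep both the model store and the estimates. */
proc logistic data=sashelp.heart outmodel=work.m;
   class sex / param=ref;          /* default ref=last: 'Male' is the reference */
   model status(event='Dead') = sex ageatstart;
   ods output ParameterEstimates=work.pe;
run;

/* Method 1: score with the stored model. */
proc logistic inmodel=work.m;
   score data=sashelp.heart out=work.scored;
run;

/* Copy the estimates into macro variables instead of retyping
   the rounded values shown in the printed table. */
data _null_;
   set work.pe;
   select (upcase(Variable));
      when ('INTERCEPT')  call symputx('b0',   Estimate);
      when ('SEX')        call symputx('bSex', Estimate);
      when ('AGEATSTART') call symputx('bAge', Estimate);
      otherwise;
   end;
run;

/* Method 2: plug the estimates into the equation by hand. */
data work.compare;
   set work.scored;
   /* The design variable is 1 for the non-reference level 'Female'
      and 0 for the reference level 'Male'. */
   logit    = &b0 + &bSex*(Sex='Female') + &bAge*AgeAtStart;
   p_manual = logistic(logit);
   diff     = abs(p_manual - P_Dead);   /* P_Dead comes from the SCORE statement */
run;

proc means data=work.compare max;
   var diff;
run;
```

Note that the hand calculation multiplies the sex estimate by an indicator for the non-reference level, not by the raw value of the class variable. With param=ref, that is how the design column is built.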
Thanks much for providing the information!
My SAS code is below:
proc logistic data=d1 outmodel=outp_d1;
where first=1;
class sex (ref='0')/param=ref;
model pass (event="1")= sex time passrate/expb;
run;
In my code above, male is coded as 0. With this code I got the following output. What does the 1 (in bold) following sex mean? Does it mean that 0.3299 is the parameter estimate for female? My results are a little counterintuitive, so I'd like to check with you experts. Thank you for taking the time to answer my question.
Analysis of Maximum Likelihood Estimates

Parameter | | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Exp(Est) |
---|---|---|---|---|---|---|---|
Intercept | | 1 | -3.7746 | 0.1433 | 693.8620 | <.0001 | 0.023 |
sex | 1 | 1 | 0.3299 | 0.0439 | 56.5428 | <.0001 | 1.391 |
time | | 1 | -0.00713 | 0.000960 | 55.2062 | <.0001 | 0.993 |
pass | | 1 | 7.3763 | 0.2057 | 1285.7808 | <.0001 | 1597.655 |
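On the 1 next to sex: with class sex (ref='0') / param=ref, that column shows the level of the class variable the estimate applies to, so 0.3299 is the change in log odds for sex=1 relative to the reference level sex=0. Below is a minimal sketch of how the estimates would enter a manual calculation. It assumes the row labelled pass corresponds to passrate in the model statement. The data set name work.check and the two examinee records are made up for illustration.

```sas
/* Sketch: plug the printed estimates into the logit by hand.
   The two input records are hypothetical and differ only in sex. */
data work.check;
   input sex time passrate;
   /* (sex=1) is 1 for sex=1 and 0 for the reference level sex=0 */
   logit = -3.7746 + 0.3299*(sex=1) - 0.00713*time + 7.3763*passrate;
   prob  = logistic(logit);
   datalines;
0 100 0.5
1 100 0.5
;
run;

proc print data=work.check;
run;
```

The two rows differ only in sex, so the difference between their logit values is the sex estimate itself.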