palolix
Pyrite | Level 9

Dear SAS community,

 

Since the lsmeans/ilink option is not supported in proc logistic when the predictor var is continuous, I tried the following estimates:

 

My outcome var is ordinal (1,2,3,4,5,6,7,8,9) and my predictor DM is continuous. I would like to know the predicted probabilities for hedonic=5 and hedonic=6 at DM=22.8.

 

proc logistic data=one desc;
model hedonic= DM/link=clogit;
estimate "Pr prob hedonic=5 at DM=22.8" intercept 1 DM 22.8/ilink category='5';
estimate "Pr prob hedonic=6 at DM=22.8" intercept 1 DM 22.8/ilink category='6';
run;

 

I would greatly appreciate it if you could let me know whether this looks OK to you.

 

Thanks a lot!

 

Accepted Solutions
StatDave
SAS Super FREQ

Since you want predicted probabilities for individual levels of your response from this ordinal model, the clearest way to get them is to create a one observation data set to be scored with DM at the desired level and then use the OUTPUT statement with the PREDPROBS=INDIVIDUAL option. This option produces the predicted probabilities of the individual response levels. To illustrate, use the DocVisit data set in the example titled "Partial Proportional Odds Model" in the PROC LOGISTIC documentation. To simplify the example, the following statements combine all the higher response levels into level 3, so that the new response variable, DV, has levels 0, 1, 2, or 3. 

 

data dv; set docvisit; 
  dv=dvisits; if dvisits>2 then dv=3; 
  run;

Then create a one-observation data set with the predictor, INCOME, set to the desired level (let's use 0.25) and the response set to missing, and add it to the original data. Because its response is missing, this observation is not used when fitting the model, but it is still scored. The SCORE variable is created so that this observation can be singled out for saving by the WHERE clause in the PROC LOGISTIC OUTPUT statement.

data score; income=0.25; dv=.; score=1; run;
data dv2; set score dv; run;

Now, fit the ordinal model using INCOME as the predictor. The EFFECTPLOT gives you a visual image of how the individual response level probabilities change with INCOME. The OUTPUT statement produces the individual response level probabilities and the probabilities cumulated over the lower response levels (0, up to 1, up to 2, and up to 3). The ESTIMATE statement with the ILINK option can only produce the cumulative probabilities. The CATEGORY=JOINT option gives each of the cumulative probabilities in a single table. 

proc logistic data=dv2; 
model dv=income; 
effectplot / individual;
output out=out(where=(score=1)) predprobs=(cumulative individual); 
estimate 'P(dv=1 @ .25)' intercept 1 income .25 / ilink category=joint;
run;

These statements print the one observation at INCOME=0.25 and its cumulative (beginning with CP_) and individual level (beginning with IP_) predicted probabilities.

proc print data=out;
id income; var ip: cp:; 
run;

Notice that the cumulative predicted probabilities are the same as those from the ESTIMATE statement. The differences between successive cumulative predicted probabilities are the individual predicted probabilities.
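Applied to the original question, the same steps might look like the sketch below. The data set ONE and the variables HEDONIC and DM come from the question; SCORE, ONE_SCORED, and PRED_DM are illustrative names, and the IP_5 and IP_6 variable names assume that HEDONIC is an unformatted numeric variable.

```sas
/* Sketch only: ONE, HEDONIC, and DM are from the question;        */
/* SCORE, ONE_SCORED, and PRED_DM are hypothetical names.          */

/* One observation to score: DM at the level of interest,          */
/* response missing so it is not used when fitting the model.      */
data score;
  DM = 22.8; hedonic = .; score = 1;
run;

/* Add the scoring observation to the analysis data. */
data one_scored;
  set score one;
run;

/* Fit the cumulative logit model and keep only the scored row,    */
/* with the individual response-level probabilities (IP_ prefix).  */
proc logistic data=one_scored desc;
  model hedonic = DM / link=clogit;
  output out=pred_dm(where=(score=1)) predprobs=individual;
run;

/* Predicted probabilities of hedonic=5 and hedonic=6 at DM=22.8.  */
proc print data=pred_dm;
  id DM;
  var ip_5 ip_6;
run;
```

The DESC option only changes the direction in which the probabilities are cumulated, so the individual-level probabilities should not depend on it.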

palolix
Pyrite | Level 9

Great, I got the same results using this approach, thank you so much StatDave!

 

Question: when using proc logistic with a categorical outcome variable with more than two levels, do I have to include it in the class statement  and specify a baseline category using (ref=)?

 

Thanks!

StatDave
SAS Super FREQ
No. When modeling a categorical response (binary, ordinal, or nominal) in any procedure (LOGISTIC, GENMOD, HPGENSELECT, etc.), it is best not to include it in the CLASS statement. If you want to set a reference level (for a binary or nominal response) or to change the direction of an ordinal response, use the response options that follow the response variable in the MODEL statement. For example: model y(ref='1') = .... / link=glogit;
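A few hedged illustrations of those response options follow. The first two use the SASHELP.HEART sample data set (also used later in this thread) purely as a stand-in. The third reuses the data set ONE and the variables HEDONIC and DM from the original question. None of these is meant as a recommended model.

```sas
/* Binary response: model the probability that Status='Dead'. */
proc logistic data=sashelp.heart;
  model status(event='Dead') = height weight;
run;

/* Nominal response: generalized logits with 'Normal' as the  */
/* reference level. BP_Status is not in a CLASS statement.    */
proc logistic data=sashelp.heart;
  model bp_status(ref='Normal') = height weight / link=glogit;
run;

/* Ordinal response: reverse the direction of the response    */
/* ordering with the DESCENDING response option.              */
proc logistic data=one;
  model hedonic(descending) = DM / link=clogit;
run;
```

In each case the response appears only in the MODEL statement. A CLASS statement would be needed only for categorical predictors.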
palolix
Pyrite | Level 9

Ok, good to know. Thank you very much StatDave!

SteveDenham
Jade | Level 19

Note that "etc." in @StatDave 's response does not, so far as I can tell, include GLIMMIX. For some reason, a multinomial response variable must be included in the CLASS statement for things to not error out.

 

SteveDenham

palolix
Pyrite | Level 9

Thank you for letting me know Steve, that's good to know, I hope I can remember that when using Glimmix. 

I have a question regarding regression analysis. I know that the higher the r-square value the better, but is there a minimum r-square value in order to make meaningful predictions?

 

Thank you Steve!

Ksharp
Super User

The R-square value of a model is just the square of the Pearson correlation coefficient between the Y variable and the predicted Y variable.

In other words, the r in R-square is the Pearson correlation coefficient between Y and predicted Y.

proc glm data=sashelp.class;
model weight=height age;
output out=want p=pred;
quit;

[Screenshot: PROC GLM output]

You can use PROC CORR to test the significance of the correlation coefficient.

proc corr data=want;
var weight pred;
run;

[Screenshot: PROC CORR output]

Here 0.87915*0.87915 = 0.7729 (the same as the R-square of the model).

You can check the p-value (highlighted in yellow in the screenshot) to see whether the R-square is significant.
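For reference, the identity behind this check can be written out as follows. It holds for an ordinary least-squares fit that includes an intercept, which is an assumption about the model (the PROC GLM model above includes one by default). Here $\hat{y}_i$ denotes the predicted values and $\bar{y}$ the mean of the response; the notation is not from the original post.

```latex
R^2 \;=\; 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
    \;=\; \bigl[\operatorname{corr}(y, \hat{y})\bigr]^2
```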

palolix
Pyrite | Level 9
Thank you so much for your suggestion Ksharp. To use Pearson corr between two variables, neither of the variables can be the response or outcome var?
Ksharp
Super User
What do you mean by that?
The two variables are the original response variable Y and Y hat (the predicted value of Y).
palolix
Pyrite | Level 9
Ok I will try that with y and pred y, but this applies only to continuous variables right?
Ksharp
Super User

No. It can also be applied to a logistic model:

data have;
 set sashelp.heart;
 y=ifn(status='Dead',0,1);
run;

proc logistic data=have;
model y=height weight/rsquare;
output out=want p=pred;
run;
proc corr data=want;
var y pred;
run;

[Screenshots: PROC LOGISTIC and PROC CORR output]

Here 0.14804*0.14804=0.0219 (same as proc logistic)

palolix
Pyrite | Level 9

That's great, thank you very much Ksharp! 

I tried this code (omitting the first part) when my outcome var is binary and I got pretty similar values for r-square. However, when I tried it for an ordinal outcome var I got pretty different r-square results (0.07172*0.07172=0.0051 vs 0.1022). Am I doing something wrong or does this only work for binary outcome variables?

This is the code I'm using :

proc logistic data=one desc;
model hedonic=DM/link=clogit rsquare;
output out=want3 p=pred;
run;
proc corr data=want3;
var hedonic pred;
run;

 

These are the results I get:

 
Response Profile

Ordered Value   hedonic   Total Frequency
1               9          17
2               8          93
3               7          99
4               6         103
5               5          86
6               4         100
7               3          50
8               2          31
9               1           4

 

Probabilities modeled are cumulated over the lower Ordered Values.

 


R-Square 0.1022 Max-rescaled R-Square 0.1042

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   62.8777      1    <.0001
Score              60.9165      1    <.0001
Wald               60.5840      1    <.0001

The CORR Procedure

Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations

                               hedonic    pred
hedonic                        1.00000    0.07172
                                          <.0001
                               4680       4664

pred (Estimated Probability)   0.07172    1.00000
                               <.0001
                               4664       4664
Ksharp
Super User
If your Y variable is ordinal with several levels, such as 1, 2, 3, 4,
then PROC LOGISTIC leads you to THREE models:
1 vs. 2,3,4
1,2 vs. 3,4
1,2,3 vs. 4

Therefore, I think my idea is NOT suited to this scenario.
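The two numbers in the posted output can be checked by hand. This is a sketch under two assumptions. The first is that the RSQUARE option reports the likelihood-based generalized R-square, 1 - exp(-LR/n), where LR is the likelihood ratio chi-square and n is the number of observations used. The second is that the OUTPUT data set from a cumulative model holds one row per input observation for each cumulative level (identified by the _LEVEL_ variable). All numeric inputs are copied from the output above; the variable names in the DATA step are illustrative.

```sas
data _null_;
  lr = 62.8777;                              /* Likelihood Ratio chi-square from the output */
  n  = 17+93+99+103+86+100+50+31+4;          /* total frequency in the Response Profile     */
  r2 = 1 - exp(-lr/n);                       /* generalized R-square                        */
  rows_per_obs = 4664 / n;                   /* N reported by PROC CORR divided by n        */
  put n= r2=6.4 rows_per_obs=;
run;
```

With these inputs, n is 583 and the formula gives about 0.1022, matching the R-Square in the PROC LOGISTIC output. The N of 4664 in PROC CORR is 8 times 583. That is consistent with PRED holding a stacked cumulative probability for each of the 8 cumulative levels rather than a single predicted value per observation, which would explain why its squared correlation with HEDONIC (0.0051) does not match.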
palolix
Pyrite | Level 9
Oh ok, but at least it helped me a lot with proc glm and when having a binary outcome in proc logistic. Thanks a lot Ksharp!
