Dear SAS community,
My outcome variable is ordinal (1,2,3,4,5,6,7,8,9) and my predictor, DM, is continuous. I would like to know the predicted probabilities for hedonic=5 and hedonic=6 at DM=22.8.
Since the LSMEANS statement with the ILINK option is not supported in PROC LOGISTIC when the predictor is continuous, I tried the following ESTIMATE statements:
proc logistic data=one desc;
model hedonic= DM/link=clogit;
estimate "Pr prob hedonic=5 at DM=22.8" intercept 1 DM 22.8/ilink category='5';
estimate "Pr prob hedonic=6 at DM=22.8" intercept 1 DM 22.8/ilink category='6';
run;
I would greatly appreciate it if you could let me know whether this looks OK to you.
Thanks a lot!
Since you want predicted probabilities for individual levels of your response from this ordinal model, the clearest way to get them is to create a one-observation data set, with DM set to the desired level, to be scored, and then use the OUTPUT statement with the PREDPROBS=INDIVIDUAL option. This option produces the predicted probabilities of the individual response levels. To illustrate, use the DocVisit data set in the example titled "Partial Proportional Odds Model" in the PROC LOGISTIC documentation. To simplify the example, the following statements combine all the higher response levels into level 3, so that the new response variable, DV, has levels 0, 1, 2, or 3.
data dv; set docvisit;
dv=dvisits; if dvisits>2 then dv=3;
run;
Next, create a one-observation data set with the predictor, INCOME, set to the desired level (let's use 0.25) and the response set to missing, so that this observation is not used when fitting the model, and add it to the original data. The SCORE variable is created so that this observation can be singled out for saving by the WHERE clause in the PROC LOGISTIC OUTPUT statement.
data score; income=0.25; dv=.; score=1; run;
data dv2; set score dv; run;
Now, fit the ordinal model using INCOME as the predictor. The EFFECTPLOT gives you a visual image of how the individual response level probabilities change with INCOME. The OUTPUT statement produces the individual response level probabilities and the probabilities cumulated over the lower response levels (0, up to 1, up to 2, and up to 3). The ESTIMATE statement with the ILINK option can only produce the cumulative probabilities. The CATEGORY=JOINT option gives each of the cumulative probabilities in a single table.
proc logistic data=dv2;
model dv=income;
effectplot / individual;
output out=out(where=(score=1)) predprobs=(cumulative individual);
estimate 'P(dv=1 @ .25)' intercept 1 income .25 / ilink category=joint;
run;
These statements print the one observation at INCOME=0.25 and its cumulative (beginning with CP_) and individual level (beginning with IP_) predicted probabilities.
proc print data=out;
id income; var ip: cp:;
run;
Notice that the cumulative predicted probabilities are the same as those from the ESTIMATE statement. The differences between successive cumulative predicted probabilities are the individual predicted probabilities.
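Applied to the data in the original question, the same steps might look like the sketch below. It assumes the data set ONE with a numeric response HEDONIC and the predictor DM, as named in the question; SCORE, ONE2, and OUT are placeholder names chosen here for illustration.

```sas
/* One-observation data set at the DM value of interest; response set to missing */
data score; DM=22.8; hedonic=.; score=1; run;

/* Add it to the original data (ONE is the data set named in the question) */
data one2; set score one; run;

proc logistic data=one2 desc;
  model hedonic=DM / link=clogit;
  /* Save only the scoring observation, with individual-level probabilities */
  output out=out(where=(score=1)) predprobs=individual;
run;

/* The IP_ variables hold one predicted probability per response level;
   IP_5 and IP_6 are the ones for hedonic=5 and hedonic=6 */
proc print data=out;
  id DM; var ip_:;
run;
```

Because the scoring observation has a missing response, it should not affect the fitted model, only receive predicted values.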
Great, I got the same results using this approach, thank you so much StatDave!
Question: when using proc logistic with a categorical outcome variable with more than two levels, do I have to include it in the class statement and specify a baseline category using (ref=)?
Thanks!
Ok, good to know. Thank you very much StatDave!
Note that "etc." in @StatDave 's response does not, so far as I can tell, include GLIMMIX. For some reason, a multinomial response variable must be included in the CLASS statement for things to not error out.
SteveDenham
Thank you for letting me know, Steve. That's good to know, and I hope I can remember it when using GLIMMIX.
I have a question regarding regression analysis. I know that the higher the r-square value the better, but is there a minimum r-square value in order to make meaningful predictions?
Thank you Steve!
The R-square value of a model is just the square of the Pearson correlation coefficient between the Y variable and the predicted Y variable.
In other words, the r in R-square is the Pearson correlation coefficient between Y and predicted Y.
proc glm data=sashelp.class;
model weight=height age;
output out=want p=pred;
quit;
You can use PROC CORR to test whether the correlation coefficient is significant.
proc corr data=want;
var weight pred;
run;
Here 0.87915*0.87915 = 0.7729 (the same as the R-square of the model).
You can check the p-value (highlighted in yellow) to see whether the R-square is significant.
Nope. The same idea can also be applied to a LOGISTIC model:
data have;
set sashelp.heart;
y=ifn(status='Dead',0,1);
run;
proc logistic data=have;
model y=height weight/rsquare;
output out=want p=pred;
run;
proc corr data=want;
var y pred;
run;
Here 0.14804*0.14804 = 0.0219 (the same as the R-square from PROC LOGISTIC).
That's great, thank you very much Ksharp!
I tried this code (omitting the first part) when my outcome var is binary and I got pretty similar values for r-square. However, when I tried it for an ordinal outcome var I got pretty different r-square results (0.07172*0.07172=0.0051 vs 0.1022). Am I doing something wrong or does this only work for binary outcome variables?
This is the code I'm using:
proc logistic data=one desc;
model hedonic=DM/link=clogit rsquare;
output out=want3 p=pred;
run;
proc corr data=want3;
var hedonic pred;
run;
These are the results I get:
Response Profile
Ordered Value | hedonic | Total Frequency |
---|---|---|
1 | 9 | 17 |
2 | 8 | 93 |
3 | 7 | 99 |
4 | 6 | 103 |
5 | 5 | 86 |
6 | 4 | 100 |
7 | 3 | 50 |
8 | 2 | 31 |
9 | 1 | 4 |
Probabilities modeled are cumulated over the lower Ordered Values.
R-Square | Max-rescaled R-Square |
---|---|
0.1022 | 0.1042 |
Testing Global Null Hypothesis: BETA=0
Test | Chi-Square | DF | Pr > ChiSq |
---|---|---|---|
Likelihood Ratio | 62.8777 | 1 | <.0001 |
Score | 60.9165 | 1 | <.0001 |
Wald | 60.5840 | 1 | <.0001 |
Pearson Correlation Coefficients; Prob > |r| under H0: Rho=0; Number of Observations (table for hedonic and pred; the values were not preserved)
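One possible explanation, offered as a sketch rather than a definitive diagnosis: the R-Square that PROC LOGISTIC prints is the generalized R-square, 1 - exp(-LR/n), where LR is the likelihood ratio chi-square and n is the number of observations, so it is not computed as a squared Pearson correlation. In addition, for an ordinal response the OUTPUT data set contains one row per cumulative level for each input observation (identified by the _LEVEL_ variable), so PROC CORR on hedonic and pred mixes several cumulative probabilities per subject. The check below assumes n=583 (the sum of the Total Frequency column) and that no weights are used; WANT3 is the output data set from the code above.

```sas
/* Recompute the generalized R-square from the printed likelihood ratio test */
data _null_;
  lr = 62.8777;                        /* Likelihood Ratio chi-square from the output */
  n  = 17+93+99+103+86+100+50+31+4;    /* sum of Total Frequency = 583 */
  r2 = 1 - exp(-lr/n);                 /* generalized R-square */
  put r2= 6.4;
run;

/* Inspect the output data set: several rows per input observation, one per _LEVEL_ */
proc print data=want3(obs=16);
  var hedonic _level_ pred;
run;
```

The first step comes to about 0.1022, matching the printed R-Square, which suggests the squared-correlation shortcut should not be expected to reproduce it for the ordinal model.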