Solved: Re: estimates in proc logistic when predictor is a continuous var

palolix · Posted 11-15-2024 09:10 PM

Dear SAS community,

Since the LSMEANS statement with the ILINK option is not supported in PROC LOGISTIC when the predictor variable is continuous, I tried the ESTIMATE statements shown below.

My outcome variable is ordinal (1,2,3,4,5,6,7,8,9) and my predictor, DM, is continuous. I would like to know the predicted probabilities for hedonic=5 and hedonic=6 at DM=22.8.

proc logistic data=one desc;
model hedonic= DM/link=clogit;
estimate "Pr prob hedonic=5 at DM=22.8" intercept 1 DM 22.8/ilink category='5';
estimate "Pr prob hedonic=6 at DM=22.8" intercept 1 DM 22.8/ilink category='6';
run;

I would greatly appreciate it if you could let me know whether this looks OK to you.

Thanks a lot!

StatDave · Posted 11-15-2024 10:32 PM

Since you want predicted probabilities for individual levels of your response from this ordinal model, the clearest way to get them is to create a one observation data set to be scored with DM at the desired level and then use the OUTPUT statement with the PREDPROBS=INDIVIDUAL option. This option produces the predicted probabilities of the individual response levels. To illustrate, use the DocVisit data set in the example titled "Partial Proportional Odds Model" in the PROC LOGISTIC documentation. To simplify the example, the following statements combine all the higher response levels into level 3, so that the new response variable, DV, has levels 0, 1, 2, or 3.

data dv;
  set docvisit;
  dv=dvisits;
  if dvisits>2 then dv=3;
run;

Then create a one-observation data set with the predictor, INCOME, set to the desired level (let's use 0.25) and the response set to missing, and add it to the original data. Because its response is missing, this observation does not get used when fitting the model. The SCORE variable is created so that this observation can be singled out for saving by the WHERE clause in the PROC LOGISTIC OUTPUT statement.

data score; income=0.25; dv=.; score=1; run;
data dv2; set score dv; run;

Now, fit the ordinal model using INCOME as the predictor. The EFFECTPLOT gives you a visual image of how the individual response level probabilities change with INCOME. The OUTPUT statement produces the individual response level probabilities and the probabilities cumulated over the lower response levels (0, up to 1, up to 2, and up to 3). The ESTIMATE statement with the ILINK option can only produce the cumulative probabilities. The CATEGORY=JOINT option gives each of the cumulative probabilities in a single table.

proc logistic data=dv2; 
model dv=income; 
effectplot / individual;
output out=out(where=(score=1)) predprobs=(cumulative individual); 
estimate 'P(dv=1 @ .25)' intercept 1 income .25 / ilink category=joint;
run;

These statements print the one observation at INCOME=0.25 and its cumulative (beginning with CP_) and individual level (beginning with IP_) predicted probabilities.

proc print data=out;
id income; var ip: cp:; 
run;

Notice that the cumulative predicted probabilities are the same as those from the ESTIMATE statement. The differences between successive cumulative predicted probabilities are the individual predicted probabilities.
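Applied to the original question, the same approach might look like the sketch below. It assumes the data set ONE with variables HEDONIC and DM from the first post; the data set names SCORE, TWO, and OUT and the flag variable SCORE are illustrative choices, and the IP_5 and IP_6 variable names assume the response levels are the numeric values 5 and 6.

```sas
/* One-observation data set to score: DM at the value of interest,
   response missing so the row is not used when fitting the model */
data score;
  DM = 22.8;
  hedonic = .;
  score = 1;
run;

/* Add the scoring row to the original data (assumed to be data set ONE) */
data two;
  set score one;
run;

/* Fit the cumulative logit model and keep only the scoring row */
proc logistic data=two desc;
  model hedonic = DM / link=clogit;
  output out=out(where=(score=1)) predprobs=individual;
run;

/* IP_5 and IP_6 hold the predicted probabilities of hedonic=5 and hedonic=6 */
proc print data=out;
  id DM;
  var ip_5 ip_6;
run;
```

The DESC option changes the order in which the cumulative probabilities are accumulated, but the individual-level probabilities in the IP_ variables refer to the same response levels either way.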

palolix · Posted 11-16-2024 05:07 PM

Great, I got the same results using this approach, thank you so much StatDave!

Question: when using proc logistic with a categorical outcome variable with more than two levels, do I have to include it in the class statement and specify a baseline category using (ref=)?

Thanks!

StatDave · Posted 11-16-2024 05:15 PM

No. When modeling a categorical response (binary, ordinal, or nominal) in any procedure (LOGISTIC, GENMOD, HPGENSELECT, etc.), it is best not to include it in the CLASS statement. If you want to set a reference level (for a binary or nominal response) or to change the direction of an ordinal response, use the options following the response in the MODEL statement. For example: model y(ref='1') = .... / link=glogit;
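As a minimal sketch of those response options, the statements below reuse the data set ONE and the variables HEDONIC and DM from earlier in the thread. Treating HEDONIC as nominal in the second step is only an assumption made to illustrate the syntax, not a modeling recommendation.

```sas
/* Ordinal response: reverse the response ordering with a response option
   in the MODEL statement; no CLASS statement is needed for the response */
proc logistic data=one;
  model hedonic(descending) = DM / link=clogit;
run;

/* Nominal response: choose the reference level in the MODEL statement */
proc logistic data=one;
  model hedonic(ref='1') = DM / link=glogit;
run;
```

In the first step, the DESCENDING response option plays the same role as the DESC option on the PROC LOGISTIC statement used earlier in the thread.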

palolix · Posted 11-16-2024 05:30 PM

Ok, good to know. Thank you very much StatDave!

SteveDenham · Posted 11-19-2024 12:37 PM

Note that "etc." in @StatDave 's response does not, so far as I can tell, include GLIMMIX. For some reason, a multinomial response variable must be included in the CLASS statement for things to not error out.

SteveDenham

palolix · Posted 11-19-2024 02:05 PM

Thank you for letting me know Steve, that's good to know, I hope I can remember that when using Glimmix.

I have a question regarding regression analysis. I know that the higher the r-square value the better, but is there a minimum r-square value in order to make meaningful predictions?

Thank you Steve!

Ksharp · Posted 11-19-2024 09:23 PM

The r-square value of a model is just the square of the Pearson correlation coefficient between the Y variable and the predicted Y variable.

In other words, the r in r-square is the Pearson correlation coefficient between Y and predicted Y.

proc glm data=sashelp.class;
model weight=height age;
output out=want p=pred;
quit;

You can use PROC CORR to test the significance of the correlation coefficient.

proc corr data=want;
var weight pred;
run;

Here 0.87915*0.87915= 0.7729 (the same as r-square of model)

You can check the p-value in the PROC CORR output to see whether the correlation, and hence the r-square, is significant.

palolix · Posted 11-20-2024 07:16 PM

Thank you so much for your suggestion Ksharp. To use Pearson corr between two variables, neither of the variables can be the response or outcome var?

Ksharp · Posted 11-20-2024 07:41 PM

What do you mean by that?
The two variables are the original response variable Y and Y hat (the predicted value of Y).

palolix · Posted 11-20-2024 11:05 PM

Ok I will try that with y and pred y, but this applies only to continuous variables right?

Ksharp · Posted 11-21-2024 01:19 AM

No. It can also be applied to a LOGISTIC model:

data have;
 set sashelp.heart;
 y=ifn(status='Dead',0,1);
run;

proc logistic data=have;
model y=height weight/rsquare;
output out=want p=pred;
run;
proc corr data=want;
var y pred;
run;

Here 0.14804*0.14804=0.0219 (same as proc logistic)

palolix · Posted 11-22-2024 04:59 PM

That's great, thank you very much Ksharp!

I tried this code (omitting the first part) when my outcome var is binary and I got pretty similar values for r-square. However, when I tried it for an ordinal outcome var I got pretty different r-square results (0.07172*0.07172=0.0051 vs 0.1022). Am I doing something wrong or does this only work for binary outcome variables?

This is the code I'm using :

proc logistic data=one desc;
model hedonic=DM/link=clogit rsquare;
output out=want3 p=pred;
run;
proc corr data=want3;
var hedonic pred;
run;

These are the results I get:

Response Profile
Ordered Value	hedonic	Total Frequency
1	9	17
2	8	93
3	7	99
4	6	103
5	5	86
6	4	100
7	3	50
8	2	31
9	1	4

Probabilities modeled are cumulated over the lower Ordered Values.

R-Square	0.1022	Max-rescaled R-Square	0.1042

Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr > ChiSq
Likelihood Ratio	62.8777	1	<.0001
Score	60.9165	1	<.0001
Wald	60.5840	1	<.0001

The CORR Procedure

	hedonic	pred (Estimated Probability)
hedonic	1.00000 (N=4680)	0.07172 (p <.0001, N=4664)
pred (Estimated Probability)	0.07172 (p <.0001, N=4664)	1.00000 (N=4664)

Ksharp · Posted 11-23-2024 01:14 AM

If your Y variable is ordinal with several levels, such as 1, 2, 3, 4,
PROC LOGISTIC effectively fits THREE cumulative models:
1 vs. 2,3,4
1,2 vs. 3,4
1,2,3 vs. 4

Therefore, I think my idea is NOT suited to this scenario.
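The numbers posted above are consistent with this. The R-Square that PROC LOGISTIC reports is the generalized R-square, 1 - exp(-LR/n), where LR is the likelihood ratio chi-square and n is the number of observations used, rather than a squared Pearson correlation. In addition, for an ordinal response the OUTPUT data set holds one row per cumulative level for each input observation, so PROC CORR was correlating HEDONIC with a stack of cumulative probabilities. A small check using only the posted values, with n taken as the sum of the Response Profile frequencies:

```sas
data _null_;
  lr_chisq = 62.8777;                  /* Likelihood Ratio chi-square from the output above */
  n = 17+93+99+103+86+100+50+31+4;     /* observations used: sum of Total Frequency = 583 */
  rsq = 1 - exp(-lr_chisq / n);        /* generalized R-square */
  rows_per_obs = 4664 / n;             /* PROC CORR N for PRED divided by n */
  put n= rsq= rows_per_obs=;
run;
```

This gives an rsq of about 0.1022, matching the reported R-Square, and rows_per_obs of 8, which is the number of cumulative logits for a nine-level response.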

palolix · Posted 11-23-2024 11:16 PM

Oh ok, but at least it helped me a lot with proc glm and when having a binary outcome in proc logistic. Thanks a lot Ksharp!
