Daniellekeem
Fluorite | Level 6

Hello, 

I would like to ask why the graphs below all differ, and whether I can fix them or find a supporting explanation for the differences.

Three of them come from exactly the same data but from different procedures.

My major concern is that PROC PLM gave me a probability with a CI, which is what I wanted, but its graph differs both from the logistic model with a spline effect and from my Excel graph.

I made the blue graph to estimate the probability from the fitted equation; it looks similar to the ADAPTIVEREG and logistic spline graphs.

Whilst the PLM graph is linear, why do the rest have a U shape? Is this because the logic behind each procedure is different, or have I done something the wrong way round?

Another question: I would guess the logistic spline and the Excel graph are supposed to be the same, as both have a U shape.

But the logistic spline has its lowest probability at age 26-27, whilst the blue graph, which I made in Excel, has its lowest at 30-32. Does that suggest the Excel graph is wrong? I have checked it many times but haven't found what is wrong.

Anyway, please let me know whether the blue one needs to be changed, assuming the logistic model with the spline effect is right.

[Attached images: Capture.PNG, Capture1.PNG, image.png]

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

> Whilst the plm graph are linear, why the rest of them have u shape?

> Is this because of the logic behind each procedures are different of I have done in wrong way round? 

 

Yes. It is because the procedures treat the explanatory variable differently. 

 

The first model in your code is a PROC LOGISTIC model that is LINEAR in Age on the logit scale and uses only Age as the explanatory variable. It is plotted by using the EFFECTPLOT statement in PROC PLM. By definition, this model produces a logistic curve ("sigmoid shape"), which is monotonic in Age and therefore cannot be U-shaped. Because Age does not have a large effect when predicting Health, the sigmoid curve is very flat and looks almost linear.
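To make the shape argument concrete, here is the general form of a logistic model that is linear in Age. The symbols \beta_0 and \beta_1 are generic placeholders for the intercept and the Age coefficient, not estimates from this thread:

$$
\operatorname{logit}\, p(\text{Age}) = \beta_0 + \beta_1\,\text{Age},
\qquad
p(\text{Age}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{Age})}}
$$

$$
\frac{dp}{d\,\text{Age}} = \beta_1\, p(\text{Age})\,\bigl(1 - p(\text{Age})\bigr)
$$

Because p(1 - p) is always positive, the slope has the same sign as \beta_1 at every age, so the curve either rises everywhere or falls everywhere. When \beta_1 is close to zero, the slope is small across the whole Age range, which is why the fitted curve can look like a nearly straight line.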

 

The second model is ADAPTIVEREG, which fits a spline regression and uses an algorithm to choose the knots automatically (thus the "adaptive" part of the name). Age is not represented as a single variable, but as several spline effects. Therefore, this model can result in a U-shaped fit.

 

The third model is LOGISTIC with spline effects (although I think you misspecified the MODEL statement, which should be MODEL HEALTH=SPL). This model uses a cubic spline basis and 5 interior knots. Again, because Age is represented by multiple spline effects, this model can result in a U-shaped fit.
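As a rough sketch of the correction suggested above, the MODEL statement can name the spline effect SPL instead of AGE_19. This assumes, as in the posted code, a data set HEALTH that contains a 0/1 response HEALTH and a numeric variable AGE; the EVENT='1' option is carried over from the original post.

```sas
/* Sketch only: data set and variable names are taken from the posted code */
proc logistic data=HEALTH plots=none;
   /* Build a natural cubic spline basis for AGE with knots at percentiles */
   effect SPL = spline(AGE / details naturalcubic basis=tpf(noint)
                             knotmethod=percentiles(5));
   /* Use the constructed effect SPL on the right-hand side, not AGE_19 */
   model HEALTH(event='1') = SPL;
run;
```

The DETAILS option prints the knot locations, which may help when comparing the spline fit with a curve computed elsewhere.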

 

 

> Major concern is that proc plm gave me a probability with CI, which I wanted to have

 

Confidence limits for predicted values are created under the assumption that the model is correctly specified. If you use a linear model to fit data that do not look like the model, the CLs are useless. I think in your case, the data indicate a nonlinear effect of Age. Therefore you should not use the linear LOGISTIC model.
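If confidence limits are wanted for the nonlinear fit, one possible approach (not necessarily the only one) is to request them from the spline model through the OUTPUT statement and draw them with a BAND statement. This is a sketch that assumes the data set HEALTH and the variables HEALTH and AGE from the posted code; the output variable names LOWERCL and UPPERCL are made up for illustration.

```sas
/* Sketch only: LOWERCL and UPPERCL are hypothetical names chosen here */
proc logistic data=HEALTH plots=none;
   effect SPL = spline(AGE / naturalcubic basis=tpf(noint)
                             knotmethod=percentiles(5));
   model HEALTH(event='1') = SPL;
   /* Save predicted probabilities together with their confidence limits */
   output out=PROBABILITY predicted=PREDPROB lower=LOWERCL upper=UPPERCL;
run;

/* SERIES and BAND connect points in data order, so sort by AGE first */
proc sort data=PROBABILITY;
   by AGE;
run;

proc sgplot data=PROBABILITY noautolegend;
   band x=AGE lower=LOWERCL upper=UPPERCL / transparency=0.5;
   series x=AGE y=PREDPROB;
   yaxis label="Predicted probability of HEALTH=1";
run;
```

These limits are still conditional on the spline model being a reasonable description of the data, which is the same caveat that applies to the linear model.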

 


7 REPLIES
PaigeMiller
Diamond | Level 26

I would not expect LOGISTIC and ADAPTIVEREG to give the same answers, as they probably use different algorithms to fit the model to the data. As for the PLM output, you haven't explained how you got PLM to work. It has to take the results of a modeling PROC in SAS, but you don't say which PROC or which model.

 

I can't say anything about Excel, as you didn't explain what you did there. However, I am wondering if the data is really a true quadratic as you seem to be showing in your Excel output.

--
Paige Miller
Daniellekeem
Fluorite | Level 6

Thanks for your reply!! 

Here is how I got the quadratic form.

Age is a continuous variable (from 20-29), and I got the coefficient value and intercept from a logistic regression.

 

 

PaigeMiller
Diamond | Level 26

I'm afraid I don't understand Excel well enough to offer an opinion. I was really asking to understand what you did in Excel, the idea rather than the exact Excel functions and calculations.

--
Paige Miller
Rick_SAS
SAS Super FREQ

It would be useful to see the code that generates each graph. In particular, the PLM output is based on what model and what procedure? 

 

A spline is a piecewise polynomial on intervals that are determined by the location of the knots. The default placement of the knots might explain the difference between LOGISTIC and Excel (but I am not sure). The LOGISTIC and ADAPTIVEREG outputs are qualitatively similar; the differences are likely due to differences in the methods. The PLM output predicts probabilities between 0.1 and 0.2, which agrees with the other plots, but if you are using a linear model, it will fit a least squares line, which would explain why that model is not U-shaped.

Daniellekeem
Fluorite | Level 6

Thanks for your reply!!! 

Here is the code!

PROC LOGISTIC DATA=DATA;
   MODEL HEALTH(EVENT='1')=AGE;
   STORE LOGITMODEL;
RUN;

PROC PLM SOURCE=LOGITMODEL;
   EFFECTPLOT FIT(X=AGE);
RUN; /*logistic plm*/

PROC ADAPTIVEREG DATA=HEALTH;
   MODEL HEALTH(EVENT='1')=AGE;
   OUTPUT OUT=NONPARA_HEALTH P(ILINK);
RUN;

PROC SGPLOT DATA=NONPARA_HEALTH;
   BAND X=AGE; /*but it does not give CI as PLM did...*/
   SERIES X=AGE Y=PRED;
RUN; /*nonparametric method*/

PROC LOGISTIC DATA=HEALTH PLOTS=NONE;
   EFFECT SPL=SPLINE(AGE / DETAILS NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILES(5));
   MODEL HEALTH(EVENT='1')=AGE_19;
   OUTPUT OUT=PROBABILITY PREDICTED=PREDPROB;
RUN;

PROC SGPLOT DATA=PROBABILITY NOAUTOLEGEND;
RUN; /*PARAMETRIC LOGISTIC WITH SPLINE EFFECT*/


 

