Hello,
I would like to ask why the graphs below are all different, and whether I can fix them or find a supporting explanation for the differences.
Three of them come from exactly the same data but were produced with different SAS procedures.
My major concern is that PROC PLM gave me a predicted probability with a confidence interval (CI), which is what I wanted, but that curve differs from the PROC LOGISTIC fit with a spline effect, and also from my Excel graph.
I made the blue graph by calculating the probability from the fitted equation; it looks similar to the ADAPTIVEREG and logistic spline curves.
Whilst the PLM graph is linear, why do the rest of them have a U shape? Is this because the logic behind each procedure is different, or have I done something the wrong way round?
Another question: I guess the logistic spline and the Excel graph are supposed to be the same, as they both have a U shape.
But the logistic spline has its lowest probability at age 26-27, whilst the blue graph, which I made in Excel, is lowest at 30-32. Does this suggest that the Excel calculation is wrong? I have checked it many times but haven't found what is wrong.
Anyway, please let me know whether the blue one needs to be changed, given that the logistic model with the spline effect is right.
Calling @Rick_SAS
I would not expect LOGISTIC and ADAPTIVEREG to give the same answers, as they probably use different algorithms to fit the model to the data. As for the PLM output, you haven't explained how you got PLM to work: it has to take the results of a modeling PROC in SAS, but you don't say which PROC or which model.
I can't say anything about Excel, as you didn't explain what you did there. However, I am wondering whether the data really follow a true quadratic, as your Excel output seems to show.
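One way to check the quadratic question inside SAS rather than in Excel is to add an AGE*AGE term to the logistic model and look at the AGE*AGE row of the parameter estimates table. A minimal sketch, assuming the data set is HEALTH with a binary response HEALTH (1 = event) and a numeric AGE; the item-store name QuadModel is hypothetical:

```sas
/* Sketch: logistic model that is quadratic in AGE on the logit scale.
   Assumes data set HEALTH, binary response HEALTH (1 = event), numeric AGE. */
proc logistic data=health;
   model health(event='1') = age age*age;  /* logit(p) = b0 + b1*age + b2*age**2 */
   store QuadModel;                        /* hypothetical item-store name */
run;

/* Predicted probability versus AGE, with confidence limits */
proc plm restore=QuadModel;
   effectplot fit(x=age) / clm;
run;
```

If the AGE*AGE estimate is not clearly different from zero, the data give little evidence for a true quadratic shape.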
Thanks for your reply!!
Here is how I got the quadratic form.
Age is a continuous variable (from 20-29), and I took the coefficient values and the intercept from the logistic regression.
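Assuming the Excel sheet evaluates a logit that is quadratic in Age (an assumption, since the spreadsheet formulas were not posted), with intercept β0 and coefficients β1 for Age and β2 for Age² taken from the logistic output, the curve and the age of lowest probability can be written out directly. Because the inverse logit is increasing, the probability is lowest exactly where the linear predictor η is lowest:

```latex
\eta(\text{Age}) = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Age}^2,
\qquad
p(\text{Age}) = \frac{1}{1 + e^{-\eta(\text{Age})}}

\frac{d\eta}{d\,\text{Age}} = \beta_1 + 2\beta_2\,\text{Age} = 0
\;\Longrightarrow\;
\text{Age}^{*} = -\frac{\beta_1}{2\beta_2}
\quad (\text{a minimum when } \beta_2 > 0)
```

Because Age* is a ratio of two estimates, rounding either coefficient when copying it into the spreadsheet can move the minimum. A minimum at 30-32 also lies outside the observed 20-29 range, so that part of the Excel curve is an extrapolation. Finally, a quadratic and a natural cubic spline are different models, so their minima need not coincide even when both are computed correctly.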
I'm afraid I don't understand Excel well enough to offer an opinion. I was really asking what you did in Excel: the idea, rather than the exact Excel functions and calculations.
It would be useful to see the code that generates each graph. In particular, the PLM output is based on what model and what procedure?
A spline is a piecewise polynomial on intervals that are determined by the location of the knots. The default placement of the knots might explain the difference between LOGISTIC and Excel (but I am not sure). The LOGISTIC and ADAPTIVEREG outputs are qualitatively similar; the differences are likely due to differences in the methods. The PLM output predicts probabilities between 0.1 and 0.2, which agrees with the other plots, but if you are using a linear model, it will fit a least squares line, which would explain why that model is not U-shaped.
Thanks for your reply!!!
Here is the code!
PROC LOGISTIC DATA=DATA;
MODEL HEALTH(EVENT='1')=AGE ;
STORE LOGITMODEL; RUN;
PROC PLM SOURCE=LOGITMODEL;
EFFECTPLOT FIT(X=AGE); RUN; /*logistic plm*/
PROC ADAPTIVEREG DATA=HEALTH;
MODEL HEALTH(EVENT='1')=AGE;
OUTPUT OUT = NONPARA_HEALTH P(ILINK) ; RUN;
PROC SGPLOT DATA=NONPARA_HEALTH ;
BAND X=AGE; /*but it does not give a CI as PLM did...*/
SERIES X=AGE Y=PRED ; RUN; /*nonparametric method*/
PROC LOGISTIC DATA=HEALTH PLOTS=NONE;
EFFECT SPL= SPLINE(AGE/DETAILS NATURALCUBIC BASIS=TPF(NOINT) KNOTMETHOD=PERCENTILES(5) );
MODEL HEALTH (EVENT='1')= AGE_19;
OUTPUT OUT=PROBABILITY PREDICTED=PREDPROB; RUN;
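PROC SORT DATA=PROBABILITY; BY AGE; RUN; /*SERIES joins points in data order*/
PROC SGPLOT DATA=PROBABILITY NOAUTOLEGEND;
SERIES X=AGE Y=PREDPROB; RUN; /*PARAMETRIC LOGISTIC WITH SPLINE EFFECT*/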
> Whilst the PLM graph is linear, why do the rest of them have a U shape?
> Is this because the logic behind each procedure is different, or have I done something the wrong way round?
Yes. It is because the procedures treat the explanatory variable differently.
The first model in your code is a LINEAR model in PROC LOGISTIC (linear in Age on the logit scale) that uses only Age as the explanatory variable. It is plotted by using the EFFECTPLOT statement in PROC PLM. By definition, this model will be a logistic curve ("sigmoid shape"). Because Age does not have a large effect when predicting Health, the sigmoid curve is very flat and looks almost linear.
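To see why this first model cannot bend into a U, write its fitted probability with intercept β0 and Age coefficient β1 and differentiate:

```latex
p(\text{Age}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{Age})}},
\qquad
\frac{dp}{d\,\text{Age}} = \beta_1\, p(\text{Age})\,\bigl(1 - p(\text{Age})\bigr)
```

Because 0 < p < 1, the derivative always has the same sign as β1, so the fitted probability is monotone in Age. When β1 is close to zero, the curve is also nearly flat over ages 20-29, which is why the PLM plot looks like a straight line.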
The second model is ADAPTIVEREG, which uses spline regression and uses an algorithm to automatically choose knots (thus the "adaptive" part of the name). Age is not represented as a single variable, but as several spline effects. Therefore, this model can result in a U-shaped fit.
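A corrected version of the ADAPTIVEREG plot might look like the sketch below. It assumes data set HEALTH and adds DIST=BINOMIAL so the fit uses a binomial response (the posted code did not specify a distribution; I believe the default is normal, and I believe the binomial default link is the logit). SGPLOT's BAND statement requires UPPER= and LOWER= variables, which this output does not supply, so the band is dropped. The data are sorted by AGE because SERIES joins points in data-set order.

```sas
/* Sketch: ADAPTIVEREG fit plotted on the probability scale.
   Assumes data set HEALTH, binary response HEALTH (1 = event), numeric AGE. */
proc adaptivereg data=health;
   model health(event='1') = age / dist=binomial;  /* binomial response (assumed logit link) */
   output out=nonpara_health p(ilink);             /* same OUTPUT syntax as the posted code */
run;

/* SERIES connects points in data-set order, so sort by AGE first */
proc sort data=nonpara_health;
   by age;
run;

proc sgplot data=nonpara_health;
   series x=age y=pred;   /* no BAND: there are no UPPER=/LOWER= variables to draw */
run;
```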
The third model is LOGISTIC with spline effects (although I think you misspecified the MODEL statement, which should be MODEL HEALTH=SPL). This model uses a cubic basis with 5 interior knots. Again, because Age is represented by multiple spline effects, this model can result in a U-shaped fit.
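Following that suggestion, the spline model can be refitted with the spline effect SPL in the MODEL statement. PROC LOGISTIC's OUTPUT statement can also write confidence limits for the predicted probability (LOWER= and UPPER=), which is exactly what SGPLOT's BAND statement needs. A sketch, assuming data set HEALTH; the output variable names LCL and UCL are hypothetical:

```sas
/* Sketch: logistic regression with a natural cubic spline in AGE.
   Assumes data set HEALTH, binary response HEALTH (1 = event), numeric AGE. */
proc logistic data=health plots=none;
   effect spl = spline(age / details naturalcubic basis=tpf(noint)
                             knotmethod=percentiles(5));
   model health(event='1') = spl;               /* the spline effect, not AGE */
   output out=probability predicted=predprob
          lower=lcl upper=ucl;                  /* limits for the predicted probability */
run;

proc sort data=probability;
   by age;                                      /* SERIES and BAND need AGE order */
run;

proc sgplot data=probability noautolegend;
   band x=age lower=lcl upper=ucl / transparency=0.5;
   series x=age y=predprob;
run;
```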
> My major concern is that PROC PLM gave me a probability with a CI, which I wanted to have
Confidence limits for predicted values are created under the assumption that the model is correctly specified. If you use a linear model to fit data that do not look like the model, the CLs are useless. I think in your case, the data indicate a nonlinear effect of Age. Therefore you should not use the linear LOGISTIC model.
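If you prefer the PROC PLM workflow from your first model, you can store the spline model instead of the linear one, so that the confidence band describes a model that is able to follow the U shape. A minimal sketch, assuming data set HEALTH; the item-store name SplineModel is hypothetical:

```sas
/* Sketch: store the spline model and let PROC PLM draw the fit and its CLs.
   Assumes data set HEALTH, binary response HEALTH (1 = event), numeric AGE. */
proc logistic data=health;
   effect spl = spline(age / naturalcubic basis=tpf(noint)
                             knotmethod=percentiles(5));
   model health(event='1') = spl;
   store SplineModel;                   /* hypothetical item-store name */
run;

proc plm restore=SplineModel;
   effectplot fit(x=age) / clm;         /* fitted curve vs AGE with confidence limits */
run;
```

The only change from your first PLM run is the model that goes into the item store; the confidence limits are still only as good as that model's specification.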