Hi!
I am having a difficult time plotting the logit (or rather, finding the syntax to plot the logit) of an outcome against a continuous variable.
I am doing this in order to check for linearity for a logistic regression. I have a binary outcome that I modeled against a continuous variable and another binary variable.
Please help!
Thanks!
Anytime. If you found your answer, please mark the thread as accepted to help other users navigate the forum 🙂
Otherwise, please post in this thread again if you have questions.
Welcome to the SAS communities. Do you simply want to plot the logit function?
If so, do something like this:
data logit;
do p=0.01 to 0.99 by 0.01;
logit=log(p/(1-p));
output;
end;
run;
title "Plotting the logit function";
proc sgplot data=logit;
series x=p y=logit;
xaxis grid;
yaxis grid label="logit(p)";
run;
title;
Try a Google search for "sas empirical logit plot". Quite a few examples.
Post example data if you want a usable code answer 🙂
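In the meantime, here is a minimal sketch of one common way to build an empirical logit plot, in case it helps. It assumes a dataset named HAVE with a numeric 0/1 outcome DEAD and a continuous variable AGE; those names, the choice of 10 bins, and the 0.5 continuity correction are all assumptions for illustration, not something taken from your data. It also ignores the second binary predictor, so treat it as a rough first look rather than a definitive check.

```sas
/* Sketch only: HAVE, DEAD (0/1) and AGE are assumed names */

/* Split AGE into 10 roughly equal-sized bins (deciles) */
proc rank data=have groups=10 out=_binned;
  var age;
  ranks agebin;
run;

/* Per bin: number of events, number of subjects, mean age */
proc means data=_binned noprint nway;
  class agebin;
  var dead age;
  output out=_elogit sum(dead)=events n(dead)=total mean(age)=mean_age;
run;

/* Empirical logit; adding 0.5 avoids log(0) in bins with no (or only) events */
data _elogit;
  set _elogit;
  elogit = log((events + 0.5) / (total - events + 0.5));
run;

/* Roughly linear points suggest linearity in the logit */
proc sgplot data=_elogit;
  scatter x=mean_age y=elogit;
  reg x=mean_age y=elogit / nomarkers;
  xaxis label="Mean age in bin";
  yaxis label="Empirical logit";
run;
```

If the points bend clearly away from the fitted line, that would hint that a transformation or spline term for the continuous variable may be worth considering.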
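@Rick_SAS might give you a hand; he wrote a blog post about it before.
/* Recode the outcome as numeric 0/1 (assumes a character variable OUTCOME with the value 'dead') */
data have;
set have;
good_bad=ifn(outcome='dead',1,0);
run;
/* A loess smooth of the 0/1 outcome shows how the event probability changes with age */
proc sgplot data=have;
loess x=age y=good_bad;
run;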
Hi @TakakuraMD,
One method described in Hosmer/Lemeshow: Applied Logistic Regression, 3rd ed., p. 95 f. (in the 2nd edition: p. 99) involves creating a 4-level categorical version of the continuous variable (based on the quartiles) and using this in place of the original variable in a logistic regression model including the other model variables ("heart disease" in your example).
/* Create test data for demonstration */
data have;
call streaminit(31415926);
do subjid=1 to 500;
age=int(rand('uniform',18,75));
heartdis=rand('bern',age/100-0.15);
p=logistic(-4.56+0.0567*age+1.23*heartdis);
dead=rand('bern',p);
output;
end;
run;
Here is an implementation of this method for variable AGE in the above dataset HAVE:
/* Compute age quartiles */
proc summary data=have;
var age;
output out=_qtls(drop=_type_ _freq_) min=_min q1=_q1 median=_q2 q3=_q3 max=_max;
run;
/* Create a categorical version of AGE with four levels */
data _tmpana(rename=(agecat=age));
if _n_=1 then set _qtls;
set have;
if age>_q3 then agecat=4;
else if age>_q2 then agecat=3;
else if age>_q1 then agecat=2;
else if age>. then agecat=1;
else agecat=.;
drop age;
run;
/* Create a model using the new categorical variable AGE in place of the continuous original */
ods output ParameterEstimates=est(keep=Variable ClassVal0 Estimate where=(Variable="age"));
proc logistic data = _tmpana desc;
class heartdis(ref='0') age(ref='1') / param=ref;
model dead = heartdis age;
run;
/* Combine quartile midpoints and corresponding model coefficients */
data _midp;
set est(drop=Variable);
if _n_=1 then do;
set _qtls;
age=(_min+_q1)/2;
_coeff=0;
output;
end;
select(ClassVal0);
when('2') do;
age=(_q1+_q2)/2;
_coeff=Estimate;
output;
end;
when('3') do;
age=(_q2+_q3)/2;
_coeff=Estimate;
output;
end;
when('4') do;
age=(_q3+_max)/2;
_coeff=Estimate;
output;
end;
otherwise;
end;
keep _coeff age;
run;
/* Plot coefficients vs. quartile midpoints to check the linearity assumption */
proc sgplot data=_midp;
series x=age y=_coeff / markers;
run;
The resulting plot supports the assumption that the model is linear in the logit for variable AGE: