Hi @TakakuraMD,
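One method, described in Hosmer/Lemeshow: Applied Logistic Regression, 3rd ed., p. 95 f. (2nd ed.: p. 99), is to create a four-level categorical version of the continuous variable, based on its quartiles, and to use it in place of the original variable in a logistic regression model that includes the other model variables ("heart disease" in your example). The estimated coefficients of the three non-reference levels are then plotted against the quartile midpoints, together with a coefficient of zero at the midpoint of the first (reference) quartile. If the points lie roughly on a straight line, the assumption of linearity in the logit is plausible.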
/* Create test data for demonstration */
data have;
  call streaminit(31415926);
  do subjid=1 to 500;
    age=int(rand('uniform',18,75));
    heartdis=rand('bern',age/100-0.15);
    p=logistic(-4.56+0.0567*age+1.23*heartdis);
    dead=rand('bern',p);
    output;
  end;
run;
Here is an implementation of this method for variable AGE in the above dataset HAVE:
/* Compute age quartiles */
proc summary data=have;
  var age;
  output out=_qtls(drop=_type_ _freq_) min=_min q1=_q1 median=_q2 q3=_q3 max=_max;
run;
/* Create a categorical version of AGE with four levels */
data _tmpana(rename=(agecat=age));
  if _n_=1 then set _qtls;
  set have;
  if age>_q3 then agecat=4;
  else if age>_q2 then agecat=3;
  else if age>_q1 then agecat=2;
  else if age>. then agecat=1;
  else agecat=.;
  drop age;
run;
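Before fitting the model, it may be worth checking that each quartile group contains both deaths and survivors, since a sparse or empty cell would make the corresponding coefficient unstable. A minimal sketch, assuming the dataset _TMPANA created in the preceding step (where AGE now holds the quartile group 1-4):

```sas
/* Cross-tabulate the age quartile groups against the outcome.
   Row percentages show the observed death rate per quartile group. */
proc freq data=_tmpana;
  tables age*dead / nocol nopercent;
run;
```

With the simulated data the four groups should be of similar size; with real data, ties at the quartile boundaries can make them unequal.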
/* Create a model using the new categorical variable AGE in place of the continuous original */
ods output ParameterEstimates=est(keep=Variable ClassVal0 Estimate where=(Variable="age"));
proc logistic data=_tmpana desc;
  class heartdis(ref='0') age(ref='1') / param=ref;
  model dead = heartdis age;
run;
/* Combine quartile midpoints and corresponding model coefficients */
data _midp;
  set est(drop=Variable);
  /* First iteration: add the reference level (1st quartile) with coefficient 0 */
  if _n_=1 then do;
    set _qtls;
    age=(_min+_q1)/2;
    _coeff=0;
    output;
  end;
  /* Levels 2-4: pair each estimate with the midpoint of its quartile interval */
  select(ClassVal0);
    when('2') do;
      age=(_q1+_q2)/2;
      _coeff=Estimate;
      output;
    end;
    when('3') do;
      age=(_q2+_q3)/2;
      _coeff=Estimate;
      output;
    end;
    when('4') do;
      age=(_q3+_max)/2;
      _coeff=Estimate;
      output;
    end;
    otherwise;
  end;
  keep _coeff age;
run;
/* Plot coefficients vs. quartile midpoints to check the linearity assumption */
proc sgplot data=_midp;
  series x=age y=_coeff / markers;
run;
The resulting plot supports the assumption that the model is linear in the logit for variable AGE.
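Because the test data were simulated with a known age slope (0.0567 in the DATA step for HAVE), the plot can be compared with the line expected under exact linearity. The following is only a sketch for the demo data: the names _MIDP2, _AGE1 and _EXPECTED are made up for illustration, and using quartile midpoints as representative ages is an approximation. With real data the true slope is unknown, so this overlay does not apply there.

```sas
/* Add the coefficient expected under a linear logit, relative to the
   reference quartile midpoint (first observation of _MIDP).
   ASSUMPTION: slope 0.0567 is the value used to simulate dataset HAVE. */
data _midp2;
  set _midp;
  retain _age1;
  if _n_=1 then _age1=age;          /* midpoint of the reference quartile */
  _expected=0.0567*(age-_age1);
  drop _age1;
run;

/* Overlay estimated and expected coefficients */
proc sgplot data=_midp2;
  series x=age y=_coeff / markers legendlabel='Estimated coefficient';
  series x=age y=_expected / lineattrs=(pattern=dash)
         legendlabel='Expected under linearity (simulated slope)';
  yaxis label='Coefficient relative to 1st quartile';
run;
```

If the estimated points scatter around the dashed line without a systematic bend, that is consistent with linearity in the logit; with only four points and 500 subjects, some deviation due to sampling variability is to be expected.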