TakakuraMD
Calcite | Level 5

Hi! 

 

I am having a difficult time plotting the logit (or rather, finding the syntax to plot the logit) of an outcome against a continuous variable.

I am doing this to check for linearity in a logistic regression. I have a binary outcome that I modeled against a continuous variable and another binary variable.

 

Please help!

 

Thanks!

PeterClemmensen
Tourmaline | Level 20

Welcome to the SAS communities. Do you simply want to plot the logit function? 

 

If so, try something like this:

 

data logit;
   do p=0.01 to 0.99 by 0.01;
      logit=log(p/(1-p));
      output;
   end;
run;

title "Plotting the logit function";
proc sgplot data=logit;
   series x=p y=logit;
   xaxis grid;
   yaxis grid label="logit(p)";
run;
title;
TakakuraMD
Calcite | Level 5
Hi,


Thank you for your response. I wanted to do this for my data set.

My data set consists of a binary outcome (dead or alive),

1 continuous variable (age), and

1 binary variable (having heart disease: yes or no).


I want to plot the logit of dead or alive vs the variable age and check for linearity. I think some call this checking for "linear in the logit"


Thanks!
PeterClemmensen
Tourmaline | Level 20

Try a Google search for "sas empirical logit plot". There are quite a few examples.

 

Post example data if you want a usable code answer 🙂
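
For readers who land here from that search, here is a minimal sketch of an empirical logit plot. It is an illustration, not code from any particular blog post. The dataset DEMO and its variables (ID, AGE, DEAD) are simulated and hypothetical, and the choice of 10 bins and the 0.5 continuity correction are assumptions you may want to change for your own data.

```sas
/* Hypothetical demo data: AGE (continuous) and DEAD (0/1) */
data demo;
   call streaminit(12345);
   do id = 1 to 1000;
      age  = 18 + floor(57 * rand('uniform'));   /* integer ages 18-74 */
      dead = rand('bernoulli', logistic(-4.5 + 0.06*age));
      output;
   end;
run;

/* Split AGE into 10 roughly equal-sized bins (AGEBIN = 0-9) */
proc rank data=demo groups=10 out=ranked;
   var age;
   ranks agebin;
run;

/* Per bin: number of subjects, number of deaths, mean age */
proc means data=ranked noprint nway;
   class agebin;
   var dead age;
   output out=bins(drop=_type_ _freq_)
          n(dead)=n sum(dead)=events mean(age)=mean_age;
run;

/* Empirical logit; adding 0.5 keeps bins with 0 or N events finite */
data elogit;
   set bins;
   elogit = log((events + 0.5) / (n - events + 0.5));
run;

/* Roughly linear points suggest AGE is linear in the logit */
title "Empirical logit of DEAD vs. AGE";
proc sgplot data=elogit;
   scatter x=mean_age y=elogit;
   reg x=mean_age y=elogit / nomarkers;
   xaxis label="Mean age in bin";
   yaxis label="Empirical logit";
run;
title;
```

To try this on real data, replace DEMO with your own dataset and make sure the outcome is coded 0/1. This sketch ignores the second predictor (heart disease); one option is to produce the plot separately for each level of that variable.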

TakakuraMD
Calcite | Level 5
Thank you,


Found this very helpful. You guys are the best.
PeterClemmensen
Tourmaline | Level 20

Anytime. If you found your answer, please mark the thread as accepted to help other users navigate the forum 🙂

 

Otherwise, please post in this thread again if you have questions.

Ksharp
Super User

@Rick_SAS might be able to give you a hand. He wrote a blog post about this before.

 

 

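data have;
   set have;
   /* Recode the outcome as 1 = dead, 0 = otherwise */
   good_bad=ifn(outcome='dead',1,0);
run;

/* Loess smooth of the 0/1 outcome against age */
proc sgplot data=have;
   loess x=age y=good_bad;
run;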

FreelanceReinh
Jade | Level 19

Hi @TakakuraMD,

 

One method described in Hosmer/Lemeshow: Applied Logistic Regression, 3rd ed., p. 95 f. (in the 2nd edition: p. 99) involves creating a 4-level categorical version of the continuous variable (based on the quartiles) and using this in place of the original variable in a logistic regression model including the other model variables ("heart disease" in your example).

/* Create test data for demonstration */

data have;
call streaminit(31415926);
do subjid=1 to 500;
  age=int(rand('uniform',18,75));
  heartdis=rand('bern',age/100-0.15);
  p=logistic(-4.56+0.0567*age+1.23*heartdis);
  dead=rand('bern',p);
  output;
end;
run;

Here is an implementation of this method for variable AGE in the above dataset HAVE:

/* Compute age quartiles */

proc summary data=have;
var age;
output out=_qtls(drop=_type_ _freq_) min=_min q1=_q1 median=_q2 q3=_q3 max=_max;
run;

/* Create a categorical version of AGE with four levels */

data _tmpana(rename=(agecat=age));
if _n_=1 then set _qtls;
set have;
if age>_q3 then agecat=4;
else if age>_q2 then agecat=3;
else if age>_q1 then agecat=2;
else if age>.   then agecat=1;
else agecat=.;
drop age;
run;

/* Create a model using the new categorical variable AGE in place of the continuous original */

ods output ParameterEstimates=est(keep=Variable ClassVal0 Estimate where=(Variable="age"));
proc logistic data = _tmpana desc;
class heartdis(ref='0') age(ref='1') / param=ref;
model dead = heartdis age;
run;

/* Combine quartile midpoints and corresponding model coefficients */

data _midp;
set est(drop=Variable);
if _n_=1 then do;
  set _qtls;
  age=(_min+_q1)/2;
  _coeff=0;
  output;
end;
select(ClassVal0);
  when('2') do;
              age=(_q1+_q2)/2;
              _coeff=Estimate;
              output;
            end;
  when('3') do;
              age=(_q2+_q3)/2;
              _coeff=Estimate;
              output;
            end;
  when('4') do;
              age=(_q3+_max)/2;
              _coeff=Estimate;
              output;
            end;
  otherwise;
end;
keep _coeff age;
run;

/* Plot coefficients vs. quartile midpoints to check the linearity assumption */

proc sgplot data=_midp;
series x=age y=_coeff / markers;
run;

The resulting plot supports the assumption that the model is linear in the logit for variable AGE:

[Image: linearity_check.png]