BookmarkSubscribeRSS Feed
Lao_feng
Obsidian | Level 7

I would like to use restricted cubic spline functions to check nonlinearity for logistic regression . Does anyone share with me  SAS macro that can solve it? Thank you !

10 REPLIES 10
Rick_SAS
SAS Super FREQ

You can use the EFFECT statement to construct splines and then use those splines in the MODEL statement. An example and discussion are provided in the article "Regression with restricted cubic splines." Although the example is shown for a linear regression, PROC LOGISTIC also supports the EFFECT statement.

JacobSimonsen
Barite | Level 11

Agree with Rick, plotting splines are fairly simple with effect statements. You dont need to do any calculation yourself. Here is an example

 


*simulate som data;
*using probabilites depening on sin(t);
data simulation;
  do i=1 to 10000;
  t=rand('uniform',0,10);
  p=1/(1+exp(sin(t)));
  y=rand('bernoulli',p);
  output;
  end;
run;

*model a natural cubic spline;
*and store the result in "mystore";
proc logistic data=simulation;
  effect myspline=spline(t/knotmethod=list(1,2,3,4,5,6,7,8,9) NATURALCUBIC);
  model y(event="1")=myspline;
  store mystore;
run;

*generate values for t - this will be used drawing the spline;
data template;
  do t=0 to 10 by 0.01;
    output;
  end;
run;

*calculate the predicated values for each value of t;
proc plm restore=mystore;
  score data=template out=plot predicted=predicted lclm=lclm uclm=uclm/ilink;
run;
*For comparison with the true values of p;
data plot;
  set plot ;
  p=1/(1+exp(sin(t)));
run;

*plot predicated values together with the true values;
proc sgplot data=plot;
  band x=t lower=lclm upper=uclm /
       fillattrs=GraphConfidence2 legendlabel="95% CLM" name="band2";
  series x=t y=predicted /lineattrs=(color=red);
  series x=t y=p ;
run;

 

Lao_feng
Obsidian | Level 7

Thank you !

In the plot, I want Y Axis to be Ln (odds ratio) or odds ratio (instead of Probability) with a fixed value of t as the reference, how do I do that? how do I get the 95% confidence interval for the Ln (odds ratio) or odds ratio?

JacobSimonsen
Barite | Level 11

It may be easier to use the gampl procedure, but Im not the right one to help if you choose that one instead.

 

The effect statement in proc logistic/glimmix can also be used to model and draw the spline for oddsratios. But its not so easy. I found that it is possible by using the noint option, which removes the intercept. Then, in the template dataset used in proc glm, the intercept variable can be set to 0, which then allows for identifying the oddsratio between two groups, instead of odds or probabilities. 

 

The test for non-linearyty can be done with proc glimmix by adding in the linear effect - then look at the type 3 test for the spline.

 

Here is an example showing that gives the odds ratio as a spline effect.

 


*simulate som data;
*using probabilites depening on sin(t);
data simulation;
  intercept=1;
  do strata=1 to 20;
    u=rand('normal',0,1);
  do gender=0,1;
  do i=1 to 50;
    t=rand('uniform',0,10);
    p=1/(1+exp(u+sin(t+gender*0.5*atan(1))));
    y=rand('bernoulli',p);
  output;
  end;
  end;
  end;
run;

*model a natural cubic spline;
*and store the result in "mystore";
*the therm gender*t is nested in gender*myspline, but it makes the type3 test for nonlinearity of the odds ratio correct. (as wehen gender*myspline is test out, the linear relation is left);
proc glimmix data=simulation() method=laplace;
class strata;
  effect myspline=spline(t/knotmethod=list(1,3,5,7,9) NATURALCUBIC);
  model y(event="1")= intercept*myspline gender*t gender*myspline/dist=binary link=logit noint ;
  random strata;
  store mystore;
run;

*generate with values for t - this will be used drawing the spline;
data template;
intercept=0;
do gender=1;
  do t=0 to 10 by 0.01;
    output;
  end;
  end;
run;

*calculate the predicated values for each value of t;
proc plm restore=mystore;
  score data=template out=plot predicted=predicted lclm=lclm uclm=uclm;
run;


data plot;
  set plot;
  by t;
  *transform to oddsratio scale;
  oddsratio=exp(predicted);
  upper=exp(uclm);
  lower=exp(lclm);

 *the true value of odds ratio (for comparison);
  oddsratio_true=exp( -sin(t+0.5*atan(1)) + sin(t) );
run;

proc sgplot data=plot;
  band x=t lower=lower upper=upper /
       fillattrs=GraphConfidence2 legendlabel="95% CLM" name="band2";
  series x=t y=oddsratio /lineattrs=(color=red);
  series x=t y=oddsratio_true ;
run;
Lao_feng
Obsidian | Level 7

Thank you ! However, if I am not wrong, exp(predicted) is odds, not odds ratio. To calculate odds ratio, I need to obtain odds of a reference value of t. I can choose mean or median as reference.  Afterwards, I can have odds ratio. My problem is that I do not know how to obtain the standard error of odds ratio through which I will have the 95% CI.

JacobSimonsen
Barite | Level 11

because the intercept variable was is 0, then the predicted value in this example actually do give the odds ratio

April2211
Fluorite | Level 6

Hello! How can I get odds ratio instead of odds by "effect"? Thank you!

Lao_feng
Obsidian | Level 7

Thank you ! I read your article already and it is very helpful. This is my first time to use restricted cubic spline curve functions. I have some additional questions.

1) In order to examine if nonlinear relationship exists, I need to perform a likelihood ratio test by comparing the -2 log likelihood of the Logistic model including new variables (created by 'effect' statement) with that of the logistic model including the original continuous variable. Is it correct?

 

If the logistic model include random effects, I should use "proc glimmix". Do you know how to get the likelihood ratio and test if nonlinear relationship is statistically significant?

 

 

2) In the plot, I want Y Axis to be Ln (odds ratio) or odds ratio with a fixed value of the continuous variable as the reference, how do I do that? how do I get the 95% confidence interval for the Ln (odds ratio)?

 

StatDave
SAS Super FREQ

It's even easier to do this as a generalized additive logistic model in PROC GAMPL. See the example titled "Nonparametric Logistic Regression" in the GAMPL documentation. Basic syntax is:

 

proc gampl;

model y = spline(x) / dist=binary;

run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 3205 views
  • 0 likes
  • 5 in conversation