Josh_Myers
Fluorite | Level 6

I want to run a logistic regression in which several variables are modelled with splines and some of those splines are involved in interactions (Enterprise Guide 7.15). From the documentation (https://documentation.sas.com/?docsetId=casecon&docsetTarget=viyaets_introcom_sect010.htm&docsetVers...), I thought that I could achieve this by using the 'separate' option and then referring to the spline variables as spl_x1 and spl_x2 (as below), but I get an error: "Variable spl_x1 not found"

 

proc logistic data=data;
  effect spl = spline(x1 x2 / separate details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
  class x3;
  model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;

ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

There is nothing wrong with the syntax in your second post. The problem is that the data do not permit a natural cubic spline when you use knotmethod=percentilelist. This can happen when the data set is small or when x1 or x2 has repeated (tied) values, which makes some of the percentiles equal. For example, the 72.5th percentile and the 95th percentile might be the same value, in which case two knots coincide.

 

To see an example, run the following code, which simulates data and then fits the model that you specify. It works fine. Then uncomment the marked line in the DATA step so that x1 has only a few distinct values. When you rerun the program, you get the error:
ERROR: Splines with repeated knots are not supported when you use the
NATURALCUBIC option.

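data Have;
call streaminit(123);               /* fix the seed so the simulation is reproducible */
do i = 1 to 1000;
   x1 = rand("Normal");
   *x1 = int(x1);    /* uncomment this line to create tied values */
   x2 = rand("Normal");
   x3 = rand("Table", .4, .2);      /* x3 = 1, 2, or 3 with probabilities .4, .2, .4 */
   eta = 2 + x1 - x2 - cos(x2) + cos(x1)*(x3=1) + cos(x1)*(x3=2);   /* linear predictor */
   y = rand("Bern", logistic(eta)); /* binary response */
   output;
end;
run;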

/* look at some standard percentiles */
proc means data=Have P5 P25 P50 P75 P95;
   var x1-x3;
run;

proc logistic data=Have;
  effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
  effect spl_x2 = spline(x2 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
  class x3;
  model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
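
PROC MEANS offers only a fixed set of percentile keywords, so it cannot show the 27.5th and 72.5th percentiles that the EFFECT statements request. As a rough diagnostic, the sketch below uses PROC UNIVARIATE to print those exact percentiles and counts the distinct values of each spline variable. The data set name KnotPctl and the prefixes x1_p and x2_p are made up for illustration, and PROC UNIVARIATE's default percentile definition is only assumed to be close to what the EFFECT statement uses internally.

```sas
/* Sketch: print the percentiles that are requested as knots, so ties are visible. */
proc univariate data=Have noprint;
   var x1 x2;
   /* one prefix per VAR variable; creates x1_p5, x1_p27_5, ..., x2_p95 */
   output out=KnotPctl pctlpts=5 27.5 50 72.5 95 pctlpre=x1_p x2_p;
run;

proc print data=KnotPctl noobs;
run;

/* A variable with very few distinct values is likely to produce tied percentiles. */
proc sql;
   select count(distinct x1) as n_distinct_x1,
          count(distinct x2) as n_distinct_x2
   from Have;
quit;
```

If two or more of the printed percentiles for a variable are equal, that variable is the likely source of the repeated-knots error.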


4 REPLIES
Josh_Myers
Fluorite | Level 6

I initially thought that I needed multiple EFFECT statements, as below, but I get an error: "Splines with repeated knots are not supported when you use the NATURALCUBIC option". Perhaps I could calculate the percentiles with PROC UNIVARIATE and insert them into a list: knotmethod=list(...)?

proc logistic data=data;
  effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
  effect spl_x2 = spline(x2 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
  class x3;
  model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
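
The knotmethod=list(...) idea could look roughly like the untested sketch below. It computes the five percentiles with PROC UNIVARIATE, keeps only the distinct values, and passes them to the EFFECT statement through a macro variable. The macro %distinct_knots, the work data sets _pctl and _knots, and the macro variables knots_x1 and knots_x2 are hypothetical names, not part of SAS. Dropping tied percentiles leaves fewer knots than the five requested, so the fitted curve is less flexible, and a variable with very few distinct values may still not support a spline.

```sas
/* Sketch: build a list of DISTINCT percentile values for use in KNOTMETHOD=LIST. */
%macro distinct_knots(data=, var=, mvar=);
   %global &mvar;
   proc univariate data=&data noprint;
      var &var;
      output out=_pctl pctlpts=5 27.5 50 72.5 95 pctlpre=p_;
   run;
   /* one wide row -> one row per percentile, so duplicates can be dropped */
   proc transpose data=_pctl out=_knots(rename=(col1=knot));
   run;
   proc sql noprint;
      select distinct knot format=best16.
         into :&mvar separated by ' '
      from _knots
      order by knot;
   quit;
   %put NOTE: knots for &var = &&&mvar;
%mend distinct_knots;

%distinct_knots(data=data, var=x1, mvar=knots_x1);
%distinct_knots(data=data, var=x2, mvar=knots_x2);

proc logistic data=data;
  effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint) knotmethod=list(&knots_x1));
  effect spl_x2 = spline(x2 / details naturalcubic basis=tpf(noint) knotmethod=list(&knots_x2));
  class x3;
  model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
```

The %put line writes the knots that were actually used to the log, which makes it easy to see how many percentiles were tied.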
SteveDenham
Jade | Level 19

Remember you can get percentiles from PROC SUMMARY or PROC MEANS as well, which don't have the overhead of UNIVARIATE. I hope your knotmethod=list(...) works out, but somehow I have a bad feeling about this. I think you may have to move away from the NATURALCUBIC basis to something else.

 

SteveDenham
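
For reference, moving away from NATURALCUBIC might look like the sketch below, which keeps the same percentile knots but requests a cubic B-spline basis. This is an untested assumption rather than a confirmed fix: the wording of the error suggests that repeated knots are rejected only when NATURALCUBIC is specified, but the thread does not verify that a B-spline basis fits cleanly with these data. A B-spline fit also gives up the natural cubic spline's linear behaviour beyond the boundary knots, and because the B-spline columns sum to one, some spline parameters may be reported as redundant with the intercept.

```sas
/* Sketch only: same model, with a cubic B-spline basis instead of NATURALCUBIC. */
proc logistic data=data;
  effect spl_x1 = spline(x1 / details basis=bspline degree=3 knotmethod=percentilelist(5 27.5 50 72.5 95));
  effect spl_x2 = spline(x2 / details basis=bspline degree=3 knotmethod=percentilelist(5 27.5 50 72.5 95));
  class x3;
  model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
```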

Josh_Myers
Fluorite | Level 6

Yes, there were tied values. Thank you so much!

