I want to run a logistic regression in which several variables are modelled with splines and some of those splines are involved in interactions (Enterprise Guide 7.15). From the documentation (https://documentation.sas.com/?docsetId=casecon&docsetTarget=viyaets_introcom_sect010.htm&docsetVers...), I thought I could achieve this by using the 'separate' option and then referring to the spline variables as spl_x1 and spl_x2 (as below), but I get an error: "Variable spl_x1 not found"
proc logistic data=data;
effect spl = spline(x1 x2 / separate details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
class x3;
model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
There is nothing wrong with the syntax in your second post. The problem is that the data do not permit a natural cubic spline when the knots are chosen with knotmethod=percentilelist. This can happen when the data set is small or when x1 or x2 has repeated (tied) values, which can make some of the requested percentiles equal. For example, maybe the 72.5th percentile and the 95th percentile are the same value.
To see an example, run the following code, which simulates data and then runs the model that you specify. It works fine. Then uncomment the line in the DATA step so that x1 has only a few distinct values. When you rerun the program, you get the error:
ERROR: Splines with repeated knots are not supported when you use the
NATURALCUBIC option.
data Have;
   call streaminit(123);
   do i = 1 to 1000;
      x1 = rand("Normal");
      *x1 = int(x1); /* uncomment this line */
      x2 = rand("Normal");
      x3 = rand("Table", .4, .2);
      eta = 2 + x1 - x2 - cos(x2) + cos(x1)*(x3=1) + cos(x1)*(x3=2);
      y = rand("Bern", logistic(eta));
      output;
   end;
run;
/* look at some standard percentiles */
proc means data=Have P5 P25 P50 P75 P95;
   var x1-x3;
run;
proc logistic data=Have;
   effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
   effect spl_x2 = spline(x2 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
   class x3;
   model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
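The PROC MEANS step above reports P25 and P75, not the 27.5th and 72.5th percentiles that the EFFECT statements request. As a rough check for ties at the actual knot levels, something like the following might help. It is only a sketch: it assumes the data set Have from the simulation above already exists, the output data set name KnotCheck and the prefixes x1_P and x2_P are made up for illustration, and PROC UNIVARIATE's default percentile definition is not guaranteed to match the one PROC LOGISTIC uses internally, so treat the values as a diagnostic.

```sas
/* Assumes the data set Have from the simulation above already exists. */

/* 1. Count the distinct values of each spline variable.
      Only a handful of distinct values makes tied percentiles likely. */
proc sql;
   select count(distinct x1) as n_distinct_x1,
          count(distinct x2) as n_distinct_x2
   from Have;
quit;

/* 2. Compute the percentiles that are requested as knots.
      PCTLPRE= takes one (illustrative) name prefix per VAR variable,
      so the output columns are x1_P5, x1_P27_5, ..., x2_P95. */
proc univariate data=Have noprint;
   var x1 x2;
   output out=KnotCheck pctlpts=5 27.5 50 72.5 95
                        pctlpre=x1_P x2_P;
run;

/* 3. Inspect the single output row: equal neighbouring values are tied knots. */
proc print data=KnotCheck noobs;
run;
```

If two neighbouring columns for the same variable show the same value, that variable is the likely source of the repeated-knots error.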
I initially thought that I needed multiple EFFECT statements, as below, but with those I get an error: "Splines with repeated knots are not supported when you use the NATURALCUBIC option". Perhaps I could calculate the percentiles with PROC UNIVARIATE and insert them into a list: knotmethod=list(...)?
proc logistic data=data;
effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
effect spl_x2 = spline(x2 / details naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 27.5 50 72.5 95));
class x3;
model y = spl_x1 spl_x2 spl_x1*x3 / link = probit;
run;
Remember you can get percentiles from PROC SUMMARY or PROC MEANS as well, which don't have the overhead of UNIVARIATE. I hope your knotmethod=list(...) works out, but somehow I have a bad feeling about this. I think you may have to move away from the NATURALCUBIC basis to something else.
SteveDenham
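For what it is worth, the knotmethod=list(...) idea could be sketched roughly as follows for x1 alone. This is an illustration under assumptions, not a tested recommendation: it assumes a data set named Have containing x1, x3 and y (as in the simulated example in this thread), and the data set names Pctl_x1 and PctlLong_x1 and the macro variable x1_knots are invented here. It uses PROC UNIVARIATE because the knot levels 27.5 and 72.5 are not among the fixed percentile keywords of PROC MEANS. Dropping duplicate percentiles removes the repeated knots, but it does not guarantee that enough distinct knots remain for a useful natural cubic spline.

```sas
/* Assumes a data set Have with variables x1, x3 and y already exists. */

/* 1. Percentiles of x1 at the requested levels (one row, five columns). */
proc univariate data=Have noprint;
   var x1;
   output out=Pctl_x1 pctlpts=5 27.5 50 72.5 95 pctlpre=P;
run;

/* 2. Reshape to one row per percentile; the values land in COL1. */
proc transpose data=Pctl_x1 out=PctlLong_x1;
run;

/* 3. Keep only the distinct values, in ascending order, as a macro list.
      Values are written with the default numeric format, so they are rounded. */
proc sql noprint;
   select distinct col1 into :x1_knots separated by ' '
   from PctlLong_x1
   order by col1;
quit;
%put NOTE: knots for x1 = &x1_knots;

/* 4. Supply the de-duplicated knots explicitly instead of PERCENTILELIST. */
proc logistic data=Have;
   effect spl_x1 = spline(x1 / details naturalcubic basis=tpf(noint)
                               knotmethod=list(&x1_knots));
   class x3;
   model y = spl_x1 spl_x1*x3 / link=probit;
run;
```

The same steps would be repeated for x2. If only two or three distinct knots survive, the spline has very little flexibility, which may be a sign that a different basis or a simpler functional form is the better choice.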
Yes, there were tied values. Thank you so much!