SeaMoon_168
Quartz | Level 8

As the title says, could you help me figure out how to convert a continuous variable, age, into cubic spline terms with knots at the 25th and 75th percentiles? Many thanks.

Accepted Solution
Ksharp
Super User

Check @Rick_SAS's blog posts on this topic:

https://blogs.sas.com/content/iml/2020/05/11/cubic-interpolation-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
https://blogs.sas.com/content/iml/2019/10/16/visualize-regression-splines.html
https://blogs.sas.com/content/iml/2024/06/03/vize-multivar-regression-splines.html
https://blogs.sas.com/content/iml/2019/02/18/regression-restricted-cubic-splines-sas.html

 

Here I use the code written by Rick to get the cubic spline effect/terms. If you want cubic interpolation instead, check the first URL in the list above.

data cars;
set sashelp.cars;
keep mpg_city weight;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
   model mpg_city = spl / selection=none noint;  
quit;

PaigeMiller
Diamond | Level 26

Please describe the problem in more detail.

--
Paige Miller
SeaMoon_168
Quartz | Level 8

Many thanks for both your help. Here is the effect statement I used to create the spline age.

effect spl = spline(age / naturalcubic basis=BSPLINE knotmethod=percentilelist(25 75) );

I wonder if I can just use the data with the spline variable age (or weight) created by R instead of using the effect statement and then run the model using SAS. However, the results of the coefficients and the confidence intervals look different using SAS and R. 

 

Also, I wonder if the spline age has two categories?

spline.png

 

 

Rick_SAS
SAS Super FREQ

@SeaMoon_168 wrote:

I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement.

However, the results of the coefficients and the confidence intervals look different. 

 

Also, I wonder if the spline age has two categories?

spline.png


Yes, in theory you could do that. However, I don't think you can use the NATURALCUBIC option with the BASIS=BSPLINE option. Your log probably has a warning alerting you to the issue. I don't know whether that is the reason behind your strange output table. Please post the COMPLETE code you submitted and the COMPLETE ParameterEstimates table.

SeaMoon_168
Quartz | Level 8

Thank you for your help. Here is the SAS code. The cubic spline variables, weight and horsepower, use knots at the 25th and 75th percentiles with degree=3.

proc print data=sashelp.cars(obs=10); run;
data cars;
set sashelp.cars;
keep mpg_city weight Horsepower;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight Horsepower/ naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) degree=3);
   model mpg_city = spl / selection=none noint;  
quit;

The result is shown below. However, the spline variables weight and horsepower each have only one coefficient. Shouldn't each variable have four coefficients, since they are cubic splines (X^3, X^2, X, and a constant)? How do I control the knots, and how many knots are required for this analysis? Could you help me figure it out? Many thanks!

spline_new.png

 

Rick_SAS
SAS Super FREQ

The LOG for your code states:

NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.

which is why you only have one variable (and coefficient) for each spline basis. Remove the NATURALCUBIC option to get FIVE (not four) basis elements for each variable.

 

The doc for the TPF option on the EFFECT statement states, "For splines of degree d defined with n knots for a variable x, this basis consists of an intercept, polynomials x, x^2, ..., x^d, and one truncated power function for each of the n knots."  In your example, d=3 and n=2 and you used the NOINT option, which drops the intercept, so there are 3+2=5 basis columns per variable in the design matrix that you specified (after you remove the NATURALCUBIC option).
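
One way to check the number of basis columns directly is to write out the design matrix and list its variables. The following is only a sketch: it assumes the Sashelp.Cars data used elsewhere in this thread, uses a single variable (weight) for simplicity, and keeps the other options from the code posted above with NATURALCUBIC removed.

```sas
/* Sketch: cubic truncated power function (TPF) basis with two percentile
   knots and without the NATURALCUBIC option (single variable assumed) */
data cars;
   set sashelp.cars;
   keep mpg_city weight;
run;

proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight / basis=tpf(noint) degree=3
                                knotmethod=percentilelist(25 75));
   model mpg_city = spl / selection=none noint;
quit;

/* List the variables in the design matrix to count the spline columns */
proc contents data=SplineBasis varnum;
run;
```

The columns in SplineBasis other than mpg_city and weight are the spline basis columns, and the ParameterEstimates table should show one coefficient for each of them.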

SeaMoon_168
Quartz | Level 8

I want to create cubic spline terms for a continuous variable such as age, using the 25th and 75th percentiles as knots. In R, I could use the following code:

library(splines)
age_25 <- quantile(data$age, 0.25)
age_75 <- quantile(data$age, 0.75)
# Generate cubic spline terms
data$age_spline <- bs(data$age, knots = c(age_25, age_75), degree = 3)

Please let me know if further details are needed. Many thanks

 

 

Rick_SAS
SAS Super FREQ

It's always difficult to match results from different software because each software uses different defaults. For splines, there is the question of how to place the internal and external (boundary) knots. Also, be aware that SAS and R use different defaults for quantiles.
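
As a rough illustration of the quantile issue, the sketch below computes the 25th and 75th percentiles of weight under two of the quantile definitions that SAS procedures offer. It assumes the Sashelp.Cars data; QNTLDEF=5 is the SAS default, whereas the R quantile() function defaults to type=7, so the knot locations computed in the two systems might not agree exactly.

```sas
/* Sketch: quartiles of weight under the SAS default definition */
proc means data=sashelp.cars p25 p75 qntldef=5;
   var weight;
run;

/* The same quartiles under a different definition, for comparison */
proc means data=sashelp.cars p25 p75 qntldef=4;
   var weight;
run;
```

If the two tables differ, the choice of quantile definition matters for this data. To rule it out as a source of discrepancy, supply the same numeric knot values to both programs.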

 

In SAS, you can use the EFFECT statement in many regression procedures to use a B-spline basis. You can output the spline basis, but that is usually not necessary.  For linear models, use PROC GLMSELECT with the SELECTION=NONE option on the model statement. See the blog posts that KSharp listed, but use the BASIS=BSPLINE option.
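
A minimal sketch of that approach, assuming the Sashelp.Cars data from earlier in this thread and knots at the 25th and 75th percentiles. The NATURALCUBIC option is deliberately left out, and the boundary knots are left at the procedure defaults, which might differ from the defaults in R.

```sas
/* Sketch: cubic B-spline effect with two interior knots at the quartiles */
proc glmselect data=sashelp.cars;
   effect spl = spline(weight / basis=bspline degree=3
                                knotmethod=percentilelist(25 75));
   model mpg_city = spl / selection=none;   /* fit the full model, no selection */
quit;
```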

 

In SAS IML, you can use the BSPLINE function to get the basis of B-splines. This is closest to your R code.

data work.cars;
set sashelp.cars(obs=50);
where weight^=.;
keep mpg_city weight;
run;

proc iml;
/* compute the B-spline basis in R */
call exportdatasettoR("work.cars", "cars");
submit/R;
library(splines)
cars
wt_25 <- quantile(cars$Weight, 0.25)
wt_75 <- quantile(cars$Weight, 0.75)
wt_25
wt_75
age_spline <- bs(cars$Weight, knots = c(wt_25, wt_75), degree = 3)
age_spline   # generate degree 3 (cubic) B-splines with 2 internal knots
attr(age_spline,"knots")
attr(age_spline,"Boundary.knots")
endsubmit;

/* compute it in IML by using BSPLINE function */
use cars;
read all var "weight";
close;
q = {3351.5 , 3925.25 };   
deg = 3;
knots = repeat(min(weight), deg)    /* use deg boundary knots */
        // q // 
        repeat(max(weight), deg);
print knots;
wt_spline = bspline(weight, deg, knots); 
/* the R bs() function does not include the first column */
wt_spline = wt_spline[,2:ncol(wt_spline)];

create bspline from wt_spline;
append from wt_spline;
close;
QUIT;

 

 

If you want to perform regression, use PROC GLMSELECT or another regression procedure. 
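
For example, here is one way to regress on the basis columns that the IML program wrote out. This sketch assumes that WORK.CARS and WORK.BSPLINE were created by the program above and that their rows are in the same order. The names COL1, COL2, ... are the default variable names that CREATE ... FROM assigns to the matrix columns.

```sas
/* Sketch: attach the B-spline columns to the original data, row by row */
data work.cars_spline;
   merge work.cars work.bspline;   /* one-to-one merge: no BY statement */
run;

/* Ordinary least squares on the spline columns */
proc reg data=work.cars_spline;
   model mpg_city = col: ;         /* all variables whose names start with COL */
quit;
```

Because the first B-spline column was dropped to mirror the R output, the model keeps its intercept.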

SeaMoon_168
Quartz | Level 8

I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement. However, the results of the coefficients and the confidence intervals look different using R and SAS effect statement.

Ksharp
Super User

The first URL (cubic interpolation) that I gave you would produce ONE variable.

But if you use the EFFECT statement, you get several variables for the cubic spline effect.

And you will notice the following note in the log if you use only TWO percentiles as knots:

5    proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
NOTE: Writing HTML Body file: sashtml.htm
6       effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) );
7       model mpg_city = spl / selection=none noint;
8    quit;

NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 3 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
      real time           0.71 seconds
      cpu time            0.11 seconds

Ksharp_1-1735973228144.png

Therefore, you need more knots to generate the cubic spline effect:

13   proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
14      effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 25 50 75 95) );
15      model mpg_city = spl / selection=none noint;
16   quit;

NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 6 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds

This yields FOUR variables that represent the cubic spline effect:

Ksharp_0-1735973138115.png

 

 

 

Or use "basis=bspline", as Rick suggested:

Ksharp_0-1735973799450.png

Or this cubic spline:

Ksharp_1-1735973894357.png

 

