As the title says, could you help me figure out how to model a continuous variable, age, using cubic splines with knots at the 25th and 75th percentiles? Many thanks.
Check @Rick_SAS's blog posts on this topic:
https://blogs.sas.com/content/iml/2020/05/11/cubic-interpolation-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
https://blogs.sas.com/content/iml/2019/10/16/visualize-regression-splines.html
https://blogs.sas.com/content/iml/2024/06/03/vize-multivar-regression-splines.html
https://blogs.sas.com/content/iml/2019/02/18/regression-restricted-cubic-splines-sas.html
Here I use code written by Rick to get the cubic spline effect/terms. If you want cubic interpolation instead, check the first URL above (marked in bold in the original post).
data cars;
set sashelp.cars;
keep mpg_city weight;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
model mpg_city = spl / selection=none noint;
quit;
Please describe the problem in more detail.
Many thanks for both your help. Here is the effect statement I used to create the spline age.
effect spl = spline(age / naturalcubic basis=BSPLINE knotmethod=percentilelist(25 75) );
I wonder if I can just use the data with the spline variable age (or weight) created by R instead of using the effect statement and then run the model using SAS. However, the results of the coefficients and the confidence intervals look different using SAS and R.
Also, I wonder if the spline age has two categories?
@SeaMoon_168 wrote:
I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement.
However, the results of the coefficients and the confidence intervals look different.
Also, I wonder if the spline age has two categories?
Yes, in theory you could do that. However, I don't think you can use the NATURALCUBIC option with the BASIS=BSPLINE option. Your log probably has a warning alerting you to the issue. I don't know whether that is the reason behind your strange output table. Please post the COMPLETE code you submitted and the COMPLETE ParameterEstimates table.
Thank you for your help. Here is the SAS code. The cubic spline variables, weight and horsepower, use knots at the 25th and 75th percentiles with degree=3.
proc print data=sashelp.cars(obs=10); run;
data cars;
set sashelp.cars;
keep mpg_city weight Horsepower;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
effect spl = spline(weight Horsepower/ naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) degree=3);
model mpg_city = spl / selection=none noint;
quit;
The result is shown below. However, the spline variables weight and horsepower each have only one coefficient. Shouldn't each variable have four coefficients, since they are cubic splines (X^3, X^2, X, and a constant)? How do I control the knots, and how many knots does this analysis require? Could you help me figure it out? Many thanks!
The LOG for your code states:
NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
which is why you only have one variable (and coefficient) for each spline basis. Remove the NATURALCUBIC option to get SIX (not four) basis elements.
The doc for the TPF option on the EFFECT statement states, "For splines of degree d defined with n knots for a variable x, this basis consists of an intercept, polynomials x, x^2, ..., x^d, and one truncated power function for each of the n knots." In your example, d=3 and n=2 and you used the NOINT option, so there are 3*2=6 basis columns in the design matrix that you specified (after you remove the NATURALCUBIC option).
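As a minimal sketch of that suggestion, the step below reruns the earlier code with a single variable and without NATURALCUBIC, then lists the variables in the OUTDESIGN= data set so you can count the generated basis columns yourself. The data set name SplineBasis is carried over from the code above; the PROC CONTENTS step is an addition for illustration and was not part of the original posts.

```sas
/* Sketch: cubic TPF basis for weight, knots at the 25th and 75th percentiles. */
/* NATURALCUBIC is omitted, so the basis is not reduced to a linear polynomial. */
data cars;
set sashelp.cars;
keep mpg_city weight;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
effect spl = spline(weight / basis=tpf(noint) degree=3 knotmethod=percentilelist(25 75));
model mpg_city = spl / selection=none noint;
quit;
/* List the variables in the design data set to see the generated basis columns */
proc contents data=SplineBasis varnum;
run;
```

The log should no longer show the "reduce to linear polynomials" note once NATURALCUBIC is gone.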
I want to create cubic spline terms for a continuous variable such as age, using the 25th and 75th percentiles as knots. For R, I could use the code
library(splines)
age_25 <- quantile(data$age, 0.25)
age_75 <- quantile(data$age, 0.75)
# Generate cubic spline terms
data$age_spline <- bs(data$age, knots = c(age_25, age_75), degree = 3)
Please let me know if further details are needed. Many thanks
It's always difficult to match results from different software because each software uses different defaults. For splines, there is the question of how to place the internal and external (boundary) knots. Also, be aware that SAS and R use different defaults for quantiles.
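To see how much the quantile definition matters for the knots, a sketch like the following prints the 25th and 75th percentiles of weight under two of the definitions that PROC MEANS supports through the QNTLDEF= option. To my knowledge, the SAS default (QNTLDEF=5) corresponds to type=2 in R's quantile() function, whereas R defaults to type=7; treat that mapping as an assumption to check against both sets of documentation.

```sas
/* SAS default definition (QNTLDEF=5): empirical distribution function with averaging */
proc means data=sashelp.cars qntldef=5 p25 p75;
var weight;
run;
/* A different definition (QNTLDEF=4), shown only for comparison */
proc means data=sashelp.cars qntldef=4 p25 p75;
var weight;
run;
```

If the two tables differ, knots computed in R with the default quantile type can also differ from the knots SAS places with KNOTMETHOD=PERCENTILELIST.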
In SAS, you can use the EFFECT statement in many regression procedures to use a B-spline basis. You can output the spline basis, but that is usually not necessary. For linear models, use PROC GLMSELECT with the SELECTION=NONE option on the model statement. See the blog posts that KSharp listed, but use the BASIS=BSPLINE option.
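A minimal sketch of that approach, assuming the sashelp.cars data used elsewhere in this thread, might look like this. The DETAILS spline option is added here so that the knot locations are printed and can be compared with attr(..., "knots") and attr(..., "Boundary.knots") in R.

```sas
/* Sketch: cubic B-spline effect with interior knots at the 25th and 75th percentiles */
proc glmselect data=sashelp.cars;
effect spl = spline(weight / basis=bspline degree=3 knotmethod=percentilelist(25 75) details);
model mpg_city = spl / selection=none;
quit;
```

The boundary knots that SAS chooses by default are not necessarily the ones R uses, so the individual coefficients can differ between the two packages.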
In SAS IML, you can use the BSPLINE function to get the basis of B-splines. This is closest to your R code.
data work.cars;
set sashelp.cars(obs=50);
where weight^=.;
keep mpg_city weight;
run;
proc iml;
/* compute the B-spline basis in R */
call exportdatasettoR("work.cars", "cars");
submit/R;
library(splines)
cars
wt_25 <- quantile(cars$Weight, 0.25)
wt_75 <- quantile(cars$Weight, 0.75)
wt_25
wt_75
age_spline <- bs(cars$Weight, knots = c(wt_25, wt_75), degree = 3)
age_spline # generate degree 3 (cubic) B-splines with 2 internal knots
attr(age_spline,"knots")
attr(age_spline,"Boundary.knots")
endsubmit;
/* compute it in IML by using BSPLINE function */
use cars;
read all var "weight";
close;
q = {3351.5 , 3925.25 }; /* interior knots: the 25th and 75th percentiles, hard-coded */
deg = 3;
knots = repeat(min(weight), deg) /* use deg boundary knots */
// q //
repeat(max(weight), deg);
print knots;
wt_spline = bspline(weight, deg, knots);
/* the R bs() function does not include the first column */
wt_spline = wt_spline[,2:ncol(wt_spline)];
create bspline from wt_spline;
append from wt_spline;
close;
QUIT;
If you want to perform regression, use PROC GLMSELECT or another regression procedure.
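For example, a rough sketch that feeds the columns written by the IML step above into a regression could look like the following. It assumes that work.cars and the bspline data set have the same row order, and that the columns carry the default names COL1, COL2, ... from the CREATE ... FROM statement; the data set name reg is made up for illustration.

```sas
/* Sketch: attach the B-spline columns to the response by a one-to-one merge. */
/* Assumes both data sets have the same number of rows in the same order. */
data reg;
merge work.cars bspline;
run;
/* Regress the response on all basis columns (COL1, COL2, ...) */
proc reg data=reg;
model mpg_city = col: ;
run;
quit;
```

Because the first B-spline column was dropped to match R's bs(), the model keeps its intercept.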
I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement. However, the results of the coefficients and the confidence intervals look different using R and SAS effect statement.
The first URL (cubic interpolation) I gave you would produce ONE variable.
But if you use the EFFECT statement, you get several variables for the cubic spline effect.
And you will notice the following in the log if you use only TWO percentiles as knots:
5 proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
NOTE: Writing HTML Body file: sashtml.htm
6 effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) );
7 model mpg_city = spl / selection=none noint;
8 quit;
NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 3 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
real time 0.71 seconds
cpu time 0.11 seconds
Therefore, you need more knots to generate the cubic spline effect:
13 proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
14 effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 25 50 75 95) );
15 model mpg_city = spl / selection=none noint;
16 quit;
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 6 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
real time 0.09 seconds
cpu time 0.01 seconds
This yields FOUR variables that represent the cubic spline effect:
Or use "basis=bspline", as Rick suggested.
Or this cubic spline: