As the title says, could you help me figure out how to convert a continuous variable, age, into cubic spline terms with knots at the 25th and 75th percentiles? Many thanks.
Accepted Solutions
Check @Rick_SAS's blog posts on this topic:
https://blogs.sas.com/content/iml/2020/05/11/cubic-interpolation-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
https://blogs.sas.com/content/iml/2019/10/16/visualize-regression-splines.html
https://blogs.sas.com/content/iml/2024/06/03/vize-multivar-regression-splines.html
https://blogs.sas.com/content/iml/2019/02/18/regression-restricted-cubic-splines-sas.html
Here I use code written by Rick to create a cubic spline effect. If you want cubic interpolation instead, see the first URL above.
data cars;
set sashelp.cars;
keep mpg_city weight;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
model mpg_city = spl / selection=none noint;
quit;
Please describe the problem in more detail.
Paige Miller
Many thanks to you both for your help. Here is the EFFECT statement I used to create the spline for age.
effect spl = spline(age / naturalcubic basis=BSPLINE knotmethod=percentilelist(25 75) );
I wonder whether I could instead create the spline variables for age (or weight) in R, read that data into SAS, and fit the model in SAS without the EFFECT statement. However, the coefficients and confidence intervals differ between SAS and R.
Also, does the spline for age have two categories?
@SeaMoon_168 wrote:
I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement.
However, the results of the coefficients and the confidence intervals look different.
Also, I wonder if the spline age has two categories?
Yes, in theory you could do that. However, I don't think you can use the NATURALCUBIC option with the BASIS=BSPLINE option. Your log probably has a warning alerting you to the issue. I don't know whether that is the reason behind your strange output table. Please post the COMPLETE code you submitted and the COMPLETE ParameterEstimates table.
Thank you for your help. Here is the SAS code. The cubic spline variables, weight and horsepower, use knots at the 25th and 75th percentiles with degree=3.
proc print data=sashelp.cars(obs=10); run;
data cars;
set sashelp.cars;
keep mpg_city weight Horsepower;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
effect spl = spline(weight Horsepower/ naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) degree=3);
model mpg_city = spl / selection=none noint;
quit;
The result is shown below. However, the spline variables weight and horsepower each have only one coefficient. Shouldn't each variable have four coefficients, since they are cubic splines (X^3, X^2, X, and a constant)? How do I control the knots, and how many knots are required for this analysis? Could you help me figure it out? Many thanks!
The LOG for your code states:
NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
which is why you only have one variable (and coefficient) for each spline basis. Remove the NATURALCUBIC option to get SIX (not four) basis elements.
The doc for the TPF option on the EFFECT statement states, "For splines of degree d defined with n knots for a variable x, this basis consists of an intercept, polynomials x, x^2, ..., x^d, and one truncated power function for each of the n knots." In your example, d=3 and n=2 and you used the NOINT option, so there are 3*2=6 basis columns in the design matrix that you specified (after you remove the NATURALCUBIC option).
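Written out for a single variable x with d=3 and two knots, which I label k_1 < k_2 here (the labels are mine, not from the doc), the functions named in that sentence would be:

```latex
\[
1,\quad x,\quad x^2,\quad x^3,\quad (x-k_1)_+^3,\quad (x-k_2)_+^3,
\qquad\text{where } (x-k)_+^3 = \max(0,\,x-k)^3 .
\]
```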
I want to create cubic spline terms for a continuous variable such as age, using the 25th and 75th percentiles as knots. In R, I could use the following code:
library(splines)
age_25 <- quantile(data$age, 0.25)
age_75 <- quantile(data$age, 0.75)
# Generate cubic spline terms
data$age_spline <- bs(data$age, knots = c(age_25, age_75), degree = 3)
Please let me know if further details are needed. Many thanks
It's always difficult to match results from different software because each package uses different defaults. For splines, there is the question of how to place the internal and external (boundary) knots. Also, be aware that SAS and R use different defaults for quantiles.
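As a rough illustration of the quantile point, the sketch below asks PROC MEANS for the 25th and 75th percentiles of Weight under two of the SAS quantile definitions. QNTLDEF=5 is the SAS default; QNTLDEF=4 is chosen here only for comparison. As far as I know, the R default (type=7) is not one of the five SAS definitions, so any difference you see here only indicates the kind of mismatch to expect between the two packages.

```sas
/* 25th and 75th percentiles of Weight under the default SAS definition */
proc means data=sashelp.cars p25 p75 qntldef=5;
   var weight;
run;

/* Same percentiles under a different definition, for comparison only */
proc means data=sashelp.cars p25 p75 qntldef=4;
   var weight;
run;
```

If the knots must agree exactly, it is probably simplest to compute them in one package and pass the numeric values to the other.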
In SAS, you can use the EFFECT statement in many regression procedures to use a B-spline basis. You can output the spline basis, but that is usually not necessary. For linear models, use PROC GLMSELECT with the SELECTION=NONE option on the MODEL statement. See the blog posts that KSharp listed, but use the BASIS=BSPLINE option.
In SAS IML, you can use the BSPLINE function to get the basis of B-splines. This is closest to your R code.
data work.cars;
set sashelp.cars(obs=50);
where weight^=.;
keep mpg_city weight;
run;
proc iml;
/* compute the B-spline basis in R */
call exportdatasettoR("work.cars", "cars");
submit/R;
library(splines)
cars
wt_25 <- quantile(cars$Weight, 0.25)
wt_75 <- quantile(cars$Weight, 0.75)
wt_25
wt_75
age_spline <- bs(cars$Weight, knots = c(wt_25, wt_75), degree = 3)
age_spline # generate degree 3 (cubic) B-splines with 2 internal knots
attr(age_spline,"knots")
attr(age_spline,"Boundary.knots")
endsubmit;
/* compute it in IML by using BSPLINE function */
use cars;
read all var "weight";
close;
q = {3351.5 , 3925.25 };
deg = 3;
knots = repeat(min(weight), deg) /* use deg boundary knots */
// q //
repeat(max(weight), deg);
print knots;
wt_spline = bspline(weight, deg, knots);
/* the R bs() function does not include the first column */
wt_spline = wt_spline[,2:ncol(wt_spline)];
create bspline from wt_spline;
append from wt_spline;
close;
QUIT;
If you want to perform regression, use PROC GLMSELECT or another regression procedure.
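A minimal sketch of that last step, assuming the work.cars data set created above and reusing the two interior knots from the IML program. KNOTMETHOD=LIST places the interior knots explicitly. Using the DATABOUNDARY option to put the boundary knots at the data extremes, as bs() does, is an assumption on my part, so compare the basis with the R output before relying on it.

```sas
proc glmselect data=work.cars;
   /* cubic B-spline basis with interior knots at the values used above */
   effect spl = spline(weight / basis=bspline degree=3 databoundary
                       knotmethod=list(3351.5 3925.25));
   model mpg_city = spl / selection=none;
run;
```

The individual coefficients need not match R, because the SAS effect keeps every B-spline column whereas bs() drops the first one, but the fitted values should be comparable when the knots agree.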
I wonder if I can just use the data with the spline variable for age (or weight) created by R and then run the model, instead of using the EFFECT statement. However, the coefficients and confidence intervals differ between R and the SAS EFFECT statement.
The first URL I gave you (cubic interpolation) would produce ONE variable.
But if you use the EFFECT statement, you get several variables for the cubic spline effect.
And you will see the following note in the log if you use only TWO percentiles as knots:
5 proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
NOTE: Writing HTML Body file: sashtml.htm
6 effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) );
7 model mpg_city = spl / selection=none noint;
8 quit;
NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 3 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
real time 0.71 seconds
cpu time 0.11 seconds
Therefore, you need more knots to generate the cubic spline effect:
13 proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
14 effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 25 50 75 95) );
15 model mpg_city = spl / selection=none noint;
16 quit;
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 6 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
real time 0.09 seconds
cpu time 0.01 seconds
And you get FOUR variables to represent the cubic spline effect:
Or use "basis=bspline", as Rick suggested.
Or this cubic spline:
Can we have an equivalent output in PROC PHREG? Thanks.
Yes. PROC PHREG supports the EFFECT statement, so the syntax for PROC PHREG is exactly the same. For example, here is the syntax for a model that uses the data from the "Stepwise Regression" documentation example in the SAS Help Center:
proc phreg data=Myeloma;
effect spl = spline(LogBUN / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
model Time*VStatus(0) = spl HGB;
run;
Thank you for your response. We intend to plot the spline (HR versus a continuous variable). I'm not sure about the strategy, but Stata can output the spline data for each observation and plot the graph. How can we do this by code in SAS Viya? Thanks.
Yes. Examples are in some of the links in KSharp's accepted solution.
For example, if you want to see the spline functions, look at "Visualize a regression with splines" on The DO Loop blog.
If you just want the predicted values, you can output them directly by using the OUTPUT statement.
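For instance, a sketch along these lines adds an OUTPUT statement to the PROC PHREG call shown earlier in the thread. The data set name PhOut and the variable name LinPred are placeholders of mine. XBETA= saves the linear predictor for each observation, which here includes the HGB term as well as the spline, so the plot is a scatter rather than a clean curve of the spline alone.

```sas
proc phreg data=Myeloma;
   effect spl = spline(LogBUN / naturalcubic basis=tpf(noint)
                       knotmethod=percentilelist(25 50 75));
   model Time*VStatus(0) = spl HGB;
   /* save the linear predictor (x*beta) for every observation */
   output out=PhOut xbeta=LinPred;
run;

/* plot the linear predictor against the spline variable */
proc sgplot data=PhOut;
   scatter x=LogBUN y=LinPred;
run;
```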
If this doesn't answer your questions, you might want to start a new thread and post your code and specify the graph you want to create.