SAS Programming

DATA Step, Macro, Functions and more
SeaMoon_168
Quartz | Level 8

As the title says, could you help me figure out how to convert a continuous variable, age, into cubic spline terms with knots at the 25th and 75th percentiles? Many thanks.

1 ACCEPTED SOLUTION

Ksharp
Super User

Check @Rick_SAS's blog posts on this topic:

https://blogs.sas.com/content/iml/2020/05/11/cubic-interpolation-sas.html
https://blogs.sas.com/content/iml/2017/04/19/restricted-cubic-splines-sas.html
https://blogs.sas.com/content/iml/2019/10/16/visualize-regression-splines.html
https://blogs.sas.com/content/iml/2024/06/03/vize-multivar-regression-splines.html
https://blogs.sas.com/content/iml/2019/02/18/regression-restricted-cubic-splines-sas.html

 

Here I use code written by Rick to get the cubic spline effect (terms). If you want cubic interpolation instead, check the first URL above.

data cars;
set sashelp.cars;
keep mpg_city weight;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
   model mpg_city = spl / selection=none noint;  
quit;


15 Replies
PaigeMiller
Diamond | Level 26

Please describe the problem in more detail.

--
Paige Miller
SeaMoon_168
Quartz | Level 8

Many thanks to you both for your help. Here is the EFFECT statement I used to create the spline terms for age.

effect spl = spline(age / naturalcubic basis=BSPLINE knotmethod=percentilelist(25 75) );

I wonder if I can just use the data with the spline variable age (or weight) created by R instead of using the effect statement and then run the model using SAS. However, the results of the coefficients and the confidence intervals look different using SAS and R. 

 

Also, I wonder if the spline age has two categories?

spline.png

 

 

Rick_SAS
SAS Super FREQ

@SeaMoon_168 wrote:

I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement.

However, the results of the coefficients and the confidence intervals look different. 

 

Also, I wonder if the spline age has two categories?

spline.png


Yes, in theory you could do that. However, I don't think you can use the NATURALCUBIC option with the BASIS=BSPLINE option. Your log probably has a warning alerting you to the issue. I don't know whether that is the reason behind your strange output table. Please post the COMPLETE code you submitted and the COMPLETE ParameterEstimates table.

SeaMoon_168
Quartz | Level 8

Thank you for your help. Here is the SAS code. The cubic spline variables, weight and horsepower, use knots at the 25th and 75th percentiles with degree=3.

proc print data=sashelp.cars(obs=10); run;
data cars;
set sashelp.cars;
keep mpg_city weight Horsepower;
run;
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight Horsepower/ naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) degree=3);
   model mpg_city = spl / selection=none noint;  
quit;

The result is shown below. However, the spline variables weight and horsepower each have only one coefficient. Shouldn't each variable have four coefficients, since they are cubic splines (X^3, X^2, X, and a constant)? How do I control the knots, and how many knots are required for this analysis? Could you help me figure it out? Many thanks!

spline_new.png

 

Rick_SAS
SAS Super FREQ

The LOG for your code states:

NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.

which is why you only have one variable (and coefficient) for each spline basis. Remove the NATURALCUBIC option to get SIX (not four) basis elements.

 

The doc for the TPF option on the EFFECT statement states, "For splines of degree d defined with n knots for a variable x, this basis consists of an intercept, polynomials x, x^2, ..., x^d, and one truncated power function for each of the n knots." In your example, d=3 and n=2 and you used the NOINT option, so there are 3*2=6 basis columns in the design matrix that you specified (after you remove the NATURALCUBIC option).
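As a minimal sketch of the change described above, the code below repeats the earlier PROC GLMSELECT call with only the NATURALCUBIC option removed. The PROC CONTENTS step is an addition for illustration (it is not from the original posts); listing the variables in the OUTDESIGN= data set is one way to count the generated basis columns directly.

```sas
data cars;
   set sashelp.cars;
   keep mpg_city weight Horsepower;
run;

/* Same call as before, but without NATURALCUBIC, so the truncated
   power function (TPF) basis is not collapsed to a linear term */
proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
   effect spl = spline(weight Horsepower / basis=tpf(noint)
                       knotmethod=percentilelist(25 75) degree=3);
   model mpg_city = spl / selection=none noint;
quit;

/* Illustrative check: list the columns written to the design data set */
proc contents data=SplineBasis varnum;
run;
```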

SeaMoon_168
Quartz | Level 8

I want to create cubic spline terms for a continuous variable such as age, using the 25th and 75th percentiles as knots. For R, I could use the code

library(splines)
age_25 <- quantile(data$age, 0.25)
age_75 <- quantile(data$age, 0.75)
# Generate cubic spline terms
data$age_spline <- bs(data$age, knots = c(age_25, age_75), degree = 3)

Please let me know if further details are needed. Many thanks

 

 

Rick_SAS
SAS Super FREQ

It's always difficult to match results from different software because each software uses different defaults. For splines, there is the question of how to place the internal and external (boundary) knots. Also, be aware that SAS and R use different defaults for quantiles.

 

In SAS, you can use the EFFECT statement in many regression procedures to use a B-spline basis. You can output the spline basis, but that is usually not necessary.  For linear models, use PROC GLMSELECT with the SELECTION=NONE option on the model statement. See the blog posts that KSharp listed, but use the BASIS=BSPLINE option.
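A minimal sketch of that suggestion, assuming the Sashelp.Cars data used elsewhere in this thread, might look like the following. The data set name BSplineBasis and the PROC PRINT step are hypothetical additions for illustration. Because SAS and R place boundary knots and compute percentiles differently by default, the columns produced here should not be expected to match the output of bs() in R exactly.

```sas
/* Cubic B-spline basis with internal knots at the 25th and 75th percentiles */
proc glmselect data=sashelp.cars outdesign(addinputvars fullmodel)=BSplineBasis;
   effect spl = spline(weight / basis=bspline degree=3
                       knotmethod=percentilelist(25 75));
   model mpg_city = spl / selection=none;
quit;

/* Optional: inspect the first few rows of the generated basis columns */
proc print data=BSplineBasis(obs=5);
run;
```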

 

In SAS IML, you can use the BSPLINE function to get the basis of B-splines. This is closest to your R code.

data work.cars;
set sashelp.cars(obs=50);
where weight^=.;
keep mpg_city weight;
run;

proc iml;
/* compute the B-spline basis in R */
call exportdatasettoR("work.cars", "cars");
submit/R;
library(splines)
cars
wt_25 <- quantile(cars$Weight, 0.25)
wt_75 <- quantile(cars$Weight, 0.75)
wt_25
wt_75
age_spline <- bs(cars$Weight, knots = c(wt_25, wt_75), degree = 3)
age_spline   # generate degree 3 (cubic) B-splines with 2 internal knots
attr(age_spline,"knots")
attr(age_spline,"Boundary.knots")
endsubmit;

/* compute it in IML by using BSPLINE function */
use cars;
read all var "weight";
close;
q = {3351.5 , 3925.25 };   
deg = 3;
knots = repeat(min(weight), deg)    /* use deg boundary knots */
        // q // 
        repeat(max(weight), deg);
print knots;
wt_spline = bspline(weight, deg, knots); 
/* the R bs() function does not include the first column */
wt_spline = wt_spline[,2:ncol(wt_spline)];

create bspline from wt_spline;
append from wt_spline;
close;
QUIT;

 

 

If you want to perform regression, use PROC GLMSELECT or another regression procedure. 

SeaMoon_168
Quartz | Level 8

I wonder if I can just use the data with the spline variable age (or weight) created by R and then run the model instead of using the effect statement. However, the results of the coefficients and the confidence intervals look different using R and SAS effect statement.

Ksharp
Super User

The first URL (cubic interpolation) I gave you would produce ONE variable.

But if you use the EFFECT statement, you get several variables for the cubic spline effect.

And you will see the following note in the log if you use only TWO percentiles as knots:

5    proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
NOTE: Writing HTML Body file: sashtml.htm
6       effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 75) );
7       model mpg_city = spl / selection=none noint;
8    quit;

NOTE: Natural cubic splines with fewer than 3 knots reduce to linear polynomials.
NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 3 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
      real time           0.71 seconds
      cpu time            0.11 seconds

Ksharp_1-1735973228144.png

Therefore, you need more knots to generate the cubic spline effect:

13   proc glmselect data=cars outdesign(addinputvars fullmodel)=SplineBasis;
14      effect spl = spline(weight / naturalcubic basis=tpf(noint) knotmethod=percentilelist(5 25 50 75 95) );
15      model mpg_city = spl / selection=none noint;
16   quit;

NOTE: There were 428 observations read from the data set WORK.CARS.
NOTE: The data set WORK.SPLINEBASIS has 428 observations and 6 variables.
NOTE: PROCEDURE GLMSELECT used (Total process time):
      real time           0.09 seconds
      cpu time            0.01 seconds

This yields FOUR variables that represent the cubic spline effect:

Ksharp_0-1735973138115.png

 

 

 

Or use "basis=bspline", as Rick suggested:

Ksharp_0-1735973799450.png

Or this cubic spline:

Ksharp_1-1735973894357.png

 

TomHsiung
Pyrite | Level 9

Can we have an equivalent output in PROC PHREG? Thanks.

Rick_SAS
SAS Super FREQ

Yes. PROC PHREG supports the EFFECT statement, so the syntax for PROC PHREG is exactly the same. For example, here is the syntax for a model that uses data in a documentation example at SAS Help Center: Stepwise Regression:

proc phreg data=Myeloma;
   effect spl = spline(LogBUN / naturalcubic basis=tpf(noint) knotmethod=percentilelist(25 50 75) );
   model Time*VStatus(0) = spl HGB;
run;
TomHsiung
Pyrite | Level 9
Hello, Rick

Thank you for your response. We intend to plot the spline (HR vs. a continuous variable). I am not sure about the strategy, but Stata can output the spline data for each observation and plot the graph. How can we do this with code in SAS Viya? Thanks.
Rick_SAS
SAS Super FREQ

Yes. Examples are in some of the links in KSharp's accepted solution. 

For example, if you want to see the spline functions, look at Visualize a regression with splines - The DO Loop

If you just want the predicted values, you can output them directly by using the OUTPUT statement.
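As a rough sketch of the OUTPUT statement approach, assuming the Sashelp.Cars example and the five-knot natural cubic spline from earlier in this thread: the names SplinePred and Pred_MPG are hypothetical, and the PROC SGPLOT step is just one possible way to draw the fitted curve. PROC PHREG has its own OUTPUT statement (for example, the XBETA= keyword for the linear predictor), so the same idea should carry over with different keywords.

```sas
/* Fit the spline model and write predicted values to a data set */
proc glmselect data=sashelp.cars;
   effect spl = spline(weight / naturalcubic basis=tpf(noint)
                       knotmethod=percentilelist(5 25 50 75 95));
   model mpg_city = spl / selection=none;
   output out=SplinePred predicted=Pred_MPG;
quit;

/* Sort by the explanatory variable so the fitted curve is drawn left to right */
proc sort data=SplinePred;
   by weight;
run;

/* Overlay the fitted spline on the observed data */
proc sgplot data=SplinePred;
   scatter x=weight y=mpg_city;
   series  x=weight y=Pred_MPG;
run;
```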

 

If this doesn't answer your questions, you might want to start a new thread and post your code and specify the graph you want to create.

