BookmarkSubscribeRSS Feed
BartG
Calcite | Level 5

Hello - I am trying to figure out what the functional form of a fitted spline effect is. I have it working for Truncated Power Function splines and for Natural Cubic splines in GLMSELECT. But in PHREG it works only for Truncated Power Function splines and I fail to understand why Natural Cubic splines don't work. 

 

Below I include selected code to illustrate what I try to do. A full program is attached, and includes code for GLMSELECT and PHREG with TPF.

 

First I simulate time to event data, with a covariate X that has a non-linear relation with the hazard:

 

 

data sim2;
  call streaminit(9823759);
  do n = 1 to 10000;
    x = rand('Uniform');
    h = (0.6 + 0.5 * cos(2 * Constant('Pi') * x))/12;
    ObsMonths = rand('Exponential') / h;
    if ObsMonths > 12 then do;
      Event = 0;
      ObsMonths = 12;
    end;
    else Event = 1;
    output;
  end;
run;

 

 

Next, I fit a Cox model with a Natural Cubic regression spline for X:

 

 

*** This model uses the NATURALCUBIC basis with 5 knots. From the output datasets KNOTS, SPLINES 
    and ESTIMATES, the model formula is derived and checked against the predicted values in output 
    dataset MODELPREDICT;
proc phreg data=sim2;
  effect spl = spline(x /naturalcubic details knotmethod=percentilelist(5 27.5 50 72.5 95));
  model ObsMonths * Event(0) = spl;
  output out=ModelPredict xbeta=xbeta;
  ods output SplineKnots=knots TPFsplineDetails=splines ParameterEstimates=Estimates;
run;

 

 

The output that I get is:

 

Output.gif

 

My understanding is that XBETA in the output dataset is equal to the sum-product of the parameter estimates shown above times a natural-cubic basis function evaluated in the X value for the observation. The formula is shown below. NC is a function that is defined in the attached program.

 

 

0*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 1, x) 
+ -1.025583892*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 2, x) 
+ -35.97951988*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 3, x) 
+ 112.92039008*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 4, x) 
+ -113.9204329*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 5, x)

 

 

As already stated above, this approach works for Truncated Power Function splines and for Natural Cubic splines in GLMSELECT. It also works for Truncated Power Function splines in PHREG. But it does not work for Natural Cubic splines in PHREG and don't see why not 😞

 

Any suggestions?

 

Do check out the complete program that is attached.

 

Thanks,

Bart

9 REPLIES 9
sbxkoenk
SAS Super FREQ

Hello Bart,

 

I didn't have a detailed look into your question 😟 

, but I remember an issue I had when "mimicking" the Cubic Splines model within PROC ICPHREG.

I wanted to calculate the Baseline Cumulative Hazard function "manually" using the "basis functions" formula that was in the doc.

It was not clear in the doc (at the time) that the knot-values returned by the procedure should be log-scaled before entering them in the formula. After this log transform ( log(SAS_knot) ) the survival calculated by the procedure was exactly equal to my "manually calculated" survival.

The doc-people said they would make it more clear in the doc (I think that has been done meanwhile, did not check though ...).

 

Maybe you are confronted with something similar?

 

Cheers,

Koen

BartG
Calcite | Level 5
Thanks Koen - Your interest was in the baseline hazard function, where a log-transform makes sense. My interest is in the XBETA where a log-transform of a covariate is less obvious. Anyway, I did try and it doesn't solve my problem.
Rick_SAS
SAS Super FREQ

It looks like you use BASIS=TPF in most of the program, but you omitted that option in the last PHREG step. Without the option, you will be using the default B-spline basis.

BartG
Calcite | Level 5

Thanks for your response, Rick, but that should not be the problem. The Shared concepts documentation on the EFFECT statement states that NATURALCUBIC always uses the TPF basis and that the BASIS= option is not applicable. Indeed, adding BASIS=TPF does not change the output.

MichaelL_SAS
SAS Employee

I think the issue is in your NC function definition. The INPUT function call used to extract the knot values from the list is using a 3.0 format, i.e. 

 

      knot = input(scan(knots, j, ' '), 3.0);
      kmax = input(scan(knots, -1, ' '), 3.0);
      kmax1 = input(scan(knots, -2, ' '), 3.0); 

For your GLMSELECT example where the range of the X values is larger, that format looks to work okay, but for your PHREG example where the covariates are all between 0 and 1, the 3.0 format is probably giving you knot values that are not precise enough, which throws off the evaluation of the spline basis functions, and everything breaks down from there.

 

To check your NC function evaluations, you can output those values and compare them to the basis function evaluations in an OUTDESIGN= data set created like in this blog post of Rick's.

 

 

BartG
Calcite | Level 5

That's a great catch - I noticed it myself when I was replying to the first response above. Attached is an update to the  program, the only change is what is shown below. It does however not fix the problem - still unable to reconstruct the NATURALCUBIC splines ... 

 

      knot = input(scan(knots, j, ' '), best20.);
      kmax = input(scan(knots, -1, ' '), best20.);
      kmax1 = input(scan(knots, -2, ' '), best20.);

 

Rick_SAS
SAS Super FREQ

Could you share with us what you are trying to accomplish? What is the purpose of this exercise?

BartG
Calcite | Level 5

I want to plot the fitted splines from PHREG (with SGPLOT because of quality and flexibility). I manage to do it for a single spline parameter by exporting XBETA and subtracting the "normal" effects. But when there are multiple spline parameters it seems necessary to understand the parameterization.

Rick_SAS
SAS Super FREQ

Spline effects are merely columns in a design matrix. So for one spline effect, you can visualize the spline effects by generating the design matrix and plotting the relevant columns versus the underlying variable. See
"Visualize a regression with splines."

 

If you have multiple spline effects and the model includes interactions, then it is more complicated. For interactions between a categorical variable and a spline effect, you can visualize the interaction in a natural way. See "Interactions with spline effects in regression models."
If you have interactions between continuous variables and spline effects (or between multiple spline effects), the visualization is more complicated because a pairwise interaction term is a function of two variables.

 

I suggest you use the OUTDESIGN= option on the PROC GLMSELECT to generate the design matrix for the model that includes spline effects and interactions. You can then use the parameter estimates (via PROC IML or PROC SCORE) to obtain the linear predictor (XBETA).

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1053 views
  • 7 likes
  • 4 in conversation