PHREG natural cubic spline - what is the fitted function?

BartG · Posted 01-24-2023 09:57 AM

Hello - I am trying to figure out what the functional form of a fitted spline effect is. I have it working for Truncated Power Function splines and for Natural Cubic splines in GLMSELECT. But in PHREG it works only for Truncated Power Function splines and I fail to understand why Natural Cubic splines don't work.

Below I include selected code to illustrate what I try to do. A full program is attached, and includes code for GLMSELECT and PHREG with TPF.

First I simulate time to event data, with a covariate X that has a non-linear relation with the hazard:

data sim2;
  call streaminit(9823759);
  do n = 1 to 10000;
    x = rand('Uniform');
    h = (0.6 + 0.5 * cos(2 * Constant('Pi') * x))/12;
    ObsMonths = rand('Exponential') / h;
    if ObsMonths > 12 then do;
      Event = 0;
      ObsMonths = 12;
    end;
    else Event = 1;
    output;
  end;
run;

Next, I fit a Cox model with a Natural Cubic regression spline for X:

*** This model uses the NATURALCUBIC basis with 5 knots. From the output datasets KNOTS, SPLINES 
    and ESTIMATES, the model formula is derived and checked against the predicted values in output 
    dataset MODELPREDICT;
proc phreg data=sim2;
  effect spl = spline(x /naturalcubic details knotmethod=percentilelist(5 27.5 50 72.5 95));
  model ObsMonths * Event(0) = spl;
  output out=ModelPredict xbeta=xbeta;
  ods output SplineKnots=knots TPFsplineDetails=splines ParameterEstimates=Estimates;
run;

The output that I get is:

My understanding is that XBETA in the output dataset is equal to the sum-product of the parameter estimates shown above times a natural-cubic basis function evaluated in the X value for the observation. The formula is shown below. NC is a function that is defined in the attached program.

0*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 1, x) 
+ -1.025583892*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 2, x) 
+ -35.97951988*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 3, x) 
+ 112.92039008*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 4, x) 
+ -113.9204329*nc("0.046248 0.275256 0.501562 0.726796 0.950887", 5, x)

As already stated above, this approach works for Truncated Power Function splines and for Natural Cubic splines in GLMSELECT. It also works for Truncated Power Function splines in PHREG. But it does not work for Natural Cubic splines in PHREG and don't see why not 😞

Any suggestions?

Do check out the complete program that is attached.

Thanks,

Bart

sbxkoenk · Posted 01-24-2023 03:23 PM

Hello Bart,

I didn't have a detailed look into your question 😟

, but I remember an issue I had when "mimicking" the Cubic Splines model within PROC ICPHREG.

I wanted to calculate the Baseline Cumulative Hazard function "manually" using the "basis functions" formula that was in the doc.

It was not clear in the doc (at the time) that the knot-values returned by the procedure should be log-scaled before entering them in the formula. After this log transform ( log(SAS_knot) ) the survival calculated by the procedure was exactly equal to my "manually calculated" survival.

The doc-people said they would make it more clear in the doc (I think that has been done meanwhile, did not check though ...).

Maybe you are confronted with something similar?

Cheers,

Koen

BartG · Posted 01-25-2023 05:37 AM

Thanks Koen - Your interest was in the baseline hazard function, where a log-transform makes sense. My interest is in the XBETA where a log-transform of a covariate is less obvious. Anyway, I did try and it doesn't solve my problem.

Rick_SAS · Posted 01-24-2023 03:37 PM

It looks like you use BASIS=TPF in most of the program, but you omitted that option in the last PHREG step. Without the option, you will be using the default B-spline basis.

BartG · Posted 01-25-2023 04:52 AM

Thanks for your response, Rick, but that should not be the problem. The Shared concepts documentation on the EFFECT statement states that NATURALCUBIC always uses the TPF basis and that the BASIS= option is not applicable. Indeed, adding BASIS=TPF does not change the output.

MichaelL_SAS · Posted 01-26-2023 11:50 AM

I think the issue is in your NC function definition. The INPUT function call used to extract the knot values from the list is using a 3.0 format, i.e.

      knot = input(scan(knots, j, ' '), 3.0);
      kmax = input(scan(knots, -1, ' '), 3.0);
      kmax1 = input(scan(knots, -2, ' '), 3.0);

For your GLMSELECT example where the range of the X values is larger, that format looks to work okay, but for your PHREG example where the covariates are all between 0 and 1, the 3.0 format is probably giving you knot values that are not precise enough, which throws off the evaluation of the spline basis functions, and everything breaks down from there.

To check your NC function evaluations, you can output those values and compare them to the basis function evaluations in an OUTDESIGN= data set created like in this blog post of Rick's.

BartG · Posted 01-27-2023 08:19 AM

That's a great catch - I noticed it myself when I was replying to the first response above. Attached is an update to the program, the only change is what is shown below. It does however not fix the problem - still unable to reconstruct the NATURALCUBIC splines ...

      knot = input(scan(knots, j, ' '), best20.);
      kmax = input(scan(knots, -1, ' '), best20.);
      kmax1 = input(scan(knots, -2, ' '), best20.);

Rick_SAS · Posted 01-27-2023 08:35 AM

Could you share with us what you are trying to accomplish? What is the purpose of this exercise?

BartG · Posted 01-27-2023 09:56 AM

I want to plot the fitted splines from PHREG (with SGPLOT because of quality and flexibility). I manage to do it for a single spline parameter by exporting XBETA and subtracting the "normal" effects. But when there are multiple spline parameters it seems necessary to understand the parameterization.

Rick_SAS · Posted 01-27-2023 10:42 AM

Spline effects are merely columns in a design matrix. So for one spline effect, you can visualize the spline effects by generating the design matrix and plotting the relevant columns versus the underlying variable. See
"Visualize a regression with splines."

If you have multiple spline effects and the model includes interactions, then it is more complicated. For interactions between a categorical variable and a spline effect, you can visualize the interaction in a natural way. See "Interactions with spline effects in regression models."
If you have interactions between continuous variables and spline effects (or between multiple spline effects), the visualization is more complicated because a pairwise interaction term is a function of two variables.

I suggest you use the OUTDESIGN= option on the PROC GLMSELECT to generate the design matrix for the model that includes spline effects and interactions. You can then use the parameter estimates (via PROC IML or PROC SCORE) to obtain the linear predictor (XBETA).

PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?

Re: PHREG natural cubic spline - what is the fitted function?