Solved: Re: How to Plot Spline Curve Using a Reference Value in PROC PHREG?

bhr-q · Posted 07-20-2025 06:43 PM

Hello All,

I would appreciate it if you could help me create the spline curve using the method below:

The way is: we estimate HR(x; xref) = exp[βf(x) − βf(xref)],
where f () is the function that is used to represent X using restricted cubic spline. I am not sure after fitting the spline model in PROC PHREG, How can I specify and retain a specific value (RV = 20) as the reference so I can calculate HR(x) = exp(xbeta − xbeta_ref) on Y-axis ?

The aim is to have HR(x;x_ref) on the y-axis against the independent variable on the X-axis.

I am not sure should I start with this way? if so, how can I continue this code?

proc phreg data=tmp;
effect spl = spline(RV / basis=tpf(noint) naturalcubic details);
model time_to_hfhosp * event(0) = spl;
output out=phout xbeta=xb;
run;

or should I go with the way below?

proc phreg data=tmp;
effect spl = spline(RV / basis=tpf(noint) naturalcubic details);
model time_to_hfhosp * event(0) = spl;
store out=SplineModel;
run;
data new;
do RV = 20 to 100 by 1;   output;  end;
run;
proc plm restore=SplineModel;
score data=new out=preds predicted;
run;
data relhr;
set preds;
retain eta_ref;
if RV_Pacing = 20 then eta_ref = Predicted;
HR_rel = exp(Predicted - eta_ref);
run;
proc sgplot data=relhr ; 
series x=RV y=HR_rel / lineattrs=(thickness=2);
refline 1 / axis=y lineattrs=(pattern=shortdash);
run;

Note:20 is min and 100 is max value of the independent covariate named RV, I wanted to consider min value (20) as reference value.

Thank you so much

StatDave · Posted 07-21-2025 01:01 PM

The spline is applied to the original variable, so an illustration of the spline is a plot of the spline values against the original variable. This is done by the PLOTS option in the GAMPL procedure which can be used to fit generalized linear models with splines. They are called component plots. For this survival model, you can do the same by saving the spline values using the XBETA= option in the OUTPUT statement and then plotting them against the original variable.

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
proc sort data=out; by protein; run;
proc sgplot data=out noautolegend;
pbspline y=splprot x=protein;
yaxis label="Spline(protein)";
run;

For a plot of the hazard ratios using the splined original variable, you can produce a smooth curve in a similar way, but again, the spline applies to the original variable as above and then this is just used as a predictor in the model.

data hr; 
do units=1 to 5; set hr; output; end;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;

View solution in original post

Ksharp · Posted 07-20-2025 09:47 PM

if you want to get HR between RV=20 and RV=80 ,you can use UNIT= option ,
which is HR(RV+60)/HR(RV) when RV=20:

hazardratio 'Effect of 60-unit change in RV across RV' RV / at(RV = (20)) units=60;

bhr-q · Posted 07-21-2025 08:34 AM

Thank you for your answer. Just how to create the spline curve using this way? to calculate HR(x; xreference) using exp(xbeta − xbeta_reference) across the range?

TomHsiung · Posted 07-21-2025 09:10 AM

He wants the spline, which usually is a non-linear graph with x being a continuous independent variable (e.g., weight, blood pressure, creatinine level, etc.) and the y being the hazard ratio and its 95%CI bands.

Here is from Gemini:

Restricted cubic splines (RCS) don't actually "split" the coefficient of a single variable into several sub-coefficients directly. Instead, they **transform a single continuous variable into a set of new, related variables (called basis functions), and it's these new variables that each get their own regression coefficient.** The original variable itself is no longer directly in the model in its raw form, or if it is, it's typically just the first basis function.

The math behind this lies in the construction of these basis functions. The idea is to allow for a flexible, nonlinear relationship with the outcome while ensuring the curve behaves nicely (is "restricted" to be linear) in the tails of the data.

Let's assume you have a continuous variable, $X$, and you want to model its effect nonlinearly using RCS.

... (See the PDF file for completed answer from Gemini)

StatDave · Posted 07-21-2025 11:19 AM

It sounds like what you want is a plot of the spline fit with the original variable on the horizontal axis, the vertical axis is the hazard ratio, and the points on the plot are the hazard ratios for changing the original variable from a reference value (like 20 as you mention) various amounts. That can be done as has already been shown using the HAZARDRATIO statement with the UNITS= option. You can then save the resulting table of hazard ratios and then use PROC SGPLOT to plot it. For example, using the myeloma data in an example in the PHREG documentation:

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   run;
proc sgplot noautolegend;
   highlow high=waldupper low=waldlower x=description / lowcap=serif highcap=serif;
   scatter y=hazardratio x=description;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;

bhr-q · Posted 07-21-2025 12:28 PM

Thank you for your answer. I did try this way. it seems a bar-style plot, not a smooth spline curve.

with your sample data "Myeloma" it shows

I would appreciate if you have any thought to create a spline curve?

StatDave · Posted 07-21-2025 01:01 PM

The spline is applied to the original variable, so an illustration of the spline is a plot of the spline values against the original variable. This is done by the PLOTS option in the GAMPL procedure which can be used to fit generalized linear models with splines. They are called component plots. For this survival model, you can do the same by saving the spline values using the XBETA= option in the OUTPUT statement and then plotting them against the original variable.

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
proc sort data=out; by protein; run;
proc sgplot data=out noautolegend;
pbspline y=splprot x=protein;
yaxis label="Spline(protein)";
run;

For a plot of the hazard ratios using the splined original variable, you can produce a smooth curve in a similar way, but again, the spline applies to the original variable as above and then this is just used as a predictor in the model.

data hr; 
do units=1 to 5; set hr; output; end;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;

bhr-q · Posted 07-24-2025 09:31 AM

Thank you @StatDave for your answer,

I just noticed something wrong. When I take a proc print of hr, I noticed in the output table, the HR for protein Unit=10 At protein=10 is 0.246, but since 10 was set as the reference value in the hazard ratio statement (at(protein=10)), I expected the HR should be 1.

units	Description	HazardRatio
1	protein Unit=1 At protein=10	1.063
2	protein Unit=2 At protein=10	1.051
3	protein Unit=3 At protein=10	0.855
4	protein Unit=4 At protein=10	0.61
5	protein Unit=5 At protein=10	0.496
6	protein Unit=6 At protein=10	0.431
7	protein Unit=7 At protein=10	0.374
8	protein Unit=8 At protein=10	0.325
9	protein Unit=9 At protein=10	0.283
10	protein Unit=10 At protein=10	0.246
11	protein Unit=11 At protein=10	0.213
12	protein Unit=12 At protein=10	0.185
13	protein Unit=13 At protein=10	0.161
14	protein Unit=14 At protein=10	0.14

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 27;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
data hr; 
do units=1 to 27; set hr; output; end;
run;
proc print data=hr;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' ; 
   yaxis label='Hazard Ratio' ;
   refline 1 / axis=y;
   run;

I was wondering if you could have any thoughts on this?

Thank you,

StatDave · Posted 07-24-2025 10:43 AM

The hazard ratios computed in each case is the change in hazard at protein=10 compared to the hazard at various-sized increases above 10 which is indicated by the UNITS. So, the value labeled "protein Unit=10 At protein=10" is the change in hazards when increasing protein from 10 to 20.

bhr-q · Posted 07-24-2025 11:55 AM

Thank you for your answer. However, I wanted to clarify my goal in my initial comment. It was to plot the HR at each value of the continuous variable (protein) relative to a fixed reference value; for example, HR(x; x₀) = exp[βf(x) − βf(x₀)], where f() is the spline transformation of the variable.

I think this explanation refers to getting the HR at each observed value of the continuous variable compared to a fixed reference (like protein = 10), rather than computing the change in hazard for specific unit increases starting from the reference value, as described in your explanation.

StatDave · Posted 07-24-2025 02:15 PM

If you want the hazard ratios comparing the predicted hazard from each observation with the predicted hazard when each observation is at a fixed reference value, then you can do that by saving the predicted log hazards using the XBETA= option in the OUTPUT statement for the original data and for data with all values set to the reference. Then compute the hazard ratios from the difference and plot them. For example

data ref; set Myeloma;
  protein=10; time=.;
  run;
data both; set Myeloma ref(in=inref); 
  ref=inref; 
  run;
proc phreg data=both;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   output out=out xbeta=xb;
   run;
data xb; set out; where ref=0; lhz=xb; keep lhz protein; run;
data xbref; set out; where ref=1; lhzref=xb; keep lhzref; run;
data hr; merge xb xbref; hr=exp(lhz-lhzref); run;
proc sort data=hr; by protein; run;
proc sgplot data=hr noautolegend;
  pbspline y=hr x=protein;
  refline 1 / axis=y;
  run;

TomHsiung

So far, I have to get a dataset of predicted values from Stata, then make the spline diagram by proc sgplot. This combined procedure makes sure the spline is continuous.

On the other hand, I learned the proc iml a little and by the matrix algebra, I can get the variance (actually the covariance-variance matrix) of the outcome variable. After extraction the variance from the diagonal line from the matrix, standard error for each observation is known. However, this approach requires Stata two, as I don't now how to transfer a continuous variable to a few spline variables in SAS. These new variables transferred are those taken to construct the regression model.

StatDave

If you want smooth confidence lines as well as the fitted line on the hazard ratios, you can slightly modify the code in my last post to compute the confidence limits and plot them as shown below. Note that your model is fitting a spline curve to the log hazards, not to the hazard ratios, so a smoothed curve on the hazard ratios doesn't come from the model. Hence the use of the PBSPLINE statement.

data ref; set Myeloma;
  protein=10; time=.;
  run;
data both; set Myeloma ref(in=inref); 
  ref=inref; 
  run;
proc phreg data=both;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   output out=out xbeta=xb stdxbeta=sexb;
   run;
data xb; set out; where ref=0; 
lhz=xb; keep lhz protein lhz sexb; 
run;
data xbref; set out; where ref=1; lhzref=xb; keep lhzref; run;
data hr; merge xb xbref; hr=exp(lhz-lhzref); 
lower=exp(lhz-lhzref-1.96*sexb); upper=exp(lhz-lhzref+1.96*sexb);
run;
proc sort data=hr; by protein; run;
proc sgplot data=hr noautolegend;
  pbspline y=upper x=protein /nomarkers;
  pbspline y=lower x=protein /nomarkers;
  pbspline y=hr x=protein;
  refline 1 / axis=y;
  yaxis label='HR';
  run;

TomHsiung

Thanks for the additional information and I will try your method later. Have a great day!

TomHsiung

Hello, pal

Thanks for your code about the spline in SAS. However, I found the result derived from the code in SAS is quite different from the result I got by Stata procedures, given the exactly same dataset. I have not yet identified the causes. In spite of this dispute, I would like to share my SAS code, which handles the parameters got by Stata spline regression and then do the prediction through matrix calculation.

The code use the covariance-variance matrix to compute the variance of the outcome variable. Additionally, the mean of the outcome variable could derive from the matrix calculation by regression coefficient matrix.

I hope this update will facilitate the handle of the data to make a restricted cubic spline. Have a great day!

Tom

/*Matrix calculation*/
proc iml;
use mx;
    read all var {bnp} into S;
close mx;
use mx;
    /* 1. Read all three variables into a single matrix */
    read all var {bnps1 bnps2 bnps3 interaction} into X;
close mx;
use cov;
    read all var {bnps1 bnps2 bnps3 interaction} into V;
close cov;
use coefficients;
    read all var {bnps1 bnps2 bnps3 interaction} into M;
close coefficients;
*print X;
*print V;
Pred = X * V * X`;
PV = vecdiag(Pred);
*print PV;
SE = sqrt(PV);
print SE;
Mean = X * M`;
print X;
print Mean;
Upper = Mean + SE*1.96;
Lower = Mean - SE*1.96;
HR = exp(Mean);
lo = exp(Lower);
hi = exp(Upper);
print HR lo hi;
/* Combine vectors into one matrix */
outmatrix = S || HR || lo || hi;
/* Extract the columns from a matrix */
/* a. The CREATE statement defines the new dataset and its column names. */
create HR_Results from outmatrix [colname={'bnp' 'hr' 'lower' 'upper'}];
/* b. The APPEND statement writes the data from the matrix into the new dataset. */
append from outmatrix;
quit;

The 2025 SAS Hackathon has begun!