BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bhr-q
Pyrite | Level 9

Hello All,

I would appreciate it if you could help me create the spline curve using the method below:  

 

The way is:   we estimate HR(x; xref) = exp[βf(x) − βf(xref)],
where f () is the function that is used to represent X using restricted cubic spline.  I am not sure after fitting the spline model in PROC PHREGHow can I specify and retain a specific value (RV = 20) as the reference so I can calculate HR(x) = exp(xbeta − xbeta_ref) on Y-axis ?

The aim is to have  HR(x;x_ref) on the y-axis against the independent variable on the X-axis.

 

I am not sure should I start with this way? if so, how can I continue this code?

proc phreg data=tmp;
effect spl = spline(RV / basis=tpf(noint) naturalcubic details);
model time_to_hfhosp * event(0) = spl;
output out=phout xbeta=xb;
run;

or should I go with the  way below?

proc phreg data=tmp;
effect spl = spline(RV / basis=tpf(noint) naturalcubic details);
model time_to_hfhosp * event(0) = spl;
store out=SplineModel;
run;
data new;
do RV = 20 to 100 by 1;   output;  end;
run;
proc plm restore=SplineModel;
score data=new out=preds predicted;
run;
data relhr;
set preds;
retain eta_ref;
if RV_Pacing = 20 then eta_ref = Predicted;
HR_rel = exp(Predicted - eta_ref);
run;
proc sgplot data=relhr ; 
series x=RV y=HR_rel / lineattrs=(thickness=2);
refline 1 / axis=y lineattrs=(pattern=shortdash);
run;

 

bhrq_0-1753050743788.png

 

 

Note:20 is min and 100 is max value of the independent covariate named RV, I wanted to consider min value (20) as reference value.

 

Thank you so much

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

The spline is applied to the original variable, so an illustration of the spline is a plot of the spline values against the original variable. This is done by the PLOTS option in the GAMPL procedure which can be used to fit generalized linear models with splines. They are called component plots. For this survival model, you can do the same by saving the spline values using the XBETA= option in the OUTPUT statement and then plotting them against the original variable.

 

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
proc sort data=out; by protein; run;
proc sgplot data=out noautolegend;
pbspline y=splprot x=protein;
yaxis label="Spline(protein)";
run;

For a plot of the hazard ratios using the splined original variable, you can produce a smooth curve in a similar way, but again, the spline applies to the original variable as above and then this is just used as a predictor in the model.

data hr; 
do units=1 to 5; set hr; output; end;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;

 

View solution in original post

10 REPLIES 10
Ksharp
Super User

if you want to get HR between RV=20 and RV=80 ,you can use UNIT= option ,
which is HR(RV+60)/HR(RV) when RV=20:

hazardratio 'Effect of 60-unit change in RV across RV' RV / at(RV = (20)) units=60;
bhr-q
Pyrite | Level 9

Thank you for your answer. Just how to create the spline curve using this way? to calculate HR(x; xreference) using exp(xbeta − xbeta_reference) across the range?

TomHsiung
Lapis Lazuli | Level 10

He wants the spline, which usually is a non-linear graph with x being a continuous independent variable (e.g., weight, blood pressure, creatinine level, etc.) and the y being the hazard ratio and its 95%CI bands.

Here is from Gemini:

Restricted cubic splines (RCS) don't actually "split" the coefficient of a single variable into several sub-coefficients directly. Instead, they **transform a single continuous variable into a set of new, related variables (called basis functions), and it's these new variables that each get their own regression coefficient.** The original variable itself is no longer directly in the model in its raw form, or if it is, it's typically just the first basis function.

The math behind this lies in the construction of these basis functions. The idea is to allow for a flexible, nonlinear relationship with the outcome while ensuring the curve behaves nicely (is "restricted" to be linear) in the tails of the data.

Let's assume you have a continuous variable, $X$, and you want to model its effect nonlinearly using RCS.

 

... (See the PDF file for completed answer from Gemini)

StatDave
SAS Super FREQ

It sounds like what you want is a plot of the spline fit with the original variable on the horizontal axis, the vertical axis is the hazard ratio, and the points on the plot are the hazard ratios for changing the original variable from a reference value (like 20 as you mention) various amounts. That can be done as has already been shown using the HAZARDRATIO statement with the UNITS= option. You can then save the resulting table of hazard ratios and then use PROC SGPLOT to plot it. For example, using the myeloma data in an example in the PHREG documentation:

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   run;
proc sgplot noautolegend;
   highlow high=waldupper low=waldlower x=description / lowcap=serif highcap=serif;
   scatter y=hazardratio x=description;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;
bhr-q
Pyrite | Level 9

Thank you for your answer. I did try this way. it seems a bar-style plot, not a smooth spline curve.

with your sample data "Myeloma" it shows 

 

bhrq_0-1753115185218.png

 

 I would appreciate if you have any thought to create a spline curve?

StatDave
SAS Super FREQ

The spline is applied to the original variable, so an illustration of the spline is a plot of the spline values against the original variable. This is done by the PLOTS option in the GAMPL procedure which can be used to fit generalized linear models with splines. They are called component plots. For this survival model, you can do the same by saving the spline values using the XBETA= option in the OUTPUT statement and then plotting them against the original variable.

 

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 5;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
proc sort data=out; by protein; run;
proc sgplot data=out noautolegend;
pbspline y=splprot x=protein;
yaxis label="Spline(protein)";
run;

For a plot of the hazard ratios using the splined original variable, you can produce a smooth curve in a similar way, but again, the spline applies to the original variable as above and then this is just used as a predictor in the model.

data hr; 
do units=1 to 5; set hr; output; end;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' grid; 
   yaxis label='Hazard Ratio' grid;
   refline 1 / axis=y;
   run;

 

bhr-q
Pyrite | Level 9

Thank you @StatDave for your answer,  

 

I just noticed something wrong. When I take a proc print of hr, I noticed in the output table, the HR for protein Unit=10 At protein=10 is 0.246, but since 10 was set as the reference value in the hazard ratio statement (at(protein=10)), I expected the HR should be 1.

units Description HazardRatio
1 protein Unit=1 At protein=10 1.063
2 protein Unit=2 At protein=10 1.051
3 protein Unit=3 At protein=10 0.855
4 protein Unit=4 At protein=10 0.61
5 protein Unit=5 At protein=10 0.496
6 protein Unit=6 At protein=10 0.431
7 protein Unit=7 At protein=10 0.374
8 protein Unit=8 At protein=10 0.325
9 protein Unit=9 At protein=10 0.283
10 protein Unit=10 At protein=10 0.246
11 protein Unit=11 At protein=10 0.213
12 protein Unit=12 At protein=10 0.185
13 protein Unit=13 At protein=10 0.161
14 protein Unit=14 At protein=10 0.14

 

proc phreg data=Myeloma;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   hazardratio protein / at (protein=10) units=1 to 27;
   ods output hazardratios=hr;
   output out=out xbeta=splprot;
   run;
data hr; 
do units=1 to 27; set hr; output; end;
run;
proc print data=hr;
run;
proc sgplot data=hr noautolegend;
   band upper=waldupper lower=waldlower x=units / transparency=.5;
   pbspline y=hazardratio x=units;
   xaxis label='Protein' ; 
   yaxis label='Hazard Ratio' ;
   refline 1 / axis=y;
   run;

I was wondering if you could have any thoughts on this?

 Thank you,

 

StatDave
SAS Super FREQ

The hazard ratios computed in each case is the change in hazard at protein=10 compared to the hazard at various-sized increases above 10 which is indicated by the UNITS. So, the value labeled "protein Unit=10 At protein=10" is the change in hazards when increasing protein from 10 to 20.

bhr-q
Pyrite | Level 9

Thank you for your answer. However, I wanted to clarify my goal in my initial comment. It was to plot the HR at each value of the continuous variable (protein) relative to a fixed reference value; for example, HR(x; x₀) = exp[βf(x) − βf(x₀)], where f() is the spline transformation of the variable.

 I think this explanation refers to getting the HR at each observed value of the continuous variable compared to a fixed reference (like protein = 10), rather than computing the change in hazard for specific unit increases starting from the reference value, as described in your explanation.

 

StatDave
SAS Super FREQ

If you want the hazard ratios comparing the predicted hazard from each observation with the predicted hazard when each observation is at a fixed reference value, then you can do that by saving the predicted log hazards using the XBETA= option in the OUTPUT statement for the original data and for data with all values set to the reference. Then compute the hazard ratios from the difference and plot them. For example

data ref; set Myeloma;
  protein=10; time=.;
  run;
data both; set Myeloma ref(in=inref); 
  ref=inref; 
  run;
proc phreg data=both;
   effect spl=spline(protein/naturalcubic);
   model Time*VStatus(0)=spl ;
   output out=out xbeta=xb;
   run;
data xb; set out; where ref=0; lhz=xb; keep lhz protein; run;
data xbref; set out; where ref=1; lhzref=xb; keep lhzref; run;
data hr; merge xb xbref; hr=exp(lhz-lhzref); run;
proc sort data=hr; by protein; run;
proc sgplot data=hr noautolegend;
  pbspline y=hr x=protein;
  refline 1 / axis=y;
  run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1201 views
  • 4 likes
  • 4 in conversation