Thomas45
Fluorite | Level 6

Hi,

When I fit the same Cox regression in SAS (SAS/STAT 15.1) and in R (including an interaction with a 3-knot restricted cubic spline), I get different parameter estimates (specifically for age_spline 2). In my example below I have used the WHAS500 dataset found on this website (https://stats.idre.ucla.edu/sas/seminars/sas-survival/)

To harmonize the models (and the splines) between SAS and R I have used knots according to Harrell's scheme (https://blogs.sas.com/content/iml/2019/02/18/regression-restricted-cubic-splines-sas.html) and set both TIES to EFRON

 

SAS-code

proc phreg data = whas500;
effect age_spline = spline(age/ details naturalcubic basis=tpf(noint) 
knotmethod=percentilelist(10 50 90) );
model lenfol*fstat(0) = gender|age_spline /rl ties=EFRON; 
run;

R-code

library(survival); library(rms)  # coxph() and Surv() come from survival, rcs() from rms
fit <- coxph(Surv(LENFOL, FSTAT) ~ GENDER*rcs(AGE,3), data = whas500)

Output:

parameter_comparison.png

 

I would be very thankful if someone can help me explain the observed differences.

 

Thomas

PaigeMiller
Diamond | Level 26

Most likely, either the algorithms or the options used are not the same.

--
Paige Miller
SteveDenham
Jade | Level 19

All of the values I see except age_spline 2 look the same to within rounding. The interesting thing about age_spline 2 is that the chi-square in SAS matches (again within rounding error) the squared Z value in R, as it does for the other terms. This makes me think that @PaigeMiller has identified the issue: the algorithm for constructing the splines may be different. However, the gender interaction terms match pretty well for age_spline 1 and show the same issue for age_spline 2. I think I have just turned your short question into a long one, without providing much insight.

 

This looks like a job for the Cross Validated site at this point, unless one of the Super FREQs who knows the algorithms behind both can jump in to help.

 

SteveDenham

Thomas45
Fluorite | Level 6

Thanks for your suggestions!

I have now also posted this on Cross Validated.

Thomas45
Fluorite | Level 6

Are there any Super FREQs who know the algorithms behind restricted cubic splines in SAS (knotmethod=percentilelist)?

/T

Rick_SAS
SAS Super FREQ

As Paige says, it is often very difficult to compare across software implementations because of differences in defaults and algorithms. You basically have to be intimately familiar with the implementation in both software products.

 

For example, as I mention in one of my blog posts, the basis functions that are generated by the EFFECT statement are not equal to the basis functions created by Harrell's %RCSPLINE macro, but they are equivalent. The EFFECT statement uses the definition from The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman, 2009, 2nd Ed, pp. 144-146). The %RCSPLINE macro implements scaled versions of the basis functions. Thus the parameter estimates will be different but the predicted values will be the same.
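
To make the scaling point concrete, here is a minimal base R sketch for the 3-knot case. It assumes that the EFFECT statement's nonlinear term has the Elements of Statistical Learning form d_1(x) - d_2(x) and that the R side uses Harrell's form divided by the squared distance between the outer knots; neither has been checked against the actual SAS or rms output in this thread. The helper names (pos3, harrell_term, esl_term) and the knot values are made up for illustration.

```r
# Truncated cubic (x - t)_+^3
pos3 <- function(x, t) pmax(x - t, 0)^3

# Nonlinear term of a 3-knot restricted cubic spline in Harrell's form,
# divided by (k3 - k1)^2 (assumed to be the scaling used on the R side)
harrell_term <- function(x, k) {
  (pos3(x, k[1]) -
     pos3(x, k[2]) * (k[3] - k[1]) / (k[3] - k[2]) +
     pos3(x, k[3]) * (k[2] - k[1]) / (k[3] - k[2])) / (k[3] - k[1])^2
}

# The same term in the Elements of Statistical Learning form d_1(x) - d_2(x)
# (assumed to be what the EFFECT statement generates)
esl_term <- function(x, k) {
  d <- function(j) (pos3(x, k[j]) - pos3(x, k[3])) / (k[3] - k[j])
  d(1) - d(2)
}

knots <- c(40, 70, 90)          # hypothetical knots, not the WHAS500 percentiles
age   <- seq(45, 100, by = 5)   # values above the first knot, where both terms are nonzero

# The two columns differ only by a constant factor, k3 - k1
ratio <- esl_term(age, knots) / harrell_term(age, knots)
print(ratio)
```

Under these assumptions the two columns span the same model, so the coefficient on the nonlinear term changes by the constant factor while its z statistic (and the chi-square) stays the same, which would be consistent with what Steve noticed for age_spline 2.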

 

And you not only need to know the differences in the algorithm that you are investigating, but every algorithm that is USED by that algorithm! Since you are looking at placing the knots at percentiles, and since you explicitly mentioned handling TIES in the data, I will point out that SAS and R use different default definitions of percentiles. For details, see my article on the 9 standard definitions of percentiles.
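
As a small illustration of the percentile point, the base R sketch below computes the 10th, 50th, and 90th percentiles of a toy vector under two definitions. R's quantile() uses type = 7 by default, and type = 2 corresponds to the SAS default definition (PCTLDEF=5). Whether knotmethod=percentilelist and rcs() actually use these particular definitions is an assumption that I have not verified.

```r
x <- 1:10                     # toy data, not the WHAS500 ages
p <- c(0.10, 0.50, 0.90)      # percentiles used for the three knots

# R's default percentile definition
q_r <- quantile(x, probs = p, type = 7, names = FALSE)

# Definition that corresponds to the SAS default (PCTLDEF=5)
q_sas <- quantile(x, probs = p, type = 2, names = FALSE)

print(rbind(r_default = q_r, sas_default = q_sas))
```

For this toy vector the outer knots differ between the two definitions. To rule this out as a source of differences, the knots can be given explicitly in both programs instead of as percentiles.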

 

I don't have time to research this issue, but I wish you luck. I suggest that you investigate whether the PREDICTED values are approximately the same. If so, the two fits represent the same model even if the parameterization is different. You might also want to run the model on a simulated data set that has no duplicates, to eliminate the tied values/percentile issue.
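
A rough sketch of that check in R, using the lung data that ships with the survival package rather than WHAS500: rescaling a covariate by an arbitrary constant stands in for a rescaled spline basis column. The variable age_scaled and the factor 50 are made up for illustration.

```r
library(survival)

d <- lung                      # example data from the survival package, not WHAS500
d$age_scaled <- d$age / 50     # arbitrary rescaling, standing in for a rescaled spline term

fit_a <- coxph(Surv(time, status) ~ age,        data = d, ties = "efron")
fit_b <- coxph(Surv(time, status) ~ age_scaled, data = d, ties = "efron")

# Coefficients differ by the scaling factor ...
coef_ratio <- unname(coef(fit_b) / coef(fit_a))

# ... but the linear predictors agree
lp_diff <- max(abs(predict(fit_a, type = "lp") - predict(fit_b, type = "lp")))

print(c(coef_ratio = coef_ratio, lp_diff = lp_diff))
```

To run the same comparison across SAS and R, the linear predictor can be exported from each program for the same observations and differenced in the same way.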

Thomas45
Fluorite | Level 6

Thank you all for your comments and suggestions!

I have compared the predicted values, and they are approximately the same between R and SAS.

 

Thanks again

/Thomas

Rick_SAS
SAS Super FREQ

Great news. You can indicate that the problem is resolved by marking one of the responses as the solution. 
