Season
Barite | Level 11

I am building a cause-specific hazard regression model. What caught my attention is that two pieces of syntax that should specify the same model produce slightly different results.

I used the bone marrow transplantation dataset of Example 87.15 of SAS Help for analysis, which is provided here:

 

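/* The DiseaseGroup. format must exist before the DATA step uses it;
   this definition follows the SAS Help example. */
proc format;
   value DiseaseGroup 1='ALL'
                      2='AML-Low Risk'
                      3='AML-High Risk';
run;

data Bmt;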
   input Disease T Status @@;
   label T='Disease-Free Survival in Days';
   format Disease DiseaseGroup.;
   datalines;
1   2081   0   1   1602   0   1   1496   0   1   1462   0   1   1433   0
1   1377   0   1   1330   0   1    996   0   1    226   0   1   1199   0
1   1111   0   1    530   0   1   1182   0   1   1167   0   1    418   2
1    383   1   1    276   2   1    104   1   1    609   1   1    172   2
1    487   2   1    662   1   1    194   2   1    230   1   1    526   2
1    122   2   1    129   1   1     74   1   1    122   1   1     86   2
1    466   2   1    192   1   1    109   1   1     55   1   1      1   2
1    107   2   1    110   1   1    332   2   2   2569   0   2   2506   0
2   2409   0   2   2218   0   2   1857   0   2   1829   0   2   1562   0
2   1470   0   2   1363   0   2   1030   0   2    860   0   2   1258   0
2   2246   0   2   1870   0   2   1799   0   2   1709   0   2   1674   0
2   1568   0   2   1527   0   2   1324   0   2    957   0   2    932   0
2    847   0   2    848   0   2   1850   0   2   1843   0   2   1535   0
2   1447   0   2   1384   0   2    414   2   2   2204   2   2   1063   2
2    481   2   2    105   2   2    641   2   2    390   2   2    288   2
2    421   1   2     79   2   2    748   1   2    486   1   2     48   2
2    272   1   2   1074   2   2    381   1   2     10   2   2     53   2
2     80   2   2     35   2   2    248   1   2    704   2   2    211   1
2    219   1   2    606   1   3   2640   0   3   2430   0   3   2252   0
3   2140   0   3   2133   0   3   1238   0   3   1631   0   3   2024   0
3   1345   0   3   1136   0   3    845   0   3    422   1   3    162   2
3     84   1   3    100   1   3      2   2   3     47   1   3    242   1
3    456   1   3    268   1   3    318   2   3     32   1   3    467   1
3     47   1   3    390   1   3    183   2   3    105   2   3    115   1
3    164   2   3     93   1   3    120   1   3     80   2   3    677   2
3     64   1   3    168   2   3     74   2   3     16   2   3    157   1
3    625   1   3     48   1   3    273   1   3     63   2   3     76   1
3    113   1   3    363   2
;
run;

I first used the method introduced in Survival Analysis Using SAS: A Practical Guide, Second Edition (ISBN 9781599946405). Since I wish to focus on the relapse-specific hazard (status=1), I treated both censored observations (status=0) and deaths before relapse (status=2) as censored. This led me to the following code:

proc phreg data=bmt;
class disease/param=ref ref=first;
model t*status(0,2)=disease;
run;

This generates the following result:

[Screenshot: PROC PHREG parameter estimates from the first model (Season_0-1734017532440.png)]

On the other hand, SAS Help provides other syntax:

proc phreg data=Bmt;
   class Disease (ref=first);
   model T*Status(0)=Disease / eventcode(cox)=1;
run;

That generates another set of estimates for event=1:

[Screenshot: PROC PHREG parameter estimates from the EVENTCODE(COX)=1 model (Season_2-1734017694088.png)]

Comparing these results with the ones above shows that the regression coefficients are close but not identical. The difference in the estimates for disease=3 changes the corresponding Pr>ChiSq by 0.0001.

My questions are: why do the results differ? And could whatever causes this difference affect the accuracy of results in other scenarios (e.g., more predictors, more observations, more competing events)?

 

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @Season,

 

It appears to me that the discrepancies vanish (as they should) if you tighten the convergence criteria, e.g., add the option GCONV=1E-10 to the MODEL statements (default is 1E-8). So it's just the iterative algorithms used by PROC PHREG approaching the same limit in different ways and thus reaching convergence at slightly different values of the parameter estimates.
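
A minimal sketch of what that looks like for the two programs in the question, assuming the Bmt data set above has already been created. The value 1E-10 is only an illustrative choice of a tighter criterion, not a recommended setting:

```sas
/* Approach 1: list status 0 and 2 as censoring values, with tighter GCONV */
proc phreg data=Bmt;
   class Disease / param=ref ref=first;
   model T*Status(0,2)=Disease / gconv=1e-10;
run;

/* Approach 2: EVENTCODE(COX)=1 syntax, with the same tighter GCONV */
proc phreg data=Bmt;
   class Disease (ref=first);
   model T*Status(0)=Disease / eventcode(cox)=1 gconv=1e-10;
run;
```

With the same tighter criterion in both MODEL statements, the two sets of parameter estimates for relapse should agree to the displayed precision.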


Season
Barite | Level 11

Thank you for your reply!


@FreelanceReinh wrote:

Hello @Season,

 

It appears to me that the discrepancies vanish (as they should) if you tighten the convergence criteria, e.g., add the option GCONV=1E-10 to the MODEL statements (default is 1E-8). So it's just the iterative algorithms used by PROC PHREG approaching the same limit in different ways and thus reaching convergence at slightly different values of the parameter estimates.


Yes, you are right. Imposing a more stringent restriction on model convergence criteria like the one you mentioned yields concordant results.
