SAS macro to calculate the D-index for survival analyses

Submitted by S Shetterly, C Zeng, K Narwaney, C Clarke and S Xu

Institute for Health Research, Kaiser Permanente Colorado

This macro calculates the D-Index, a measure of discrimination between higher-and lower risk groups proposed by Royston and Sauerbrei¹ for survival analysis models.

“D is an estimate of the log hazard ratio comparing two equal-sized prognostic groups. This is a natural measure of separation between two independent survival distributions under the proportional hazards assumption”.¹

“D measures prognostic separation of survival curves,and is closely related to the standard deviation of the prognostic index. It is computed by ordering the PI across patients, calculating the rankits (expected standard normal order statistics) corresponding to these values, dividing the latter by a factor κ =√8/π ~1.596 and performing Cox regression on the scaled rankits. The resulting regression coefficient is D.”²

The macro allows for different inputs for k (named InputK in macro call) when estimating the rankits (a.k.a. expected value of the normal order statistics). The initial Royston and Sauerbrei D-index article used Bloom's approximation of the rankit where k=3/8 (inputK=0.375). Other approximations are possible (e.g. Gilchrist's book Statitistical Modeling with Quantile Functions (2000) used k=0.5).

Dataset used for macro example run:

Worcester Heart Attack Study (distributed with Hosmer & Lemeshow (2008)) and available at UCLA’s Institute for Digital Research and Education (IDRE) site

https://stats.idre.ucla.edu/sas/seminars/sas-survival/

This data set has 500 subjects and examined an outcome of survival time after a heart attack with follow-up time beginning at the time of hospital admission (Survival time variable: LENFOL, censoring variable FSTAT (1=death, 0=lost to followup). Test models here used the same independent variables examined in IDRE’s Introduction to Survival analysis: Age, Gender (0=male, 1=female), BMI (body mass index) and HR (initial heart rate).

Macro inputs:

Datain : input dataset

Timevar: time or event or censor

Censorvar: censoring vs outcome indicator

ClassVar: list of class variables (if exist)

IndepVarlist: list of all explanatory variables for selected model

inputK: k for rankit estimation (can use 0.375 as default)

**Set libname for location of input data;
** example using Worchester heart attack data;
libname exampdat '\\KPCO_XXXX\D_index_macro';

/***D-INDEX CALCULATION:
          based on article: Royston P and Saierbrei W. A new measure of prognostic separation in survival data, Statist Med 2004 23:723-748.
*** The macro allows for different inputs for k (named InputK in macro call) when estimating the rankits as noted above. Royston paper used k=3/8 (inputk=0.375);
****/

%macro dindx(datain,timevar,censorvar,classvar,indepVarlist,inputK);
*run Cox model;
proc phreg data=&datain outest=betas(drop=_ties_ _type_ _name_);
      class &classvar;
       model &timevar*&censorvar(0)= &indepvarlist / rl;
     output out=outsave survival=survest xbeta=bx;
   run;  

*Rank estimated linear predictor;
proc rank data=outsave out=bx_rank;
var bx;
ranks bx_rank;
run;

***rank will create averages for ties (R code rank function default does the same);

proc means data=bx_rank;
var bx_rank;
run;

proc sql noprint;
  select max(bx_rank)
  into: rankN
  from bx_rank;
quit;

**Get estimated rankit;
data Bx_nrmINV (keep= normSinv_rnks Z &timevar &censorvar);
set bx_rank;
    calc=(bx_rank-&inputK)/(&rankN+1-2*&inputK);
    normSinv_rnks= probit(calc);
    PI_constant=constant("pi");
     kappa=sqrt(8/PI_constant);
    z=NormSinv_rnks/kappa;
run;

*Use estimated rankit as independent variable to create D-index estimates;
proc phreg data=bx_nrmINv outest=Z_coef (keep=Z rename=(z=D_index));
       model &timevar*&censorvar(0)= Z /covb;
    ods output covB=cov_save (rename=(D_index=CovB_dindex));
run;

data cov_saveSE;
set cov_save;
  D_index_SE=sqrt(covb_dindex);
run;

proc sql;
create table D_index as
    select a.D_index,
           b.D_index_SE,
           D_index-(1.96*D_index_se) as D_index_LCL,
           D_index+(1.96*D_index_se) as D_index_UCL 
from z_coef A , Cov_saveSE B;

proc print data=d_index;
run;

%mend;

*Two example macro calls that follow models from IDRE runs;
(these examples have no class variables so blank in that section of macro call);
%dindx(exampdat.whas500,lenfol,fstat,,age gender,0.375);

D-index estimates in output:

D_index	D_index_SE	D_index_LCL	D_index_UCL
1.33510	0.11219	1.11520	1.55499

%dindx(exampdat.whas500,lenfol,fstat,,age|gender BMI|BMI hr,0.375);

D-index estimates in output:

D_index	D_index_SE	D_index_LCL	D_index_UCL
1.49224	0.11367	1.26944	1.71504

References

Royston P and Sauerbrei W. A new measure of prognostic separation in survival data, Statist Med 2004 23:723-748.
Royston P and Altman DG. External validation of a Cox prognostic model: principles and methods, BMC Medical Research Methodology 2013 13:33.

SAS macro to calculate the D-index for survival analyses

Ready to join fellow brilliant minds for the SAS Hackathon?

Free course: Data Literacy Essentials

Get Started