BookmarkSubscribeRSS Feed

SAS macro to calculate the D-index for survival analyses

Started ‎07-01-2020 by
Modified ‎07-01-2020 by
Views 1,388

Submitted by S Shetterly, C Zeng, K Narwaney, C Clarke and S Xu

Institute for Health Research, Kaiser Permanente Colorado

 

This macro calculates the D-Index, a measure of discrimination between higher-and lower risk groups proposed by Royston and Sauerbrei1 for survival analysis models.

 

“D is an estimate of the log hazard ratio comparing two equal-sized prognostic groups. This is a natural measure of separation between two independent survival distributions under the proportional hazards assumption”.1

D measures prognostic separation of survival curves,and is closely related to the standard deviation of the prognostic index. It is computed by ordering the PI across patients, calculating the rankits (expected standard normal order statistics) corresponding to these values, dividing the latter by a factor κ =√8~1.596 and performing Cox regression on the scaled rankits. The resulting regression coefficient is D.”2

The macro allows for different inputs for k (named InputK in macro call) when estimating the rankits (a.k.a. expected value of the normal order statistics). The initial Royston and Sauerbrei D-index article used Bloom's approximation of the rankit where k=3/8 (inputK=0.375). Other approximations are possible (e.g. Gilchrist's book Statitistical Modeling with Quantile Functions (2000) used k=0.5).

 

 

Dataset used for macro example run:

          Worcester Heart Attack Study (distributed with Hosmer & Lemeshow (2008)) and available at UCLA’s Institute for Digital Research and Education (IDRE) site

https://stats.idre.ucla.edu/sas/seminars/sas-survival/

This data set has 500 subjects and examined an outcome of survival time after a heart attack with follow-up time beginning at the time of hospital admission (Survival time variable: LENFOL, censoring variable FSTAT (1=death, 0=lost to followup).  Test models here used the same independent variables examined in IDRE’s Introduction to Survival analysis:  Age, Gender (0=male, 1=female), BMI (body mass index) and HR (initial heart rate).

 

Macro inputs:

          Datain : input dataset

          Timevar: time or event or censor

          Censorvar: censoring vs outcome indicator

          ClassVar: list of class variables (if exist)

          IndepVarlist: list of all explanatory variables for selected model

           inputK: k for rankit estimation (can use 0.375 as default)

         

**Set libname for location of input data;
** example using Worchester heart attack data;
libname exampdat '\\KPCO_XXXX\D_index_macro';

/***D-INDEX CALCULATION:
          based on article: Royston P and Saierbrei W. A new measure of prognostic separation in survival data, Statist Med 2004 23:723-748.
*** The macro allows for different inputs for k (named InputK in macro call) when estimating the rankits as noted above. Royston paper used k=3/8 (inputk=0.375);
****/

%macro dindx(datain,timevar,censorvar,classvar,indepVarlist,inputK);
*run Cox model;
proc phreg data=&datain outest=betas(drop=_ties_ _type_ _name_);
      class &classvar;
       model &timevar*&censorvar(0)= &indepvarlist / rl;
     output out=outsave survival=survest xbeta=bx;
   run;  

*Rank estimated linear predictor;
proc rank data=outsave out=bx_rank;
var bx;
ranks bx_rank;
run;

***rank will create averages for ties (R code rank function default does the same);

proc means data=bx_rank;
var bx_rank;
run;

proc sql noprint;
  select max(bx_rank)
  into: rankN
  from bx_rank;
quit;

**Get estimated rankit;
data Bx_nrmINV (keep= normSinv_rnks Z &timevar &censorvar);
set bx_rank;
    calc=(bx_rank-&inputK)/(&rankN+1-2*&inputK);
    normSinv_rnks= probit(calc);
    PI_constant=constant("pi");
     kappa=sqrt(8/PI_constant);
    z=NormSinv_rnks/kappa;
run;

*Use estimated rankit as independent variable to create D-index estimates;
proc phreg data=bx_nrmINv outest=Z_coef (keep=Z rename=(z=D_index));
       model &timevar*&censorvar(0)= Z /covb;
    ods output covB=cov_save (rename=(D_index=CovB_dindex));
run;

data cov_saveSE;
set cov_save;
  D_index_SE=sqrt(covb_dindex);
run;

proc sql;
create table D_index as
    select a.D_index,
           b.D_index_SE,
           D_index-(1.96*D_index_se) as D_index_LCL,
           D_index+(1.96*D_index_se) as D_index_UCL 
from z_coef A , Cov_saveSE B;

proc print data=d_index;
run;

%mend;

*Two example macro calls that follow models from IDRE runs;
(these examples have no class variables so blank in that section of macro call);
%dindx(exampdat.whas500,lenfol,fstat,,age gender,0.375);

 

          D-index estimates in output:

D_index

D_index_SE

D_index_LCL

D_index_UCL

1.33510

0.11219

1.11520

1.55499

 

 

%dindx(exampdat.whas500,lenfol,fstat,,age|gender BMI|BMI hr,0.375);

 

D-index estimates in output:

D_index

D_index_SE

D_index_LCL

D_index_UCL

1.49224

0.11367

1.26944

1.71504

 

 

 

References

 

  • Royston P and Sauerbrei W. A new measure of prognostic separation in survival data, Statist Med 2004 23:723-748.
  • Royston P and Altman DG. External validation of a Cox prognostic model: principles and methods, BMC Medical Research Methodology 2013 13:33.
Version history
Last update:
‎07-01-2020 10:52 AM
Updated by:

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags