12-22-2020
sshetter
Activity Feed for sshetter
- Posted Re: How should I organize my raw data to run a Survival analysis on it on Statistical Procedures. 07-13-2020 10:59 AM
- Posted SAS macro to calculate the D-index for survival analyses on SAS Communities Library. 07-01-2020 09:59 AM
- Posted Re: Plotting question on SAS PROC TRAJ on Statistical Procedures. 06-08-2020 08:36 PM
- Posted Re: CALCULATION OF ROYSTON'S D INDEX FOR COX MODEL on Statistical Procedures. 06-08-2020 08:16 PM
- Posted Re: 3 reasons why you should write an article for the SAS Communities Library on Community Memo. 06-08-2020 07:54 PM
07-13-2020
10:59 AM
2 Likes
I would recommend that you look into setting up your data in the 'counting process' style so that your time-varying variables can be readily incorporated. This means setting up multiple records per person. The brief notes I copied below are from the PROC SURVEYPHREG documentation (which I haven't used myself, since I tend to use PROC PHREG), but they are a good, brief description to start you off on this road. It is more 'data work' up front, but I much prefer it to the programming statements that PHREG also allows for time-varying covariates, since I can do more data checks and make sure the setup is appropriate.

**COPIED TEXT

Counting Process Style of Input

In the counting process formulation, data for each subject are identified by a triple of counting, at-risk, and covariate processes. $N(t)$ indicates the sum of weights for all events that the subject experiences over the time interval $(0,t]$, $Y(t)$ indicates whether the subject is at risk at time $t$ (1 if at risk and 0 otherwise), and $Z(t)$ is a vector of explanatory variables for the subject at time $t$. The sample path of $N$ is a step function with jumps at the event times, and $N(0)=0$. Unless $Z(t)$ changes continuously with time, the data for each subject can be represented by multiple observations, each of which identifies a semiclosed time interval $(t_1,t_2]$, the values of the explanatory variables over that interval, and the event status at $t_2$. The subject remains at risk during the interval $(t_1,t_2]$, and an event might occur at $t_2$. Values of the explanatory variables for the subject remain unchanged in the interval. This style of data input was originated by Therneau (1994).

For example, suppose a patient (ID=1) with an analysis weight of 10 has a tumor recurrence at weeks 3, 10, and 15 and is followed up until week 23.
Consider three fixed explanatory variables Trt (treatment), Number (initial tumor number), and Size (initial tumor size), one weight variable Weight (analysis weight), one patient identification variable ID, and one time-dependent covariate Z that represents a hormone level. The value of Z might change during the follow-up period. The data for this patient are represented by the following four observations:
Here (T1,T2] contains the at-risk intervals. The variable Status indicates whether a recurrence has occurred at T2: a value of 1 indicates a tumor recurrence, and a value of 0 indicates non-recurrence. Assume the patients are selected independently. Because there are multiple observation rows for every patient, you should use the CLUSTER statement to identify each individual patient. The CLUSTER statement computes the variability between the patients. The following statements fit a multiplicative hazards model with baseline covariates Trt, Number, and Size, and a time-varying covariate Z. For more information, see the section The Multiplicative Hazards Model (in the PROC SURVEYPHREG documentation).

proc surveyphreg;
weight Weight;
cluster ID;
model (T1,T2) * Status(0) = Trt Number Size Z;
run;
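As a sketch of the 'data work' this involves, the DATA step below expands the example patient's history (ID=1, weight 10, recurrences at weeks 3, 10 and 15, followed until week 23) into (T1,T2] records with a Status flag. It assumes follow-up starts at week 0 and leaves out Trt, Number, Size and Z, whose values are not given here; in practice they would be merged on by ID and interval. The data set name cp_example is hypothetical.

```sas
/* Sketch: expand one patient's event history into counting-process rows.
   Assumes follow-up starts at week 0. Covariates (Trt, Number, Size, Z)
   are not included and would be merged on by ID and interval. */
data cp_example (keep=ID Weight T1 T2 Status);
   ID = 1;
   Weight = 10;                          /* analysis weight */
   array evt{3} _temporary_ (3 10 15);   /* tumor recurrence weeks */
   last_fu = 23;                         /* end of follow-up */
   T1 = 0;
   do i = 1 to dim(evt);
      T2 = evt{i};
      Status = 1;                        /* recurrence at T2 */
      output;
      T1 = T2;                           /* next interval starts at this event */
   end;
   if last_fu > T1 then do;
      T2 = last_fu;
      Status = 0;                        /* no recurrence at end of follow-up */
      output;
   end;
run;

proc print data=cp_example noobs;
   var ID Weight T1 T2 Status;
run;
```

This gives four rows, (0,3], (3,10], (10,15] with Status=1 and (15,23] with Status=0, and each row's Status refers to the event at T2, matching the (T1,T2] convention above.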
07-01-2020
09:59 AM
Submitted by S Shetterly, C Zeng, K Narwaney, C Clarke and S Xu
Institute for Health Research, Kaiser Permanente Colorado
This macro calculates the D-index, a measure of discrimination between higher- and lower-risk groups proposed by Royston and Sauerbrei [1] for survival analysis models.
“D is an estimate of the log hazard ratio comparing two equal-sized prognostic groups. This is a natural measure of separation between two independent survival distributions under the proportional hazards assumption.” [1]
“D measures prognostic separation of survival curves, and is closely related to the standard deviation of the prognostic index. It is computed by ordering the PI across patients, calculating the rankits (expected standard normal order statistics) corresponding to these values, dividing the latter by a factor κ = √(8/π) ≈ 1.596 and performing Cox regression on the scaled rankits. The resulting regression coefficient is D.” [2]
The macro allows for different inputs for k (named InputK in the macro call) when estimating the rankits (a.k.a. expected values of the normal order statistics). The original Royston and Sauerbrei D-index article used Blom's approximation of the rankit, where k=3/8 (InputK=0.375). Other approximations are possible (e.g., Gilchrist's book Statistical Modelling with Quantile Functions (2000) used k=0.5).
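For reference, here is the calculation that the macro's rankit DATA step and second PHREG call carry out, written as formulas. This is a sketch in my own notation: $r_i$ is the tie-averaged rank of subject $i$'s linear predictor, $n$ is the number of ranked subjects, $k$ is InputK, and $\Phi^{-1}$ is the standard normal quantile function (PROBIT in SAS).

```latex
\[
  \hat{u}_i = \Phi^{-1}\!\left(\frac{r_i - k}{n + 1 - 2k}\right),
  \qquad
  z_i = \frac{\hat{u}_i}{\kappa},
  \qquad
  \kappa = \sqrt{8/\pi} \approx 1.596
\]
\[
  h(t \mid z_i) = h_0(t)\,\exp(D\, z_i)
\]
```

With $k = 3/8$ the fraction is Blom's approximation; with $k = 1/2$ it reduces to $(r_i - 1/2)/n$. D is the estimated coefficient on $z_i$ in a Cox model with $z_i$ as the only covariate.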
Dataset used for macro example run:
Worcester Heart Attack Study data (distributed with Hosmer & Lemeshow (2008)), available at UCLA’s Institute for Digital Research and Education (IDRE) site:
https://stats.idre.ucla.edu/sas/seminars/sas-survival/
This data set has 500 subjects and examines survival time after a heart attack, with follow-up time beginning at hospital admission (survival time variable: LENFOL; censoring variable: FSTAT, 1=death, 0=lost to follow-up). The test models here used the same independent variables examined in IDRE’s Introduction to Survival Analysis seminar: Age, Gender (0=male, 1=female), BMI (body mass index) and HR (initial heart rate).
Macro inputs:
Datain: input data set
Timevar: time to event or censoring
Censorvar: event/censoring indicator (0 = censored, since the model uses Censorvar(0))
ClassVar: list of class variables (leave blank if none)
IndepVarlist: list of all explanatory variables for the selected model
inputK: k for rankit estimation (0.375 can be used as the default)
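To see what inputK changes, here is a small stand-alone sketch (not part of the macro) that applies the same rankit formula as the macro to ranks 1 to 5 for k=0.375 and k=0.5. The data set and variable names are hypothetical, and n=5 is chosen only for illustration.

```sas
/* Illustration only: rankits for n = 5 ranks under two choices of k,
   using the macro's formula probit((r - k)/(n + 1 - 2k)) */
data rankit_demo;
   n = 5;
   kappa = sqrt(8/constant('pi'));     /* scaling factor, about 1.596 */
   do r = 1 to n;
      rankit_k375 = probit((r - 0.375)/(n + 1 - 2*0.375));  /* Blom, k = 3/8 */
      rankit_k500 = probit((r - 0.5)/(n + 1 - 2*0.5));      /* k = 1/2 */
      z_k375 = rankit_k375/kappa;      /* scaled rankit, as used in the Cox step */
      output;
   end;
run;

proc print data=rankit_demo noobs;
   var r rankit_k375 rankit_k500 z_k375;
run;
```

For either k the middle rank maps to 0 and the values are symmetric around 0. The choice matters mainly in the tails: for the lowest rank, k=0.5 gives probit(0.1), about -1.28, while k=0.375 gives probit(0.119), about -1.18.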
**Set libname for location of input data;
** example using Worcester heart attack data;
libname exampdat '\\KPCO_XXXX\D_index_macro';
/***D-INDEX CALCULATION:
based on article: Royston P and Sauerbrei W. A new measure of prognostic separation in survival data, Statist Med 2004 23:723-748.
*** The macro allows for different inputs for k (named InputK in macro call) when estimating the rankits as noted above. Royston paper used k=3/8 (inputk=0.375);
****/
%macro dindx(datain,timevar,censorvar,classvar,indepVarlist,inputK);
*run Cox model;
proc phreg data=&datain outest=betas(drop=_ties_ _type_ _name_);
class &classvar;
model &timevar*&censorvar(0)= &indepvarlist / rl;
output out=outsave survival=survest xbeta=bx;
run;
*Rank estimated linear predictor;
proc rank data=outsave out=bx_rank;
var bx;
ranks bx_rank;
run;
***rank will create averages for ties (R code rank function default does the same);
proc means data=bx_rank;
var bx_rank;
run;
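*Number of ranked (non-missing) linear predictors, used as n in the rankit formula;
*(count is used instead of max(bx_rank), which falls below n when the largest values are tied);
proc sql noprint;
select count(bx_rank)
into: rankN
from bx_rank;
quit;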
**Get estimated rankit;
data Bx_nrmINV (keep= normSinv_rnks Z &timevar &censorvar);
set bx_rank;
calc=(bx_rank-&inputK)/(&rankN+1-2*&inputK);
normSinv_rnks= probit(calc);
PI_constant=constant("pi");
kappa=sqrt(8/PI_constant);
z=NormSinv_rnks/kappa;
run;
*Use estimated rankit as independent variable to create D-index estimates;
proc phreg data=bx_nrmINv outest=Z_coef (keep=Z rename=(z=D_index));
model &timevar*&censorvar(0)= Z /covb;
ods output covB=cov_save (rename=(D_index=CovB_dindex));
run;
data cov_saveSE;
set cov_save;
D_index_SE=sqrt(covb_dindex);
run;
proc sql;
create table D_index as
select a.D_index,
b.D_index_SE,
D_index-(1.96*D_index_se) as D_index_LCL,
D_index+(1.96*D_index_se) as D_index_UCL
from z_coef A , Cov_saveSE B;
quit;
proc print data=d_index;
run;
%mend;
*Two example macro calls that follow models from IDRE runs;
*(these examples have no class variables, so that argument is left blank in the macro call);
%dindx(exampdat.whas500,lenfol,fstat,,age gender,0.375);
D-index estimates in output:
D_index   D_index_SE   D_index_LCL   D_index_UCL
1.33510   0.11219      1.11520       1.55499
%dindx(exampdat.whas500,lenfol,fstat,,age|gender BMI|BMI hr,0.375);
D-index estimates in output:
D_index   D_index_SE   D_index_LCL   D_index_UCL
1.49224   0.11367      1.26944       1.71504
References
1. Royston P, Sauerbrei W. A new measure of prognostic separation in survival data. Statistics in Medicine 2004; 23:723-748.
2. Royston P, Altman DG. External validation of a Cox prognostic model: principles and methods. BMC Medical Research Methodology 2013; 13:33.
06-08-2020
08:36 PM
2 Likes
The plots in %trajplot are helpful, quick displays, but when I need to customize graphs, I use the output in the saved plot data sets and then use PROC SGPLOT to customize what is needed. For example, if you only want the 'predicted' lines, the example below shows how to set up each group as its own series. You can similarly include the observed averages on the same graph and change the line style.

proc sgplot data=plot_MME;
title 'Predicted MME';
series x=month Y=predmme1;
series x=month Y=predmme2;
series x=month Y=predmme3;
series x=month Y=predmme4;
* Yaxis values=(0 to 375) label='Predicted MME';
label predmme1='Group 1 Pred'
      predmme2='Group 2 Pred'
      predmme3='Group 3 Pred'
      predmme4='Group 4 Pred';
run;
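A small self-contained sketch of the overlay mentioned above: predicted trajectories as solid lines and observed averages as dashed lines with markers. The data set plot_demo and the variables avgmme1/avgmme2 are hypothetical, and the values are made up purely so the step runs; with real results you would point PROC SGPLOT at the saved plot data set and its own variable names.

```sas
/* Made-up values for two groups, only so that the plotting step runs */
data plot_demo;
   do month = 1 to 12;
      predmme1 = 20 + 2*month;                  /* hypothetical predicted values */
      predmme2 = 60 - 1.5*month;
      avgmme1  = predmme1 + mod(month, 3) - 1;  /* hypothetical observed averages */
      avgmme2  = predmme2 + 1 - mod(month, 3);
      output;
   end;
run;

proc sgplot data=plot_demo;
   title 'Predicted and observed MME';
   /* predicted trajectories: solid lines */
   series x=month y=predmme1 / lineattrs=(pattern=solid) legendlabel='Group 1 Pred';
   series x=month y=predmme2 / lineattrs=(pattern=solid) legendlabel='Group 2 Pred';
   /* observed averages: dashed lines with markers */
   series x=month y=avgmme1 / lineattrs=(pattern=dash) markers legendlabel='Group 1 Obs';
   series x=month y=avgmme2 / lineattrs=(pattern=dash) markers legendlabel='Group 2 Obs';
   yaxis label='MME';
run;
```

LEGENDLABEL= sets the legend text per series, so a LABEL statement is not needed here; either approach works.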
06-08-2020
08:16 PM
I know I am 'late' to this conversation, but I was similarly looking to calculate the D-index and had the advantage of R code created by another statistician to use as I was working through the steps. Royston and Sauerbrei's original article has a bit more detail than the description in Austin et al., and it makes a difference in the final calculations. I'll post the macro I created as soon as I am active enough on the community to do so... (or, if there is too much of a delay, I'll have a colleague do it for me 🙂).
06-08-2020
07:54 PM
1 Like
I looked previously for a D-index macro or code and saw that some code had been shared but wasn't quite correct. I have code and details I can share, and I will look to do so... (or, to avoid delay, I may just have one of my more prolific colleagues share it). First, I'll go find that first post and let folks know that it is a bit more complicated...