rahulkunte

Hello!
I am trying to perform survival analysis on a sample with 100,000 observations. 
The sample is 90% censored, so there are around 10,000 events.

 

(I chose such a large sample to ensure an adequate number of events, given the 90% censoring rate in the population.)

 

The survival time is in days from date of birth, the event is death.

Left truncation is accounted for by including the ENTRY= option in the MODEL statement.

 

I used PROC PHREG to fit a semiparametric Cox proportional hazards model and followed this procedure:

 

1. Fit a model with two variables: education level (4 levels) and sex (2 levels).

 

 

proc phreg data=model_data;
	class sex(ref='M') edu_n(ref='3');
	model surv_t_dob*event(0) = edu_n sex / entry=surv_t_till_s;
	output ressch = _all_;
run;

2. The PH assumption is violated for both variables, as verified by inspecting log cumulative hazard plots, Schoenfeld residuals, and the significance of time-dependent interaction terms.

 

3. To remedy the PH violation, I stratified on sex and included three time interactions (education level i * survival time), one for each level of education except the reference level.

proc phreg data=model_data;
	class sex(ref='M') edu_n(ref='3');
	strata sex;
	model surv_t_dob*event(0) =
		edu_n edu_nt1 edu_nt2 edu_nt4 / entry=surv_t_till_s;
	edu_nt1 = (edu_n=1)*surv_t_dob;
	edu_nt2 = (edu_n=2)*surv_t_dob;
	edu_nt4 = (edu_n=4)*surv_t_dob;
	output ressch = _all_;
run;
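
For reference, the model that step 3 describes can be written out as below. This is only a sketch in my own notation, not PHREG output: $h_{0s}(t)$ for the sex-specific baseline hazard, $\beta_i$ for the coefficient of education level $i$ (the edu_n terms), and $\gamma_i$ for the coefficient of the matching edu_nt variable are symbols I am introducing here. $t$ is age in days, and level 3 is the reference.

```latex
\begin{aligned}
h(t \mid \text{sex}=s,\ \text{edu\_n}=i) &= h_{0s}(t)\,\exp(\beta_i + \gamma_i t),
  \qquad i \in \{1, 2, 4\},\quad \beta_3 = \gamma_3 = 0 \\
\mathrm{HR}_{i \text{ vs } 3}(t) &= \exp(\beta_i + \gamma_i t)
\end{aligned}
```

If this form is right, $\exp(\beta_i)$ on its own is the hazard ratio extrapolated to $t = 0$, so the hazard ratio reported for edu_n would have to be read together with the interaction coefficient at a chosen age $t$.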

Now, I want to know whether I can correctly interpret the hazard ratios of this extended Cox model. My first thought was to look at the model fit statistics and the Schoenfeld residuals.

The fit statistics tell me that the model performs better than a null model: 

Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L     188952.79             188658.3
AIC          188952.8              188670.3
SBC          188952.8              188714.6
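
As a rough check on the fit statistics above, assuming the usual definition AIC = -2 log L + 2k with k = 6 estimated parameters (three education effects plus three time interactions), the reported values are consistent with each other. The drop in -2 log L also gives the likelihood ratio statistic against the null model:

```latex
\begin{aligned}
\mathrm{AIC} &= -2\log L + 2k = 188658.3 + 2(6) = 188670.3 \\
\chi^2_{\mathrm{LR}} &= 188952.79 - 188658.3 = 294.49 \quad (6 \text{ degrees of freedom})
\end{aligned}
```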

 


However, PROC PHREG does not create an output data set when time-dependent covariates are defined with programming statements.

I tried creating the time-dependent variables separately in a DATA step, but according to this discussion (Link), that method is incorrect.

Another option was to use the counting-process style of input, but according to this note (Link), the survival estimates are wrong when counting-process input is used with time-dependent covariates, and there is no circumvention. Hence, I assume that the residuals will also be incorrect.

 
My questions are:

Q1: How do I validate this extended Cox model in SAS? Under what conditions can I appropriately interpret the hazard ratios?

Q2: Is it possible to examine the residuals of such a model? Why does SAS not create an output data set when time-dependent covariates are included?
 
