Geoghegan
Obsidian | Level 7

Hello,

I'm trying to check the proportional hazards assumption for two final Cox proportional hazards models. For the first, I tried this code:

data stats.adjusted6;
set stats.adjusted3;
il6time=log_il6*time;
agetime=age_enrollment*time;
bptime=sysbp*time;
bmitime=bmi*time;
sextime=sex*time;
run;
proc phreg data=stats.adjusted3;
class sex sextime;
model time*CVD(0)= log_il6 age_enrollment sysbp bmi sex il6time agetime bptime bmitime sextime/ rl ties=exact;
run;

 

At first I tried to create the time variables inside the procedure step, but it gave me an error saying that the sextime variable did not exist. So I tried creating them in a preceding data step instead, but got log messages like these (repeated many times):

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      389:9
NOTE: Invalid numeric data, sex='Female' , at line 389 column 9.
Age_enrollment=89 Age_last_contact=98 Alive=No sex=Female Education=3 mmse=28 DSST=18 bmi=23.7
sysbp=125 log_alb=1.335001067 grip_strength=16 log_il6=-0.37106429 gait_speed=0.538720539
log_new_hscrp=-0.139262198 fev1_7=1.64 CVD_inc=0 CVD_inc_age=. IL6=low CVD=0 time=9
il6time=-3.33957861 agetime=801 bptime=1125 bmitime=213.3 sextime=. _ERROR_=1 _N_=1

 

The repeating then ended with a warning:

WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
Age_enrollment=63 Age_last_contact=75 Alive=Yes sex=Female Education=16 mmse=30 DSST=42 bmi=18.9
sysbp=113 log_alb=1.410986974 grip_strength=27 log_il6=-1.108662855 gait_speed=1.32231405
log_new_hscrp=-1.347073686 fev1_7=2.21 CVD_inc=1 CVD_inc_age=73 IL6=low CVD=1 time=10
il6time=-11.08662855 agetime=630 bptime=1130 bmitime=189 sextime=. _ERROR_=1 _N_=20
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      105 at 385:16    105 at 386:23    105 at 387:13    105 at 388:12    3373 at 389:12

 

I'm assuming the problem is that my sex variable is stored as Male/Female text rather than as numbers, but I'm not certain. Should I convert it to numbers? If so, I'm not sure how to do that, since it is a character variable throughout my dataset.

Also, I have another model that includes an interaction term, which I assume could also present problems. Would I just create a time variable the same way for that or is there a different way to check the proportional hazards assumption when there is an interaction term?

 

Thank you very much!

Accepted Solutions
JacobSimonsen
Barite | Level 11

Hello,

you are making a few mistakes.

1) It does not make sense to multiply the character variable sex by time. Replace it with an expression that translates the character values into a number. Try this:
  timeXsex=time*(sex='Male');
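
On the question of converting sex to numbers: a comparison such as (sex='Male') evaluates to 1 or 0 in SAS, so a numeric indicator can be built without recoding the original variable. A minimal sketch, assuming the values are stored exactly as 'Male' and 'Female' (the log only shows 'Female'); the variable name male is made up for illustration, and the output name stats.adjusted6 is reused from the question:

```sas
data stats.adjusted6;
  set stats.adjusted3;
  /* Assumed coding: 'Male' / 'Female'. The comparison is case-sensitive. */
  /* Leave the indicator missing when sex itself is blank. */
  if not missing(sex) then male = (sex = 'Male');
run;

/* Cross-check the new indicator against the original character values */
proc freq data=stats.adjusted6;
  tables sex*male / missing;
run;
```

An indicator like this is safe to create in a data step because it does not involve time. Only the products with time need to be built inside phreg.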

 

2) This is the most important point! It does not make sense to create the interaction terms between the covariates and the time variable in the dataset. These interaction terms should be created inside phreg. The reason is that the interaction should be based on the event time of each risk set, and not on each observation's own event or censoring time. It will be something like this (simplified):


proc phreg data=simulation;
  class sex;
  timeXsex=time*(sex='Male');
  model time=exposure sex timeXsex;
run;
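
Applied to the model in the question, the same pattern might look like the sketch below. It assumes the original dataset stats.adjusted3 (without the data-step time variables), that sex is coded 'Male'/'Female', and it keeps the options from the question. The label ph_test and the joint TEST statement are additions for illustration, not something the original post used.

```sas
proc phreg data=stats.adjusted3;
  class sex;
  model time*CVD(0) = log_il6 age_enrollment sysbp bmi sex
                      il6time agetime bptime bmitime sextime / rl ties=exact;
  /* Programming statements: re-evaluated at each event time */
  /* for every subject still in the risk set                 */
  il6time = log_il6*time;
  agetime = age_enrollment*time;
  bptime  = sysbp*time;
  bmitime = bmi*time;
  sextime = (sex = 'Male')*time;  /* assumes 'Male'/'Female' coding */
  /* Joint Wald test that all covariate-by-time terms are zero */
  ph_test: test il6time, agetime, bptime, bmitime, sextime;
run;
```

Note that sextime is not listed in the CLASS statement: it is already a numeric 0-or-time value, and it does not exist in the input dataset. A clearly non-zero coefficient for one of the time terms suggests that the corresponding covariate violates the proportional hazards assumption. With ties=exact and time-dependent terms the run can be slow on larger datasets.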

 

3) Unless you have very low statistical power, I don't see any reason to have all covariates in the model statement and none in the strata statement. Putting covariates in the strata statement gives a more robust model, and you then only need to check the proportional hazards assumption for the covariate of interest, which should be in the model statement.
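
As a rough sketch of point 3, assuming log_il6 is the covariate of interest (a guess based on the question) and that only sex is moved to the strata statement. Continuous covariates such as age_enrollment, sysbp, and bmi would first have to be grouped into categories before they could be used as strata, which is not shown here.

```sas
proc phreg data=stats.adjusted3;
  /* Each level of sex gets its own baseline hazard, */
  /* so no proportional hazards check is needed for sex */
  strata sex;
  model time*CVD(0) = log_il6 il6time / rl ties=exact;
  /* Time interaction only for the covariate of interest */
  il6time = log_il6*time;
run;
```

The trade-off is that a stratified covariate no longer gets a hazard ratio estimate of its own.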
