Cruise
Ammonite | Level 13

Hi Folks:

 

I'm fitting a Cox proportional hazards model (PHM) to an infectious disease sample (N=2,765) in which the number of events (deaths) is only 48. My covariates are age group (10-year bins), gender, and comorbidity (yes/no), with some missing values.

My questions are:

1. Is a Cox PHM a good choice here? I tried log-rank tests, but I want hazard estimates for all covariates, adjusted for one another in a single model.

2. Should I delete the observations in age groups with no events (deaths)? The model output suggests that observations with a missing age group were not used in the parameter estimation anyway.

 

missing in covariates.png

cminard
Obsidian | Level 7

Time-to-event models are probably fine. I think that you have too many age group categories. In my opinion, you can use the results of this analysis to identify better age groupings. That is, which age groups can you combine together? You could also leave age as a continuous variable in the CPH regression model. I would NOT delete the observations for age groups with no events. I would reduce the number of age groups by combining similar groups.

 

Also, I personally would not define 10-year age groups. I would let the data analysis determine the categories. For example, identify the quartiles (25th, 50th, and 75th percentiles), then create 4 age groups based on those quartiles. Use KM analysis to see if the curves are different between the 4 age groups. Then combine similar age groups.
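
A minimal sketch of the quartile approach described above, assuming a data set named `have` with a continuous `age` variable, a follow-up time `time`, and an event indicator `death` (1 = died, 0 = censored). All of these names are hypothetical; the thread does not show the actual variable names, and the approach needs age on a continuous scale.

```sas
/* Hypothetical names: have, age, time, death (1=event, 0=censored) */
ods graphics on;

/* Split age into 4 groups at the quartiles; age_q takes values 0-3 */
proc rank data=have groups=4 out=ranked;
   var age;
   ranks age_q;
run;

/* Kaplan-Meier curves and log-rank test across the quartile groups */
proc lifetest data=ranked plots=survival;
   time time*death(0);
   strata age_q;
run;
```

Groups whose Kaplan-Meier curves largely overlap would be the candidates to combine before fitting the Cox model.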

Cruise
Ammonite | Level 13

@cminard 

Thanks a lot, cminard. I really like the idea of grouping ages based on their distribution. However, do you have any idea how I could do that, given that my age variable is already categorical in 10-year bins? What I did below was collapse the 10-year age groups into 'agecat', based on a cross-tabulation of the original 10-year age group variable against death status (1,0), while trying to make sure that each level of the resulting 'agecat' variable contains events. Any comments on how I regrouped the 10-year age groups into 4 age groups, in light of what I now see in the model output? I don't understand why I get a huge hazard estimate for the agecat=1 group. Thanks for brainstorming with me here. I really appreciate it.

 

age and death grouping.png

 

/* Collapse the 10-year bins (age1) into 4 groups; 99 flags a missing age */
if age1=. then agecat=99;
else if age1 in (0,1,2,3,4) then agecat=1;
else if age1 in (5,6) then agecat=2;
else if age1 in (7,8) then agecat=3;
else if age1 in (9,10) then agecat=4;
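
For context, a sketch of how a recoded variable like this might enter the model. The data set name `recoded`, the variables `time`, `death`, `gender`, and `comorbidity`, and the choice of reference level are all assumptions, not taken from the thread. By default the CLASS statement in PROC PHREG uses the last ordered level as the reference, which here would be the missing-age code 99; naming the reference explicitly keeps the hazard ratios relative to a real age group. Whether this explains the large estimate for agecat=1 cannot be determined from the thread alone.

```sas
/* Hypothetical names: recoded, time, death (1=event, 0=censored),
   gender, comorbidity. The reference level is an illustrative choice. */
proc phreg data=recoded;
   /* Set the age reference explicitly instead of the default (last level) */
   class agecat (ref='2') gender comorbidity / param=ref;
   /* RISKLIMITS adds confidence limits for the hazard ratios */
   model time*death(0) = agecat gender comorbidity / risklimits;
run;
```

If a level still contains very few events, the FIRTH option on the MODEL statement (a penalized likelihood) is one way to stabilize the estimates.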

 

cph.png

Cruise
Ammonite | Level 13

@SteveDenham Would you be interested, and have time, to look at my last follow-up question? I'm trying to regroup the categorical age variable (10-year bins) into a new group variable, which I named 'agecat', taking into account how the events (deaths) are distributed across the bins.

SteveDenham
Jade | Level 19

@Cruise  The categorization you came up with certainly seems workable, as you get results.  Are these results interpretable, i.e. do they make sense clinically/biologically? And you should probably make sure that the proportional hazards assumption is at least approximate.

 

SteveDenham
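
One way to check the proportional hazards assumption in SAS is the ASSESS statement in PROC PHREG. The sketch below assumes a data set `recoded` with variables `time`, `death`, `agecat`, `gender`, and `comorbidity`; these names, the reference level, and the seed are hypothetical.

```sas
/* Hypothetical names: recoded, time, death (1=event, 0=censored) */
ods graphics on;

proc phreg data=recoded;
   class agecat (ref='2') gender comorbidity / param=ref;
   model time*death(0) = agecat gender comorbidity;
   /* Score-process plots for each covariate, plus a Kolmogorov-type
      supremum test computed from simulated paths */
   assess ph / resample seed=12345;
run;
```

With only 48 events, the supremum test may have little power, so the plots are worth reading alongside the p-values.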

Cruise
Ammonite | Level 13

Thank you for chiming in. I really appreciate it. I agree that the assumption needs to be assessed first. I found this PharmaSUG paper on testing the PH assumption and will follow its suggestions: https://www.pharmasug.org/proceedings/china2018/SP/Pharmasug-China-2018-SP75.pdf
