Hi Folks:
I'm trying a Cox proportional hazards model (PHM) on an infectious disease sample (N=2,765) with only 48 events (deaths). My covariates are age group (10-year bins), gender, and comorbidity (yes/no), with some missing values.
My questions are:
1. Is Cox PHM a good choice here? I tried log-rank tests, but I want hazard estimates for all covariates, adjusted for one another in the same model.
2. Should I delete the observations for age groups with no events (deaths)? The model output suggests that observations with a missing age group were not used in the parameter estimation anyway.
Time-to-event models are probably fine. I think that you have too many age group categories. In my opinion, you can use the results of this analysis to identify better age groupings. That is, which age groups can you combine together? You could also leave age as a continuous variable in the CPH regression model. I would NOT delete the observations for age groups with no events. I would reduce the number of age groups by combining similar groups.
Also, I personally would not define 10-year age groups. I would let the data analysis determine the categories. For example, identify the quartiles (25th, 50th, and 75th percentiles), then create 4 age groups based on those quartiles. Use KM analysis to see if the curves are different between the 4 age groups. Then combine similar age groups.
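The quartile approach described above could be sketched roughly as follows. This is only an illustration under assumed names: a dataset `mydata` with a continuous `age`, a follow-up time `futime`, and `death` coded 1=event, 0=censored. None of these names come from the thread.

```sas
/* Hypothetical names: mydata, age (continuous), futime, death (1=event, 0=censored) */

/* Split age into 4 groups at the quartiles; ageq takes the values 0-3 */
proc rank data=mydata out=ranked groups=4;
   var age;
   ranks ageq;
run;

/* Kaplan-Meier curves and a log-rank test across the 4 groups */
proc lifetest data=ranked plots=survival;
   time futime*death(0);
   strata ageq;
run;
```

Groups whose curves overlap are candidates for combining. This only applies if a continuous age is available; with age already binned, the same PROC LIFETEST step can be run on the existing bins instead.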
Thanks a lot, Cminard. I really like the idea of grouping ages based on their distribution. However, any idea how I could do that, given that my age variable is already categorical in 10-year bins? What I did below was collapse the 10-year age groups into 'agecat' based on a cross-tabulation of the original 10-year age group variable against death status (1,0), trying to make sure each group of the resulting 'agecat' variable contains events. Any comments on how I regrouped the 10-year age groups into 4 groups, in light of what I now see in the model output? I don't understand why I get a huge hazard estimate for the agecat=1 group. Thanks for brainstorming with me here. I really appreciate it.
if age1=. then agecat=99;
else if age1 in (0,1,2,3,4) then agecat=1;
else if age1 in (5,6) then agecat=2;
else if age1 in (7,8) then agecat=3;
else if age1 in (9,10) then agecat=4;
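One possible way to check this regrouping is to count the deaths in each new category and then fit the model with an explicitly chosen reference level. The sketch below assumes a dataset `mydata` with follow-up time `futime`, `death` coded 1=event, 0=censored, and covariates `gender` and `comorb`; these names are hypothetical. It also assumes agecat=4 has enough events to serve as the reference.

```sas
/* Hypothetical names: mydata, futime, death (1=event, 0=censored), gender, comorb */

/* Count deaths in each regrouped age category */
proc freq data=mydata;
   tables agecat*death / nocol nopercent;
run;

/* Cox model with an explicit reference level for each CLASS variable */
proc phreg data=mydata;
   where agecat ne 99;   /* leave out records with missing age (assumption) */
   class agecat (ref='4') gender (ref=first) comorb (ref=first) / param=ref;
   model futime*death(0) = agecat gender comorb / risklimits;
run;
```

A very large hazard ratio often reflects a comparison involving a category with few or no events, so the PROC FREQ table is worth reading before the model output. If a category has zero events, the FIRTH option on the MODEL statement is one thing to consider.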
@SteveDenham, are you interested, and do you have time to look at my last follow-up question? I'm trying to regroup the categorical 10-year age variable into a new variable, which I named 'agecat', taking into account how the events (deaths) are distributed across the groups.
@Cruise The categorization you came up with certainly seems workable, as you get results. Are these results interpretable, i.e. do they make sense clinically/biologically? And you should probably make sure that the proportional hazards assumption is at least approximate.
SteveDenham
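A rough sketch of how the proportional hazards check might look in SAS, again with hypothetical names (`mydata`, `futime`, `death`, `gender`, `comorb`) and assuming records with agecat=99 are excluded:

```sas
/* Hypothetical names: mydata, futime, death (1=event, 0=censored), gender, comorb */
ods graphics on;

/* Log(-log(survival)) curves by age category: roughly parallel curves support PH */
proc lifetest data=mydata plots=(lls) notable;
   where agecat ne 99;
   time futime*death(0);
   strata agecat;
run;

/* Supremum test for proportional hazards, based on cumulative martingale residuals */
proc phreg data=mydata;
   where agecat ne 99;
   class agecat (ref='4') gender comorb / param=ref;
   model futime*death(0) = agecat gender comorb;
   assess ph / resample seed=12345;
run;
```

With only 48 events, both the plots and the test will be noisy, so they are better read as a check for gross violations than as a precise diagnostic.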
Thank you for chiming in. I really appreciate it. I agree that the assumption needs to be assessed first. I found this PharmaSUG paper on testing the PH assumption and will follow its suggestions. https://www.pharmasug.org/proceedings/china2018/SP/Pharmasug-China-2018-SP75.pdf