Solved: Re: Cox proportional HM - missing data

Cruise · Posted 04-06-2020 09:25 AM

Hi Folks:

I'm fitting a Cox proportional hazards model (PHM) to an infectious disease sample (N=2,765) in which the number of events (deaths) is only 48. My covariates are age in 10-year groups, gender, and comorbidity (yes/no), some with missing values.

My questions are:

1. Is a Cox PHM a good choice here? I tried log-rank tests, but I want hazard estimates for all covariates, adjusted for one another in a single model.

2. Should I delete the observations in age groups with no events (deaths)? The model output suggests that observations with a missing age group were not used in the parameter estimation anyway.

cminard · Posted 04-06-2020 11:51 AM

Time-to-event models are probably fine. I think that you have too many age group categories. In my opinion, you can use the results of this analysis to identify better age groupings. That is, which age groups can you combine? You could also leave age as a continuous variable in the Cox PH regression model. I would NOT delete the observations for age groups with no events. I would reduce the number of age groups by combining similar groups.

Also, I personally would not define 10-year age groups. I would let the data analysis determine the categories. For example, identify the quartiles (25th, 50th, and 75th percentiles), then create 4 age groups based on those quartiles. Use KM analysis to see if the curves are different between the 4 age groups. Then combine similar age groups.
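A minimal sketch of the quartile approach described above, assuming a continuous age variable is available. The dataset name cohort and the variables age, time, and death (1=event, 0=censored) are hypothetical names, not taken from the thread.

```sas
/* Assumed names: dataset COHORT, continuous AGE,            */
/* follow-up TIME, and DEATH (1 = event, 0 = censored).      */

/* Split age into four groups at the quartiles.              */
/* GROUPS=4 assigns the values 0, 1, 2, 3 to AGE_Q.          */
proc rank data=cohort groups=4 out=cohort_q;
   var age;
   ranks age_q;
run;

/* Kaplan-Meier curves by quartile group; the STRATA         */
/* statement also requests tests of equality across strata   */
/* (log-rank and Wilcoxon).                                  */
proc lifetest data=cohort_q plots=survival;
   time time*death(0);
   strata age_q;
run;
```

Groups whose curves look similar are candidates for combining. With only 48 events, some quartile groups may still contain very few deaths, so the curves should be read with that in mind.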

Cruise · Posted 04-06-2020 12:07 PM

@cminard

Thanks a lot, cminard. I really like the idea of grouping ages based on their distribution. However, do you have any idea how I could do that given that my age variable is already categorical in 10-year bins? Below, I collapsed the 10-year age groups into 'agecat' based on a cross-tabulation of the original 10-year age group variable against death status (1,0), trying to make sure that each group of the resulting 'agecat' variable contains events. Any comments on how I regrouped the 10-year age groups into 4 groups, in light of what I now see in the model output? I don't understand why I get a huge hazard estimate for the agecat=1 group. Thanks for brainstorming with me here. I really appreciate it.

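if age1 = . then agecat = 99;                  /* missing age gets its own code */
else if age1 in (0,1,2,3,4) then agecat = 1;   /* five youngest 10-year bins */
else if age1 in (5,6) then agecat = 2;
else if age1 in (7,8) then agecat = 3;
else if age1 in (9,10) then agecat = 4;        /* two oldest 10-year bins */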
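For context, here is a sketch of how the regrouped variable might be checked and then used in the model. The dataset name cohort and the variables time, death, gender, and comorbidity are assumed names, and the reference levels are arbitrary choices for illustration, not the settings actually used.

```sas
/* Assumed names: dataset COHORT already contains AGECAT,    */
/* TIME, DEATH (1 = event, 0 = censored), GENDER, and        */
/* COMORBIDITY.                                              */

/* Count events per regrouped age category, including the    */
/* missing-age code 99.                                      */
proc freq data=cohort;
   tables agecat*death / norow nocol nopercent missing;
run;

/* Cox model with AGECAT as a classification variable.       */
/* AGECAT = 99 is a non-missing value to SAS, so those rows  */
/* are excluded explicitly here. Rows with a missing value   */
/* in any model variable are dropped by PROC PHREG.          */
proc phreg data=cohort;
   where agecat ne 99;
   class agecat (ref='4') gender (ref=first) comorbidity (ref=first) / param=ref;
   model time*death(0) = agecat gender comorbidity / risklimits;
run;
```

Hazard ratios are reported relative to the chosen reference level, so an estimate can look very large or very small when either the level or its reference contains only a handful of events. The cross-tabulation shows whether that is the case.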

Cruise · Posted 04-06-2020 02:28 PM

@SteveDenham, would you be interested and have time to look at my last follow-up question? I'm trying to regroup the 10-year categorical age variable into a new variable, which I named 'agecat', based on the distribution of events (deaths) across the bins.

SteveDenham · Posted 04-07-2020 07:51 AM

@Cruise The categorization you came up with certainly seems workable, as you get results. Are these results interpretable, i.e. do they make sense clinically/biologically? And you should probably make sure that the proportional hazards assumption is at least approximately met.

SteveDenham
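One way the proportional hazards check could be sketched is with the ASSESS statement in PROC PHREG. The dataset and variable names (cohort, time, death, agecat, gender, comorbidity), the reference levels, and the seed are assumptions for illustration only.

```sas
/* Assumed names: dataset COHORT with TIME, DEATH            */
/* (1 = event, 0 = censored), AGECAT, GENDER, COMORBIDITY.   */
ods graphics on;

proc phreg data=cohort;
   where agecat ne 99;   /* exclude the missing-age code */
   class agecat (ref='4') gender (ref=first) comorbidity (ref=first) / param=ref;
   model time*death(0) = agecat gender comorbidity;
   /* PH requests checks of the proportional hazards         */
   /* assumption based on cumulative score residuals;        */
   /* RESAMPLE adds a simulation-based supremum test.        */
   assess ph / resample seed=20200407;
run;
```

A small p-value for a covariate suggests its effect may change over follow-up time. With so few events the test has limited power, so the accompanying plots are worth inspecting as well.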

Cruise · Posted 04-07-2020 07:55 AM

Thank you for chiming in. I really appreciate it. I agree that the assumption needs to be assessed first. I found this PharmaSUG paper on testing the PH assumption and will follow its suggestions. https://www.pharmasug.org/proceedings/china2018/SP/Pharmasug-China-2018-SP75.pdf
