FreelanceReinh
Jade | Level 19

Hi @Sophie87,

 

Sorry for the late reply. I was just wondering if you've resolved the issue.

 

I think @unison's first response already gave the reason why the two definitions in your initial post lead to very different results: Whenever . < daystorept2 < daystochcntc (i.e., the difference daystorept2 - daystochcntc is negative), the time-dependent variable is set to 1 by the first definition, but to 0 by the second.


@Sophie87 wrote:

I checked frequencies of the two variables and they are exactly the same.


I'm sure that you "checked" the frequencies after you computed the time-dependent variable in a DATA (or PROC SQL) step where you applied either definition to each of the observations in dataset ag16g (i.e., the input dataset of your PROC PHREG step). Then it is not surprising that the frequencies didn't differ because most likely there are no cases with daystorept2 < daystochcntc in your data (since, I assume, the "contact" simply cannot occur after the "event" time, nor after censoring). At least this is true for the sample data you've posted.

 

But this "check" ignores that the computation of the partial likelihood function involves evaluations of the time-dependent variable at each event time (and this is what PROC PHREG does with those "programming statements" after the MODEL statement)! The algorithm "looks back" at the event time of a different subject (who experienced their event earlier after "time zero" in the study) and computes the time-dependent variable for the subject at hand (using their "days to contact"). Of course, that other event might very well have happened "before" the contact date of the subject at hand. Hence, a negative difference does occur and this is where the difference between your two definitions becomes relevant.
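To make this concrete, here is a small sketch in Python (not SAS, and not your actual code). The two functions are only my hypothetical reconstruction of the two definitions, based on the behavior described above, and the three subjects are made up. It evaluates each definition once at every subject's own time (the DATA step "check") and once at every event time for all subjects still at risk (roughly what happens inside the partial likelihood).

```python
# Hypothetical reconstruction of the two definitions (assumption: the
# original SAS statements are not reproduced here). None stands for a
# missing "days to contact" value.

def contact_def1(t, days_to_contact):
    """1 if a contact exists and t - contact <= 30 (negative differences count)."""
    if days_to_contact is None:
        return 0
    return 1 if t - days_to_contact <= 30 else 0


def contact_def2(t, days_to_contact):
    """1 only if the contact lies 0 to 30 days before time t."""
    if days_to_contact is None:
        return 0
    return 1 if 0 <= t - days_to_contact <= 30 else 0


# Made-up subjects: (id, days to event or censoring, event flag, days to contact)
subjects = [
    ("A", 20, 1, None),
    ("B", 50, 1, 40),
    ("C", 90, 0, 10),
]


def own_time_values(definition):
    """Evaluate the variable once per subject at that subject's own time."""
    return [definition(t, c) for _, t, _, c in subjects]


def risk_set_values(definition):
    """Evaluate the variable at each event time for every subject still at risk."""
    rows = []
    event_times = sorted(t for _, t, event, _ in subjects if event == 1)
    for event_time in event_times:
        for subject_id, t, _, c in subjects:
            if t >= event_time:  # subject is still under observation
                rows.append((event_time, subject_id, definition(event_time, c)))
    return rows


print(own_time_values(contact_def1) == own_time_values(contact_def2))
print(risk_set_values(contact_def1) == risk_set_values(contact_def2))
```

With these made-up data the first comparison is true and the second is false: at the event time of subject A (day 20), subject B's contact (day 40) has not happened yet, so the difference is negative and only the first definition yields 1.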

 

It's probably not easy to notice that intricacy by a quick look at the formulas in the PROC PHREG documentation (linked above), but it's explained very well in Paul Allison's Survival Analysis Using SAS®: A Practical Guide (p. 140 f. in my older, first edition).

 

unison's suggestion of "pulling" the definition of the time-dependent explanatory variable out of the PROC PHREG step could be implemented, but would require a different structure of the input dataset (see Counting Process Style of Input) -- and should yield exactly the same results as your current code, so you really don't need to do that.

 

Now the question arises: Which of the two definitions (if any) is correct? Without deeper knowledge of your study I would think of those "contacts" as occurrences which may have an impact on the subjects' hazard of experiencing the target event. After all, this is exactly why you would consider them in the Cox model. I think your definitions and sample data suggest that you expect the "contact" to modify the hazard function only during a 30-day period after the contact (which sounds plausible). However, wouldn't this imply that before the contact the impact of the contact should be zero (as in cases where no contact occurred, i.e. daystochcntc=.), and hence that the second definition should be preferred? Unfortunately, this is the one whose resulting hazard ratio "totally did not make sense," as you wrote. Of course, knowing your study much better than I do, you may arrive at a different conclusion.

 

I don't know, for example, how to interpret the multiple occurrences of some CASEIDNOs in your sample dataset (maybe clustered data?). Nevertheless I hope that the above explanations are helpful.
