In a matched case-control or cohort study, should the matching variables be ignored in Cox regression modelling?
In a matched-pairs cohort (e.g., matched on the AGE variable), the distribution of AGE is the same in the exposed and unexposed cohorts.
During Cox regression modelling, should we ignore the matching variable (AGE)?
I have a match_id variable that identifies each matched pair. Should I use the STRATA or the ID statement to indicate the matched pairs? Is there any difference between these two statements for a matched-pairs cohort analysis?
Thanks!
Thank you for this interesting question, even though it is not specifically SAS-related.
In a matched case-control study you cannot ignore the matching variables. If you have m:n matching (for example, 5 controls per case), the data are collected in a specific way that requires a conditional logistic regression. The likelihood of a conditional logistic regression is similar to that of a stratified Cox regression (but it is not a Cox regression!). That equality between the likelihood functions no longer holds if you leave the matching variables out of the STRATA statement.
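As a sketch of the point about the likelihoods, assuming a small hypothetical 1:2 matched case-control data set (CC, with the illustrative variables SET_ID, CASE and EXPOSED, none of which come from this thread): PROC LOGISTIC with a STRATA statement fits the conditional logistic regression directly. The same conditional likelihood can be obtained from PROC PHREG by giving every case a dummy event time of 1, censoring every control at a dummy time of 2, stratifying on the matched set and using TIES=DISCRETE, so the two estimates for EXPOSED should agree.

```sas
/* Hypothetical 1:2 matched case-control data (illustration only).
   set_id  = matched set
   case    = 1 for the case, 0 for a control
   exposed = 1 if exposed, 0 if not                                */
data cc;
   input set_id case exposed @@;
   datalines;
1 1 1  1 0 0  1 0 0
2 1 1  2 0 0  2 0 0
3 1 1  3 0 0  3 0 0
4 1 0  4 0 1  4 0 0
;
run;

/* Conditional logistic regression: one stratum per matched set */
proc logistic data=cc;
   strata set_id;
   model case(event='1') = exposed;
   ods output ParameterEstimates=pe_clogit;
run;

/* Same conditional likelihood from a stratified Cox model:
   cases get dummy time 1, controls are censored at dummy time 2 */
data cc_cox;
   set cc;
   dummy_time = 2 - case;
run;

proc phreg data=cc_cox;
   strata set_id;
   model dummy_time*case(0) = exposed / ties=discrete;
   ods output ParameterEstimates=pe_cox;
run;
```

For this toy data both procedures should give an odds ratio of 6 for EXPOSED: three sets in which only the case is exposed are set against one set in which only a control is exposed.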
In a matched cohort study you can sometimes ignore the matching variable. Here, time to event is measured from the matching date. The difference is that the unexposed persons are matched to an exposed person, i.e. on exposure rather than on the outcome. If you have m:n matching and you succeed in finding all m unexposed persons for every matched set, then you can ignore the matching variable. That is because you have then, by design, removed the association between exposure and the matching variables in your study population, so the matching variables cannot act as confounders. In theory, you will therefore get the same result whether or not you include the matching variables in the model. If you did not get the same number of unexposed persons per exposed person, some confounding can remain, which means you have to include the matching variables in the STRATA statement.
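A minimal sketch of the check this paragraph implies, using a small hypothetical 1:2 matched cohort (data set COHORT with the illustrative variables MATCH_ID, EXPOSED, FUTIME and EVENT, which are assumptions and not from the thread). The PROC SQL step counts exposed and unexposed persons per matched set, and PROC FREQ lists the distinct combinations, so a single row means every set has the same ratio. The two PROC PHREG steps then fit the model with and without stratification on the matched set so that the hazard ratios can be compared.

```sas
/* Hypothetical 1:2 matched cohort (illustration only).
   match_id = matched set, exposed = 1/0,
   futime   = follow-up time from the matching date,
   event    = 1 if the outcome occurred, 0 if censored  */
data cohort;
   input match_id exposed futime event @@;
   datalines;
1 1 2 1  1 0 5 0  1 0 4 1
2 1 3 1  2 0 6 0  2 0 6 0
3 1 5 0  3 0 2 1  3 0 6 0
4 1 1 1  4 0 3 1  4 0 6 0
;
run;

/* Number of exposed and unexposed persons in each matched set */
proc sql;
   create table set_sizes as
   select match_id,
          sum(exposed)            as n_exposed,
          count(*) - sum(exposed) as n_unexposed
   from cohort
   group by match_id;
quit;

/* One row in this listing = the same ratio in every matched set */
proc freq data=set_sizes;
   tables n_exposed*n_unexposed / list;
run;

/* Cox model stratified on the matched set */
proc phreg data=cohort;
   strata match_id;
   model futime*event(0) = exposed / risklimits;
   ods output ParameterEstimates=pe_stratified;
run;

/* Cox model ignoring the matching */
proc phreg data=cohort;
   model futime*event(0) = exposed / risklimits;
   ods output ParameterEstimates=pe_unstratified;
run;
```

For this toy data the stratified hazard ratio for EXPOSED works out to 6.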
You should use the STRATA statement. The ID statement is used (together with the COVS(AGGREGATE) option in the PROC PHREG statement) when you want to calculate a robust sandwich variance estimator that accounts for dependence between individuals.
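The difference between the two statements can be sketched on a small hypothetical matched cohort (variables MATCH_ID, EXPOSED, FUTIME and EVENT are illustrative assumptions). With STRATA, each matched set gets its own baseline hazard and exposed and unexposed are compared only within a set. With ID and COVS(AGGREGATE), everyone shares one baseline hazard, and the matched set is only used to cluster the robust variance, so the point estimate is that of an unstratified model while the standard error allows for within-set dependence.

```sas
/* Hypothetical 1:2 matched cohort (illustration only).
   match_id = matched set, exposed = 1/0,
   futime   = follow-up time from the matching date,
   event    = 1 if the outcome occurred, 0 if censored  */
data cohort;
   input match_id exposed futime event @@;
   datalines;
1 1 2 1  1 0 5 0  1 0 4 1
2 1 3 1  2 0 6 0  2 0 6 0
3 1 5 0  3 0 2 1  3 0 6 0
4 1 1 1  4 0 3 1  4 0 6 0
;
run;

/* STRATA: a separate baseline hazard for every matched set */
proc phreg data=cohort;
   strata match_id;
   model futime*event(0) = exposed / risklimits;
run;

/* ID + COVS(AGGREGATE): one common baseline hazard; residuals are
   summed within each matched set to form the sandwich variance    */
proc phreg data=cohort covs(aggregate);
   id match_id;
   model futime*event(0) = exposed / risklimits;
   ods output ParameterEstimates=pe_robust;
run;
```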
Good luck.
Hi, I have a question regarding this topic. This was a good explanation, but I'm still confused.
I have matched retrospective cohort data. The data are retrospective: the "cases" occurred in the past and are recorded in a database (there is no follow-up). An epidemiologist decided to match exposed to unexposed individuals (always 1:5) and treat this as a retrospective cohort. I used PROC PRINT to count pairs, populate a matched 2x2 table, and calculate risk ratios. Then I confirmed my results with PHREG using a STRATA statement (DNBI = case, coded 0/1; DNBI12 = case, coded 2/1; WAIVER = exposure, coded 0/1; ID = each matched set of 6, i.e. 1 exposed and 5 unexposed):
proc phreg data=comb nosummary;
strata id;
class waiver sex;
model dnbi12*dnbi(0) = waiver /ties=breslow risklimits ;
run;
I get the same results with and without the STRATA statement. The exposed were matched to the unexposed on sex, so when I add sex to the model in the code above:
model dnbi12*dnbi(0) = waiver sex /ties=breslow risklimits ;
I get no estimate for the effect of sex on dnbi. However, with the STRATA statement removed (ignoring the matching), I get a significant result for sex.
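One way to see why the stratified model returns no estimate for sex, as a sketch under the assumption that every matched set was built on a single sex: a stratified Cox model compares subjects only within a stratum, so a covariate that is constant within every stratum is absorbed into the stratum-specific baseline hazard and its coefficient cannot be estimated. The check below counts the matched sets in which sex varies; a count of zero is consistent with sex being fully absorbed by the STRATA statement. The data set DEMO and its rows are hypothetical; to run the check on real data, replace DEMO with your own data set (COMB in the code above), keeping the ID and SEX variables.

```sas
/* Hypothetical matched sets (illustration only).
   id  = matched set
   sex = matching variable                        */
data demo;
   input id sex @@;
   datalines;
1 1  1 1  1 1
2 2  2 2  2 2
3 1  3 1  3 1
;
run;

/* Count matched sets that contain more than one value of sex */
proc sql noprint;
   select count(*) into :n_sets_sex_varies trimmed
   from (select id
         from demo
         group by id
         having count(distinct sex) > 1) as v;
quit;

%put NOTE: Matched sets in which sex varies: &n_sets_sex_varies;
```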
I found this on the web: "If matching was done appropriately, and matching is not taken into account in the analysis, the OR will be biased towards the null." I interpret this as: if I ignore the strata variable, the OR (or, in this case, the hazard ratio) will be biased towards 1.0. If it is still significant, then we can be assured of a significant effect of sex on dnbi, but the effect is biased and lower than the true effect. Is this an appropriate interpretation?