# STRATA vs. BY

Hi everyone,

I am working with PROC PHREG on a large data set of patients spanning five fiscal years, FY07 to FY11. There are many class factors, such as in/out patient status, gender, etc. I would like to fit a Cox proportional hazards regression to these data, but I am confused about whether a BY statement or a STRATA statement is appropriate, since I don't know exactly what the main difference between them is. Any help would be much appreciated.

Thanks!

Issac

‎07-31-2012 07:07 PM
## Re: STRATA vs. BY

When you use STRATA or BY you are essentially creating a different model for each level in that variable.

I.e., if you stratify by SEX, then you'll have different models for females and males.

BY operates the same way except the output looks a little different.

Usually I don't use either in PROC PHREG, but a CLASS statement is what you want if you have class factors.

## Re: STRATA vs. BY

Thanks so much Reeza!

## Re: STRATA vs. BY

Issac,

The difference between STRATA and BY is much greater than Reeza said. With the BY statement, each analysis is completely independent: a separate model, with its own coefficients, is fit to each BY group. With the STRATA statement, a single model is fit: each stratum gets its own baseline hazard, but the regression coefficients are shared across strata. STRATA is useful when a class predictor variable does not meet the proportional hazards assumption. You can still use the strata variable to test interactions with other variables that are in the model. I've used it with treatments that were important to the model but did not meet the assumption.
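To make the contrast concrete, here is a minimal sketch of the two setups. The data set name `patients` and the variables `time`, `status`, `age`, `gender`, and `inout` are hypothetical placeholders, not taken from Issac's data, and the censoring value `0` is an assumption.

```sas
/* Hypothetical data set PATIENTS with variables:
   time   = follow-up time
   status = 1 for an event, 0 for censored (assumed coding)
   age, gender, and inout (inpatient vs. outpatient)        */

/* BY: one completely separate Cox model per level of INOUT.
   BY-group processing expects the data sorted by the BY variable. */
proc sort data=patients out=patients_sorted;
  by inout;
run;

proc phreg data=patients_sorted;
  by inout;
  class gender / param=ref;
  model time*status(0) = age gender;
run;

/* STRATA: one model. Each level of INOUT gets its own baseline
   hazard, but AGE and GENDER share a single set of coefficients. */
proc phreg data=patients;
  class gender / param=ref;
  model time*status(0) = age gender;
  strata inout;
run;
```

The BY run produces one table of parameter estimates per level of `inout`; the STRATA run produces a single table, with no estimate for `inout` itself.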

There are other uses. I'd suggest looking at Paul Allison's BBU (Books By Users) book on survival analysis.

Doc Muhlbaier

Duke

## Re: STRATA vs. BY

Duke,

I have reviewed Allison's book. It suggests not including a covariate in both the STRATA and MODEL statements, since the two would be redundant. I tried this out, and the output does not include maximum likelihood estimates for the strata variable. But let me ask you something. When we stratify on, let's say, marital status, does it mean that proportionality is not satisfied on the whole sample but is satisfied within each group of patients (divorced, married), so that we can interpret the whole data set as a stratified sample of patients? Is that correct?

Thanks!

## Re: STRATA vs. BY

Allison's book is correct, but it can go further. Say treatment does not satisfy proportional hazards (and thus needs to be a STRATA variable), and age is in the model because it does. You can get a valid test of the treatment*age interaction in that situation, even though you cannot get an overall test of treatment.
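One way this could be set up is sketched below, under some assumptions: the data set `trial` and the variables `time`, `status`, `age`, and `trt` are hypothetical, and `trt` is assumed to be a numeric 0/1 treatment indicator so that the product term can be built with a programming statement inside PROC PHREG.

```sas
/* Hypothetical data set TRIAL:
   time, status (0 = censored, assumed coding), age,
   trt = numeric 0/1 treatment indicator (assumption) */
proc phreg data=trial;
  /* TRT has no main effect in MODEL because it is the STRATA variable */
  model time*status(0) = age trt_age;
  strata trt;
  /* Programming statement: interaction of treatment with age */
  trt_age = trt*age;
run;
```

In this sketch the coefficient on `trt_age` estimates how the age effect differs between the two treatment strata, and its Wald test serves as the test of the interaction. No coefficient is reported for `trt` itself, because its effect is absorbed into the stratum-specific baseline hazards.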

I don't interpret the use of STRATA as in your question.  Proportional hazards is satisfied (or not) by predictor variables in the model and is not a characteristic of the model as a whole.
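As a sketch in standard notation (the symbols are not from the thread), the hazard for a subject in stratum $s$ with covariate vector $x$ can be written for the two approaches as:

$$
\text{STRATA:}\quad h_s(t \mid x) = h_{0s}(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\text{BY:}\quad h_s(t \mid x) = h_{0s}(t)\,\exp\!\left(\beta_s^{\top} x\right).
$$

In both cases the baseline hazard $h_{0s}(t)$ is free to differ by stratum, so no proportionality is assumed for the stratification variable. The proportional hazards assumption applies to the covariates in $x$: with STRATA they share one coefficient vector $\beta$ across strata, while with BY each group gets its own $\beta_s$.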

Doc

## Re: STRATA vs. BY

I understand it more precisely now. Thanks Duke!

