Solved: Re: Difference in STATA and SAS results for Cox regression

MFraga · Posted 10-19-2018 05:06 PM

Hello,

I am trying to understand why my regressions give me different results when I run the models in SAS or STATA. I am a new SAS user and I am trying to do survival analysis for the first time with SAS.

At this point, I have to produce Cox regressions, and as I do not know very much about SAS, I compare the results in SAS with my results from STATA to make sure that I have done the right thing.

An example of my dataset would be:

data have;
input id time1 event1 weight independent_v1 independent_v2;
datalines;
1 0 0 0.8 0 1
1 1 0 0.8 0 1
1 2 0 0.8 0 0
1 3 0 0.8 0 0
1 4 0 0.8 0 1
1 5 0 0.8 0 1
1 6 0 0.8 0 1
1 7 0 0.8 0 1
1 8 0 0.8 0 2
1 9 0 0.8 0 2
1 10 0 0.8 0 2
1 11 0 0.8 0 2
1 12 0 0.8 0 2
1 13 0 0.8 0 2
2 0 0 1.1 1 0
2 1 1 1.1 1 0
2 2 . 1.1 1 0
3 0 0 1.01 2 1
3 1 0 1.01 2 1
3 2 1 1.01 2 1
3 3 . 1.01 2 1
4 0 1 0.98 2 1
4 1 . 0.98 2 1
4 2 . 0.98 2 1
4 3 . 0.98 2 1
4 4 . 0.98 2 1
5 0 0 1.13 3 0
6 0 0 1.05 3 0
6 1 0 1.05 3 1
6 2 0 1.05 3 1
6 3 0 1.05 3 1
6 4 0 1.05 3 1
6 5 1 1.05 3 1
6 6 . 1.05 3 1
6 7 . 1.05 1 1
6 8 . 1.05 1 1
7 0 0 0.89 0 3
7 1 0 0.89 0 3
7 2 0 0.89 0 3
7 3 0 0.89 0 3
7 4 0 0.89 0 3
7 5 0 0.89 0 3
7 6 0 0.89 0 3
7 7 0 0.89 0 3
7 8 1 0.89 0 1
7 9 . 0.89 0 1
7 10 . 0.89 0 1
8 0 0 1.1 1 0
8 1 0 1.1 1 1
8 2 0 1.1 1 1
8 3 . 1.1 1 2
8 4 . 1.1 1 2
;
run;

As you can see, I have both time-invariant and time-varying covariates, and my dataset is in long format (one row per subject per time point).

This is my SAS code for the table "have":

proc phreg data = have;
class independent_v1(ref='0') independent_v2(ref='0');
id id;
model time1*event1(0) = independent_v1 independent_v2 / rl;
weight weight;
run;

This is what I code in STATA for the same table "have":

stset time1 [pweight=weight], id(id) failure(event1=1)
char independent_v1[omit] 0
char independent_v2[omit] 0
xi: stcox i.independent_v1 i.independent_v2

Does anyone have any idea why the results differ?

FreelanceReinh · Posted 10-20-2018 11:31 AM

Hello @MFraga,

I've never used STATA, but for PROC PHREG you cannot use your dataset HAVE as the input dataset without further preparations.

You need either

1. a dataset with one observation per subject and a time variable indicating when the event of interest occurred or the subject was censored, or
2. a dataset with (one or) multiple observations per subject, each of which describes a time interval. In this case two time variables indicate the start and end point of the interval (t1, t2]. This second option is referred to as Counting Process Style of Input in the documentation.

In the first case, time-dependent covariates can be defined using programming statements (similar to DATA step statements) in the PROC PHREG step. For your data the second option is more suitable because the changes of independent_v2 don't follow a simple pattern. Please see Example 87.7 Time-Dependent Repeated Measurements of a Covariate in the documentation and create a modified input dataset from dataset HAVE correspondingly.

Edit: With the second option, time-dependent covariates are allowed as CLASS variables.
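
A minimal sketch of that restructuring for dataset HAVE, under one assumption about what a record means: each record is taken to describe the interval from the subject's previous time1 (or 0 for the first record) up to its own time1, with the covariate values on that record applying over that interval. The names have_sorted, cp, t1 and t2 are made up for illustration. If your records instead describe the period starting at time1, the interval definition has to be shifted accordingly.

```sas
/* Sort so that each subject's records are in time order */
proc sort data=have out=have_sorted;
  by id time1;
run;

data cp;
  set have_sorted;
  by id;
  t1 = lag(time1);                 /* time1 of the previous record; LAG runs on every record */
  if first.id then t1 = 0;         /* assumption: follow-up starts at time 0 */
  t2 = time1;                      /* interval is (t1, t2] */
  if missing(event1) then delete;  /* drop records with a missing event indicator */
  if t2 <= t1 then delete;         /* drop zero-length intervals, e.g. the time1=0 records */
run;

/* Counting-process syntax: two time variables in the MODEL statement */
proc phreg data=cp;
  class independent_v1(ref='0') independent_v2(ref='0');
  model (t1, t2)*event1(0) = independent_v1 independent_v2 / rl;
  weight weight;
run;
```

Under this assumption subjects 4 and 5 contribute nothing, because their only usable records are at time1=0 and therefore have zero length. If robust (sandwich) standard errors aggregated over each subject's records are wanted, the COVS(AGGREGATE) option on the PROC PHREG statement can be combined with an ID id statement.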

MFraga · Posted 10-22-2018 03:40 PM

I ran my regressions again and the results were not equal, but satisfactorily close. I think the remaining difference may be due to how SAS and STATA each use the "weight" variable.

Thanks again for the great text you sent me. It explains really clearly how the dataset must be organized.

Best!