MFraga
Quartz | Level 8

Hello,

 

I am trying to understand why my regressions give me different results when I run the models in SAS or STATA. I am a new SAS user and I am trying to do survival analysis for the first time with SAS.

 

At this point, I have to produce Cox regressions, and as I do not know very much about SAS, I compare the SAS results with my results from STATA to make sure that I have done the right thing.

 

An example of my dataset would be:

 

data have;

input id time1 event1 weight independent_v1 independent_v2;

datalines;

1 0 0 0.8 0 1

1 1 0 0.8 0 1

1 2 0 0.8 0 0

1 3 0 0.8 0 0

1 4 0 0.8 0 1

1 5 0 0.8 0 1

1 6 0 0.8 0 1

1 7 0 0.8 0 1

1 8 0 0.8 0 2

1 9 0 0.8 0 2

1 10 0 0.8 0 2

1 11 0 0.8 0 2

1 12 0 0.8 0 2

1 13 0 0.8 0 2

2 0 0 1.1 1 0

2 1 1 1.1 1 0

2 2 . 1.1 1 0

3 0 0 1.01 2 1

3 1 0 1.01 2 1

3 2 1 1.01 2 1

3 3 . 1.01 2 1

4 0 1 0.98 2 1

4 1 . 0.98 2 1

4 2 . 0.98 2 1

4 3 . 0.98 2 1

4 4 . 0.98 2 1

5 0 0 1.13 3 0

6 0 0 1.05 3 0

6 1 0 1.05 3 1

6 2 0 1.05 3 1

6 3 0 1.05 3 1

6 4 0 1.05 3 1

6 5 1 1.05 3 1

6 6 . 1.05 3 1

6 7 . 1.05 1 1

6 8 . 1.05 1 1

7 0 0 0.89 0 3

7 1 0 0.89 0 3

7 2 0 0.89 0 3

7 3 0 0.89 0 3

7 4 0 0.89 0 3

7 5 0 0.89 0 3

7 6 0 0.89 0 3

7 7 0 0.89 0 3

7 8 1 0.89 0 1

7 9 . 0.89 0 1

7 10 . 0.89 0 1

8 0 0 1.1 1 0

8 1 0 1.1 1 1

8 2 0 1.1 1 1

8 3 . 1.1 1 2

8 4 . 1.1 1 2

;

run;

 

 

As you can see, I have time-invariant and time-varying covariates, and my dataset is arranged in long format (one row per subject per time point).

 

This is what I code in SAS for the same table "have":

 

proc phreg data = have;
class independent_v1(ref='0') independent_v2(ref='0');
id id;
model time1*event1(0) = independent_v1 independent_v2 / rl;
weight weight;
run;

 

 

This is what I code in STATA for the same table "have":

 

stset time1 [pweight=weight], id(id) failure(event1=1)

char independent_v1[omit] 0

char independent_v2[omit] 0

xi: stcox i.independent_v1 i.independent_v2

 

Does anyone have any idea why the results differ?

1 ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

Hello @MFraga,

 

I've never used STATA, but for PROC PHREG you cannot use your dataset HAVE as the input dataset without further preparations.

 

You need

  1. either a dataset with one observation per subject and a time variable indicating when the event of interest occurred or the subject was censored or
  2. a dataset with (one or) multiple observations per subject each of which describes a time interval. In this case two time variables indicate the start and end point of the interval (t1, t2]. This second option is referred to as Counting Process Style of Input in the documentation.

In the first case, time-dependent covariates can be defined using programming statements (similar to DATA step statements) in the PROC PHREG step. For your data the second option is more suitable because the changes of independent_v2 don't follow a simple pattern. Please see Example 87.7 Time-Dependent Repeated Measurements of a Covariate in the documentation and create a modified input dataset from dataset HAVE correspondingly.

 

Edit: With the second option, time-dependent covariates are allowed as CLASS variables.
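
As an illustration of the second option, here is a minimal sketch of how HAVE might be rearranged into counting process style. It rests on assumptions that are not stated in the question: that each row describes the interval from the subject's previous time1 (or 0 for the first row) up to its own time1, that rows with a missing event1 were recorded after the event and can be dropped, and that zero-length intervals (the time1 = 0 rows) carry no information. The names have_sorted, have_cp, t1 and t2 are made up for this example. If the rows are meant to describe a different interval, the definitions of t1 and t2 need to change accordingly.

```sas
/* Sketch only: the interval definition below is an assumption, not something
   stated in the original question. */
proc sort data=have out=have_sorted;
  by id time1;
run;

data have_cp;
  set have_sorted;
  by id;
  t1 = lag(time1);            /* previous time1; LAG is called on every row on purpose */
  if first.id then t1 = 0;    /* first row of a subject starts at time 0 */
  t2 = time1;                 /* end point of the interval (t1, t2] */
  if missing(event1) then delete;   /* assumed: rows recorded after the event */
  if t2 > t1;                 /* drop zero-length intervals such as the time1 = 0 rows */
run;

proc phreg data=have_cp;
  class independent_v1(ref='0') independent_v2(ref='0');
  /* counting process syntax: (start, stop)*status(censoring value) */
  model (t1, t2)*event1(0) = independent_v1 independent_v2 / rl;
  weight weight;
run;
```

Note that under these assumptions subject 5 (a single row at time 0) and the event of subject 4 at time 0 drop out of the analysis, so it is worth checking whether that matches how the data were collected.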

MFraga
Quartz | Level 8

I ran my regressions again and the results were not identical, but satisfactorily close. I think the remaining difference may be due to the way SAS and STATA each handle the "weight" variable.

 

Thanks again for the great text you sent me. It is really clear about how the dataset must be organized.

 

Best!
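
One possible way to probe the weight-related difference mentioned above, offered as a sketch rather than a confirmed fix: as far as I know, STATA's pweight leads to robust (sandwich) standard errors, whereas the WEIGHT statement in PROC PHREG on its own reports model-based ones. Requesting the robust covariance in SAS and normalizing the weights may bring the standard errors and confidence limits closer. Here have_cp, t1 and t2 are hypothetical names for HAVE rearranged into counting process style.

```sas
/* Assumption: have_cp is a hypothetical dataset holding HAVE in counting
   process style, with interval variables t1 and t2. */
proc phreg data=have_cp covs(aggregate);   /* robust sandwich variance, aggregated over the ID variable */
  class independent_v1(ref='0') independent_v2(ref='0');
  /* TIES=BRESLOW is the PHREG default; it is spelled out only to make the comparison explicit */
  model (t1, t2)*event1(0) = independent_v1 independent_v2 / rl ties=breslow;
  weight weight / normalize;   /* rescale the weights so that they sum to the sample size */
  id id;                       /* identifies the subject for COVS(AGGREGATE) */
run;
```

Whether this reproduces the STATA output exactly is not something this sketch can guarantee; it only lines up the settings that are most likely to differ between the two programs.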
