Hello,
I am trying to understand why my regressions give different results when I run the same models in SAS and in STATA. I am a new SAS user, and this is my first survival analysis in SAS.
At this point I have to fit Cox regressions, and since I do not know SAS very well, I compare the SAS results with my STATA results to make sure I have done the right thing.
Here is an example of my dataset:
data have;
input
id time1 event1 weight independent_v1 independent_v2;
datalines;
1 0 0 0.8 0 1
1 1 0 0.8 0 1
1 2 0 0.8 0 0
1 3 0 0.8 0 0
1 4 0 0.8 0 1
1 5 0 0.8 0 1
1 6 0 0.8 0 1
1 7 0 0.8 0 1
1 8 0 0.8 0 2
1 9 0 0.8 0 2
1 10 0 0.8 0 2
1 11 0 0.8 0 2
1 12 0 0.8 0 2
1 13 0 0.8 0 2
2 0 0 1.1 1 0
2 1 1 1.1 1 0
2 2 . 1.1 1 0
3 0 0 1.01 2 1
3 1 0 1.01 2 1
3 2 1 1.01 2 1
3 3 . 1.01 2 1
4 0 1 0.98 2 1
4 1 . 0.98 2 1
4 2 . 0.98 2 1
4 3 . 0.98 2 1
4 4 . 0.98 2 1
5 0 0 1.13 3 0
6 0 0 1.05 3 0
6 1 0 1.05 3 1
6 2 0 1.05 3 1
6 3 0 1.05 3 1
6 4 0 1.05 3 1
6 5 1 1.05 3 1
6 6 . 1.05 3 1
6 7 . 1.05 1 1
6 8 . 1.05 1 1
7 0 0 0.89 0 3
7 1 0 0.89 0 3
7 2 0 0.89 0 3
7 3 0 0.89 0 3
7 4 0 0.89 0 3
7 5 0 0.89 0 3
7 6 0 0.89 0 3
7 7 0 0.89 0 3
7 8 1 0.89 0 1
7 9 . 0.89 0 1
7 10 . 0.89 0 1
8 0 0 1.1 1 0
8 1 0 1.1 1 1
8 2 0 1.1 1 1
8 3 . 1.1 1 2
8 4 . 1.1 1 2
;
run;
As you can see, the dataset is in long format (one row per subject and time point) and contains both time-invariant and time-varying covariates.
This is my SAS code for the table HAVE:
proc phreg data = have;
class independent_v1(ref='0') independent_v2(ref='0');
id id;
model time1*event1(0) = independent_v1 independent_v2 / rl;
weight weight;
run;
And this is my STATA code for the same table:
stset time1 [pweight=weight], id(id) failure(event1=1)
char independent_v1[omit] 0
char independent_v2[omit] 0
xi: stcox i.independent_v1 i.independent_v2
Does anyone have an idea why the results differ?
Hello @MFraga,
I've never used STATA, but for PROC PHREG you cannot use your dataset HAVE as the input dataset without further preparations.
You need one of two input styles: either one record per subject, with the time-dependent covariates defined by programming statements (similar to DATA step statements) in the PROC PHREG step, or the counting process style of input, with one record per subject and time interval, each record carrying the start and stop time of its interval. For your data the second option is more suitable because the changes of independent_v2 don't follow a simple pattern. Please see Example 87.7, "Time-Dependent Repeated Measurements of a Covariate", in the PROC PHREG documentation and create a modified input dataset from HAVE accordingly.
Edit: With the second option, time-dependent covariates can also be specified as CLASS variables.
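A minimal sketch of the second option for HAVE, assuming you want PHREG to read the rows the way stset time1, id(id) reads them in Stata: each row covers the interval from the previous row's time1 (0 for a subject's first row) up to its own time1, the covariate values on the row hold over that interval, and event1 says whether the event happened at the end of it. HAVE_SORTED, HAVE_CP, tstart and tstop are hypothetical names. Zero-length intervals (each subject's first row, at time1 = 0) and the rows after the event (event1 missing) are dropped, which, as far as I understand Stata's defaults, is also what stset excludes for these data. Note that subject 4, whose event is recorded at time1 = 0, then drops out entirely.

```sas
/* Counting-process layout needs the rows in time order within subject */
proc sort data=have out=have_sorted;
   by id time1;
run;

data have_cp;
   set have_sorted;
   by id;
   retain tstart;
   if first.id then tstart = 0;      /* assumed origin: time 0, as in Stata's default */
   tstop = time1;                    /* this row covers (tstart, tstop]               */
   /* keep positive-length intervals up to and including the event */
   if tstop > tstart and not missing(event1) then output;
   tstart = time1;                   /* the next row's interval starts here           */
run;

proc phreg data=have_cp;
   class independent_v1(ref='0') independent_v2(ref='0');
   model (tstart, tstop)*event1(0) = independent_v1 independent_v2 / rl;
   weight weight;
run;
```

If instead the covariate values on each row were measured at the start of the period, the intervals would have to be built the other way round (tstart = time1, tstop = the next row's time1), and the same tstart/tstop should then be given to Stata (stset tstop, id(id) time0(tstart) ...) so that both programs see identical intervals.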
I reran my regressions, and the results were not identical but satisfactorily close. I think the remaining difference may come from the way SAS and STATA use the weight variable.
Thanks again for the great text you sent me. It explains very clearly how the dataset must be organized.
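One thing worth checking, offered as an assumption rather than a diagnosis: as far as I know, [pweight=...] in Stata makes stcox report robust (sandwich) standard errors, clustered on the id() variable given to stset, whereas PROC PHREG with a plain WEIGHT statement reports model-based standard errors. If the hazard ratios already agree and only the standard errors or confidence limits differ, asking PHREG for the aggregated sandwich estimate may bring them closer. HAVE_CP stands for the counting-process dataset you built from HAVE following Example 87.7, and tstart/tstop are hypothetical names for its interval variables.

```sas
/* Assumption: HAVE_CP is the counting-process version of HAVE,
   with (hypothetical) interval variables tstart and tstop.      */
proc phreg data=have_cp covs(aggregate);  /* robust sandwich covariance, summed within id */
   class independent_v1(ref='0') independent_v2(ref='0');
   model (tstart, tstop)*event1(0) = independent_v1 independent_v2
         / rl ties=breslow;               /* Breslow ties, the default in both programs   */
   weight weight;                         /* same weights as [pweight=weight] in Stata    */
   id id;                                 /* defines the clusters for COVS(AGGREGATE)     */
run;
```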
Best!