Solved: Re: Kaplan-Meier longitudinal dataset

MFraga · Posted 10-18-2018 02:08 PM

Hello,

I need to create Kaplan-Meier curves for my analysis, but I am finding inconsistencies when I compare my results with Stata. I exported my data via "Stat/transfer" and then produced the curves with Stata. In Stata, things look good, but I want to solve this problem and keep using SAS.

My dataset is arranged longitudinally. An example would be:

data have;
   input id time1 event1 weight;
   datalines;
1 0 0 0.8
1 1 0 0.8
1 2 0 0.8
1 3 0 0.8
1 4 0 0.8
1 5 0 0.8
1 6 0 0.8
1 7 0 0.8
1 8 0 0.8
1 9 0 0.8
1 10 0 0.8
1 11 0 0.8
1 12 0 0.8
1 13 0 0.8
2 0 0 1.1
2 1 1 1.1
2 2 . 1.1
3 0 0 1.01
3 1 0 1.01
3 2 1 1.01
3 3 . 1.01
4 0 1 0.98
4 1 . 0.98
4 2 . 0.98
4 3 . 0.98
4 4 . 0.98
5 0 0 1.13
6 0 0 1.05
6 1 0 1.05
6 2 0 1.05
6 3 0 1.05
6 4 0 1.05
6 5 1 1.05
6 6 . 1.05
6 7 . 1.05
6 8 . 1.05
7 0 0 0.89
7 1 0 0.89
7 2 0 0.89
7 3 0 0.89
7 4 0 0.89
7 5 0 0.89
7 6 0 0.89
7 7 0 0.89
7 8 1 0.89
7 9 . 0.89
7 10 . 0.89
8 0 0 1.1
8 1 0 1.1
8 2 0 1.1
8 3 . 1.1
8 4 . 1.1
;
run;

I run the survival analysis like this:

proc lifetest data=have plots(s) graphics notable;
   time time1*event1(0);
   weight weight;
run;

The resulting plot does not show the same proportions as in Stata, where I use the same table and the following code to produce the survival curve:

stset time1 [pweight=weight], id(id) failure(event1=1)
sts graph

Does anyone know how I can make SAS understand that my dataset is arranged longitudinally, so that the analysis is done per "id"? Many thanks in advance!

ballardw · Posted 10-18-2018 02:21 PM

I don't speak Stata, but you are apparently using the ID variable in Stata and not in PROC LIFETEST. What role does it play in Stata?

Also, your LIFETEST code as posted does not run (at least on my system).

This ran for me:

proc lifetest data=have plots=(s)  notable;
   time time1*event1(0);
   weight weight;
run;

MFraga · Posted 10-18-2018 02:35 PM

Thanks for your answer. ID identifies one individual. My variable of interest is event1, and the time variable is time1. I want to understand the time it takes for each individual (each ID) to have an event1. Does this information help?

Reeza · Posted 10-18-2018 03:20 PM

You need to reduce it to a single line for each individual, so either the event1=1 record or the last record for each ID.

Then you can run PROC LIFETEST and probably get the same results. You can check an example in the documentation for how the data set needs to be structured.

https://documentation.sas.com/api/docsets/statug/14.3/content/statug_code_liftex2.htm?locale=en
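The reduction described above can be sketched as follows. This is not the code from the thread (that was only posted as an image), just a minimal sketch under some assumptions: the data set is the HAVE table from the first post, rows with a missing event1 were recorded after follow-up ended and can be dropped, and the names have_sorted, one_per_id and done are made up for illustration. PROC LIFETEST treats every row as a separate subject, so in the long layout ID 1 alone would contribute 14 censored observations.

```sas
/* Sort so that, within each ID, rows are in time order. */
proc sort data=have out=have_sorted;
   by id time1;
run;

/* Keep one row per ID: the first event row if there is one,      */
/* otherwise the last row with a non-missing event1 (censored).   */
data one_per_id;
   set have_sorted;
   by id;
   where event1 is not missing;   /* applied before FIRST./LAST. are set */
   retain done;
   if first.id then done = 0;     /* reset the flag for each subject */
   if done then delete;           /* ignore anything after the event */
   if event1 = 1 then do;
      done = 1;
      output;                     /* event time for this ID */
   end;
   else if last.id then output;   /* no event: censor at last observed time */
   drop done;
run;

/* Same call as ballardw's working version, on the reduced data. */
proc lifetest data=one_per_id plots=(s) notable;
   time time1*event1(0);
   weight weight;
run;
```

With the sample data this should leave eight rows, one per ID. Whether the weighted curve then matches Stata exactly is something to check rather than assume, since the two packages may handle weights differently.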

MFraga · Posted 10-18-2018 03:25 PM

Yes, I know that this could be a possibility, but it will be time consuming. I think I will move to Stata for the analysis. Thanks anyway.

Reeza · Posted 10-18-2018 03:38 PM

Which part do you think will be time consuming?

Here's an example of how you can filter your list.

mkeintz · Posted 10-18-2018 06:25 PM

I think you need to sort by ID TIME1, to guarantee that the last record for any ID without an event will have the latest time value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Reeza · Posted 10-18-2018 06:27 PM

Yeah, I had modified the data slightly for testing because there's only 8 records otherwise....

MFraga · Posted 10-19-2018 11:14 AM

It is already sorted by id.

mkeintz · Posted 10-19-2018 12:44 PM

@MFraga

The initial proc sort is often provided by forum respondents just to demonstrate the data order required by the subsequent (more interesting and relevant) steps. Often a person starting a topic shows ordered sample data for convenience, only to discover problems when the actual (unsorted) data is used.

If the real data are sorted (by ID and TIME1) then by all means drop the proc sort. Just honor the Socratic dictum: know thy data.
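One cheap way to confirm the order before dropping the sort is a DATA step that only reads the data with a BY statement. This is a sketch, assuming the data set is named HAVE as in the first post; it writes no output data set.

```sas
/* Reads HAVE in BY order without creating anything.               */
/* If the rows are not in ID, TIME1 order, the step stops with an  */
/* error in the log saying the BY variables are not properly       */
/* sorted.                                                         */
data _null_;
   set have;
   by id time1;
run;
```

If the step finishes without an error, the rows are already in ID and TIME1 order and the PROC SORT can be skipped.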


Reeza · Posted 10-18-2018 06:15 PM

Not sure why the code didn't post before, only have an image now though.
