Hi SAS community,
I hope you are doing well. In my survival analysis, I define the endpoint in one of two ways:
1. For individuals diagnosed with depression during the study period, the endpoint is the wave of diagnosis.
2. For individuals who have not developed depression by the end of the study period, the endpoint is the end of their follow-up.
How can I calculate the survival time for each individual?
Thanks for all your help!
This is the original data:
| idauniq | depression | wave |
|---|---|---|
| 100052 | 0 | 3 |
| 100052 | 0 | 4 |
| 100052 | 0 | 5 |
| 100052 | 0 | 6 |
| 100052 | 0 | 7 |
| 100052 | 0 | 8 |
| 100052 | 0 | 9 |
| 100055 | 0 | 3 |
| 100055 | 0 | 4 |
| 100055 | 0 | 5 |
| 100055 | 0 | 6 |
| 100055 | 0 | 7 |
| 100055 | 0 | 8 |
| 100057 | 0 | 3 |
| 100057 | 0 | 4 |
| 100057 | 0 | 5 |
| 100057 | 0 | 6 |
| 100057 | 0 | 7 |
| 100057 | 0 | 8 |
| 100057 | 0 | 9 |
| 100059 | 0 | 3 |
| 100059 | 0 | 4 |
| 100059 | 1 | 5 |
| 100061 | 0 | 3 |
| 100061 | 1 | 4 |
| 100061 | 0 | 5 |
| 100061 | 0 | 6 |
| 100068 | 0 | 3 |
| 100068 | 0 | 5 |
| 100068 | 0 | 6 |
| 100068 | 1 | 7 |
| 100080 | 0 | 3 |
| 100080 | 0 | 5 |
| 100081 | 0 | 3 |
| 100081 | 0 | 4 |
| 100081 | 0 | 5 |
| 100081 | 0 | 6 |
| 100081 | 1 | 7 |
These are the results I expect:
| idauniq | depression | wave | Time | censor |
|---|---|---|---|---|
| 100052 | 0 | 9 | 12 | 0 |
| 100055 | 0 | 8 | 10 | 0 |
| 100057 | 0 | 9 | 12 | 0 |
| 100059 | 1 | 5 | 4 | 1 |
| 100061 | 0 | 4 | 2 | 0 |
| 100068 | 1 | 7 | 8 | 1 |
| 100080 | 0 | 5 | 4 | 0 |
| 100081 | 1 | 7 | 8 | 1 |
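Reading the expected table row by row, each Time value equals 2 × (wave − 3), and censor always equals the depression value in the same row. Written as a formula, with $w_i$ the wave and $d_i$ the depression value on participant $i$'s output row (this is an inference from the table: that wave 3 is the baseline and consecutive waves are 2 time units apart is not stated in the post):

$$
T_i = 2\,(w_i - 3), \qquad \delta_i = d_i
$$

For example, 100052 gives $T = 2(9 - 3) = 12$ with $\delta = 0$, and 100059 gives $T = 2(5 - 3) = 4$ with $\delta = 1$.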
You can follow a SET statement with `by idauniq;`, which lets you determine whether the observation in hand is the first (or last) observation for a given idauniq.
You will output one observation per idauniq: either the last observation (if that person never has depression=1) or else the first observation with depression=1:
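```sas
data have;
  input idauniq depression wave;
  datalines;
100052 0 3
100052 0 4
100052 0 5
100052 0 6
100052 0 7
100052 0 8
100052 0 9
100055 0 3
100055 0 4
100055 0 5
100055 0 6
100055 0 7
100055 0 8
100057 0 3
100057 0 4
100057 0 5
100057 0 6
100057 0 7
100057 0 8
100057 0 9
100059 0 3
100059 0 4
100059 1 5
100061 0 3
100061 1 4
100061 0 5
100061 0 6
100068 0 3
100068 0 5
100068 0 6
100068 1 7
100080 0 3
100080 0 5
100081 0 3
100081 0 4
100081 0 5
100081 0 6
100081 1 7
;
run;

/* Keep one row per idauniq: the first wave with depression=1, or the */
/* last wave observed if depression=1 never occurs.                   */
/* HAVE must be sorted by idauniq and, within idauniq, by wave.       */
data want (drop=n_dep);
  set have;
  by idauniq;
  if first.idauniq then n_dep=0;  /* reset the counter for each person          */
  n_dep+depression;               /* sum statement: running count, auto-retained */
  /* subsetting IF: first depression wave, or last wave if never depressed */
  if (n_dep=1 and depression=1) or (n_dep=0 and last.idauniq=1);
  time=2*(wave-3);                /* wave 3 -> 0, each later wave adds 2        */
  censor=depression;              /* 1 = depression diagnosed, 0 = censored     */
run;
```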
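To go from one row per person to the survival analysis itself, the result can be passed to PROC LIFETEST. This is a minimal sketch, assuming the censor convention of the expected table (censor=1 means depression was diagnosed, censor=0 means the person was still depression-free). The data set name SURV is hypothetical, and its rows are typed in from the expected-results table so the block runs on its own:

```sas
/* Hypothetical data set SURV: one row per idauniq, values copied */
/* from the expected-results table in the question.               */
data surv;
  input idauniq depression wave time censor;
  datalines;
100052 0 9 12 0
100055 0 8 10 0
100057 0 9 12 0
100059 1 5 4 1
100061 0 4 2 0
100068 1 7 8 1
100080 0 5 4 0
100081 1 7 8 1
;
run;

/* Kaplan-Meier (product-limit) estimates of time to depression.    */
/* The value(s) in parentheses after CENSOR mark censored rows, so  */
/* censor(0) treats 0 as censored and 1 as the event.               */
proc lifetest data=surv;
  time time*censor(0);
run;
```

In the TIME statement, the values listed in parentheses are the ones treated as censored, which is why `censor(0)` fits a variable where 1 marks the event. PROC PHREG accepts the same `time*censor(0)` form in its MODEL statement if a Cox model is needed later.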
And what is the rule by which you calculate the TIME variable?
For instance, for ID 100052, you start with
| idauniq | depression | wave |
|---|---|---|
| 100052 | 0 | 3 |
| 100052 | 0 | 4 |
| 100052 | 0 | 5 |
| 100052 | 0 | 6 |
| 100052 | 0 | 7 |
| 100052 | 0 | 8 |
| 100052 | 0 | 9 |
From that you get
| idauniq | depression | wave | Time | censor |
|---|---|---|---|---|
| 100052 | 0 | 9 | 12 | 0 |
How did you get time=12?
Also, if an individual has depression=1 in a given wave, does that mean you will ignore subsequent waves for that individual? For instance, see
| idauniq | depression | wave |
|---|---|---|
| 100061 | 0 | 3 |
| 100061 | 1 | 4 |
| 100061 | 0 | 5 |
| 100061 | 0 | 6 |
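Whatever rule is chosen for cases like 100061, it may help to list every idauniq whose first depression=1 wave is followed by further waves, since those are exactly the waves a one-row-per-person step would ignore. A sketch, using only the 100059 and 100061 rows from the data above so it runs on its own (in practice it would read the full HAVE); the data set names HAVE_SUB and DEP_THEN_MORE and the variables FIRST_DEP_WAVE and LAST_WAVE are hypothetical:

```sas
/* Subset of the posted data: 100059 has nothing after its       */
/* depression wave, 100061 has two later waves.                  */
data have_sub;
  input idauniq depression wave;
  datalines;
100059 0 3
100059 0 4
100059 1 5
100061 0 3
100061 1 4
100061 0 5
100061 0 6
;
run;

/* Output one row per idauniq whose first depression=1 wave is   */
/* earlier than its last observed wave. Input must be sorted by  */
/* idauniq and wave.                                             */
data dep_then_more (keep=idauniq first_dep_wave last_wave);
  set have_sub;
  by idauniq;
  retain first_dep_wave;
  if first.idauniq then first_dep_wave = .;
  if depression = 1 and missing(first_dep_wave) then first_dep_wave = wave;
  if last.idauniq then do;
    last_wave = wave;
    if not missing(first_dep_wave) and first_dep_wave < last_wave then output;
  end;
run;

proc print data=dep_then_more noobs;
run;
```

With these rows it lists only 100061 (first depression at wave 4, last wave 6); 100059 is dropped because its depression wave is also its last wave.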