I have a long dataset with 1 to multiple (not fixed) rows per person_id. There are a number of variables but the ones of interest are person_id, start_date and end_date. The dataset is sorted according to person_id, start_date, end_date.
I have identified an indicator case/s (flag=1) for each person_id. This may not be the first case (based on start_date) and an person_id may have none or multiple indicator cases.
I would like to calculate the number of days between the end_date of the indicator case and each subsequent start_date until the next indicator within person_id or until next person_id.
The below table shows the number of days column I would like to compute.
person_id | start_date | end_date | flag | Number of days |
1 | 02AUG2020 | 05AUG2020 |
|
|
1 | 08AUG2020 | 20AUG2020 | 1 |
|
1 | 10AUG2020 | 30AUG2020 |
| -10 |
1 | 21AUG2020 | 21AUG2020 |
| 1 |
2 | 02AUG2020 | 05AUG2020 |
|
|
2 | 08AUG2020 | 10AUG2020 | 1 |
|
2 | 20AUG2020 | 21AUG2020 |
| 10 |
2 | 21AUG2020 | 21AUG2020 | 1 |
|
2 | 23AUG2020 | 31AUG2020 |
| 2 |
3 | 08AUG2020 | 10AUG2020 |
|
|
3 | 10AUG2020 | 21AUG2020 |
|
|
4 | 02AUG2020 | 05AUG2020 | 1 |
|
4 | 05AUG2020 | 21AUG2020 |
| 0 |
4 | 21AUG2020 | 24AUG2020 |
| 16 |
5 | 08AUG2020 | 10AUG2020 |
|
|
5 | 20AUG2020 | 21AUG2020 | 1 |
|
Thanks heaps in advance
I'm assuming:
data want;
set have;
by person_id start_date end_date;
*stores the end_date in the indicator start variable and carries it across the rows;
retain indicator_start;
if flag=1 then indicator_start = end_date;
*resets for the first record of each person;
if first.person_id then call missing(indicator_start);
*conditionally calculate duration;
if not missing(indicator_start) and flag ne 1 then num_days = end_date - indicator_start;
run;
Untested code, but should get you started. If it doesn't work, please post the code you used and the log and indicate exactly what did not work.
@JenWalker1 wrote:
I have a long dataset with 1 to multiple (not fixed) rows per person_id. There are a number of variables but the ones of interest are person_id, start_date and end_date. The dataset is sorted according to person_id, start_date, end_date.
I have identified an indicator case/s (flag=1) for each person_id. This may not be the first case (based on start_date) and an person_id may have none or multiple indicator cases.
I would like to calculate the number of days between the end_date of the indicator case and each subsequent start_date until the next indicator within person_id or until next person_id.
The below table shows the number of days column I would like to compute.
person_id
start_date
end_date
flag
Number of days
1
02AUG2020
05AUG2020
1
08AUG2020
20AUG2020
1
1
10AUG2020
30AUG2020
-10
1
21AUG2020
21AUG2020
1
2
02AUG2020
05AUG2020
2
08AUG2020
10AUG2020
1
2
20AUG2020
21AUG2020
10
2
21AUG2020
21AUG2020
1
2
23AUG2020
31AUG2020
2
3
08AUG2020
10AUG2020
3
10AUG2020
21AUG2020
4
02AUG2020
05AUG2020
1
4
05AUG2020
21AUG2020
0
4
21AUG2020
24AUG2020
16
5
08AUG2020
10AUG2020
5
20AUG2020
21AUG2020
1
Thanks heaps in advance
I'm assuming:
data want;
set have;
by person_id start_date end_date;
*stores the end_date in the indicator start variable and carries it across the rows;
retain indicator_start;
if flag=1 then indicator_start = end_date;
*resets for the first record of each person;
if first.person_id then call missing(indicator_start);
*conditionally calculate duration;
if not missing(indicator_start) and flag ne 1 then num_days = end_date - indicator_start;
run;
Untested code, but should get you started. If it doesn't work, please post the code you used and the log and indicate exactly what did not work.
@JenWalker1 wrote:
I have a long dataset with 1 to multiple (not fixed) rows per person_id. There are a number of variables but the ones of interest are person_id, start_date and end_date. The dataset is sorted according to person_id, start_date, end_date.
I have identified an indicator case/s (flag=1) for each person_id. This may not be the first case (based on start_date) and an person_id may have none or multiple indicator cases.
I would like to calculate the number of days between the end_date of the indicator case and each subsequent start_date until the next indicator within person_id or until next person_id.
The below table shows the number of days column I would like to compute.
person_id
start_date
end_date
flag
Number of days
1
02AUG2020
05AUG2020
1
08AUG2020
20AUG2020
1
1
10AUG2020
30AUG2020
-10
1
21AUG2020
21AUG2020
1
2
02AUG2020
05AUG2020
2
08AUG2020
10AUG2020
1
2
20AUG2020
21AUG2020
10
2
21AUG2020
21AUG2020
1
2
23AUG2020
31AUG2020
2
3
08AUG2020
10AUG2020
3
10AUG2020
21AUG2020
4
02AUG2020
05AUG2020
1
4
05AUG2020
21AUG2020
0
4
21AUG2020
24AUG2020
16
5
08AUG2020
10AUG2020
5
20AUG2020
21AUG2020
1
Thanks heaps in advance
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.