Hi,
I have the following dataset, where the main interests are ID (refers to patient ID number) and date_start (refers to start day of treatment):
Row | id | atc | date_start | date_stop |
1 | 1 | A | 2010-07-02 | 2011-08-17 |
2 | 1 | B | 2011-08-17 | 2011-09-09 |
3 | 1 | C | 2011-09-09 | 2011-11-10 |
4 | 1 | D | 2011-11-14 | 2012-02-08 |
5 | 2 | E | 2007-03-17 | 2009-07-01 |
6 | 2 | F | 2007-06-29 | 2009-11-12 |
7 | 2 | A | 2007-07-17 | 2010-06-22 |
8 | 2 | B | 2007-08-14 | 2010-11-04 |
9 | 2 | A | 2007-09-04 | 2011-02-17 |
10 | 2 | D | 2008-02-18 | 2011-06-20 |
11 | 2 | F | 2008-03-21 | 2011-08-19 |
12 | 2 | A | 2008-04-15 | 2011-09-28 |
13 | 2 | B | 2008-04-29 | 2011-10-18 |
14 | 2 | K | 2008-05-15 | 2011-12-01 |
I would like to create a new numeric variable called LINE (corresponds to the line of treatment). The LINE should be counted within the group of ID, but the number of lines are only counted if the date_start is higher than the first date_start + 60 within the LINE variable.
This is the desired output:
To exemplify what I would like to achieve please note: in row number 10-12 the line number is 4 because both the row 11 and 12 is less then date_start+60 of the row 10. But in row 13 the line is 5 because its date_start is more than date_start+60 of the row 10.
Row | id | atc | date_start | date_stop | Line |
1 | 1 | A | 2010-07-02 | 2011-08-17 | 1 |
2 | 1 | B | 2011-08-17 | 2011-09-09 | 2 |
3 | 1 | C | 2011-09-09 | 2011-11-10 | 2 |
4 | 1 | D | 2011-11-14 | 2012-02-08 | 3 |
5 | 2 | E | 2007-03-17 | 2009-07-01 | 1 |
6 | 2 | F | 2007-06-29 | 2009-11-12 | 2 |
7 | 2 | A | 2007-07-17 | 2010-06-22 | 2 |
8 | 2 | B | 2007-08-14 | 2010-11-04 | 2 |
9 | 2 | A | 2007-09-04 | 2011-02-17 | 3 |
10 | 2 | D | 2008-02-18 | 2011-06-20 | 4 |
11 | 2 | F | 2008-03-21 | 2011-08-19 | 4 |
12 | 2 | A | 2008-04-15 | 2011-09-28 | 4 |
13 | 2 | B | 2008-04-29 | 2011-10-18 | 5 |
14 | 2 | K | 2008-05-15 | 2011-12-01 | 5 |
Do you have any suggestion how this can be done?
SAS 9.3
Thanks,
Gib
A similar approach, but with a slightly different interpretation of the requirements:
data want;
set have;
by id;
if first.id then do;
line=1;
baseline_date = date_start;
end;
else if date_start - baseline_date > 60 then do;
line + 1;
baseline_date = date_start;
end;
retain baseline_date;
drop baseline_date;
run;
I have to guess by +60 you mean 60 days. since 60 could be weeks, months, years, ...
/* assumes your data set is sorted by Id and date_start AND that date_start is a SAS date value and not character */ data want; set have; by id; retain line .; lds = lag(date_start); if first.id then line=1; else if start_date > (lds+60) then line+1;
drop lds; run;
If your dates aren't numeric SAS date values then you need to create some.
Hi,
Thanks for your reply. Yes, you assumed correctly, I meant 60 days by 60.
The suggested code does not work unfortunately, because it only takes into account the previous observation with the lag function. The code gives a value of 4 of LINE variable in the row 13, when it should be 5 because when comparing the date_start + 60 the reference should be the first date_start of the LINE variable.
Do you have an other suggestion maybe?
Row | id | atc | date_start | date_stop | Line |
1 | 1 | A | 2010-07-02 | 2011-08-17 | 1 |
2 | 1 | B | 2011-08-17 | 2011-09-09 | 2 |
3 | 1 | C | 2011-09-09 | 2011-11-10 | 2 |
4 | 1 | D | 2011-11-14 | 2012-02-08 | 3 |
5 | 2 | E | 2007-03-17 | 2009-07-01 | 1 |
6 | 2 | F | 2007-06-29 | 2009-11-12 | 2 |
7 | 2 | A | 2007-07-17 | 2010-06-22 | 2 |
8 | 2 | B | 2007-08-14 | 2010-11-04 | 2 |
9 | 2 | A | 2007-09-04 | 2011-02-17 | 3 |
10 | 2 | D | 2008-02-18 | 2011-06-20 | 4 |
11 | 2 | F | 2008-03-21 | 2011-08-19 | 4 |
12 | 2 | A | 2008-04-15 | 2011-09-28 | 4 |
13 | 2 | B | 2008-04-29 | 2011-10-18 | 5 |
14 | 2 | K | 2008-05-15 | 2011-12-01 | 5 |
Thanks,
Gib
A similar approach, but with a slightly different interpretation of the requirements:
data want;
set have;
by id;
if first.id then do;
line=1;
baseline_date = date_start;
end;
else if date_start - baseline_date > 60 then do;
line + 1;
baseline_date = date_start;
end;
retain baseline_date;
drop baseline_date;
run;
To be honest I do not understand the logic behind the code but it works as desired, thank you!
All of the techniques and tools used here are frequently used, and well worth learning if you don't know them:
The effect of a BY statement within a DATA step
RETAIN statement
DROP statement
How SAS stores dates
Sum statement (such as line + 1; )
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.