Dear all,
Below is an example of the dataset I am working with. I am trying to combine dates within a week of each other per ID. I have tried using lag and have been able to create a flag for records that have dates within a week of each other but am encountering an error when one person can have multiple records with multiple separate dates within a week (If that makes sense). I think I need to somehow create a unique ID again and cluster the data but I am a bit lost.
ID | Date | Test |
1 | Jan 1, 2005 | 0 |
1 | Jan 2, 2005 | 1 |
2 | Jan 1, 2005 | 0 |
2 | Jan 2, 2005 | 1 |
2 | Feb 1, 2005 | 1 |
2 | Feb 3, 2005 | 1 |
3 | March 2, 2005 | 1 |
3 | March 3, 2005 | 1 |
4 | April 10, 2006 | 1 |
My goal would be to have the end data (using max of tested and min of date), look like this:
ID | Date | Tested |
1 | Jan 1, 2005 | 1 |
2 | Jan 1, 2005 | 1 |
2 | Feb 1, 2005 | 1 |
3 | March 2, 2005 | 1 |
4 | April 10, 2006 | 1 |
Thank you for any help or consideration,
LAG is not the way to go for this problem, because - among other reasons - it won't tell you whether the upcoming observation is more than seven days in the future. And you need to know that in order to determine whether you have exhausted your seven-day period. In particular, you need
You'll have to test this below, in the absence of sample data in the form of a working data step:
data want (drop=_: nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=date test rename=(date=nxt_date test=nxt_test));
retain _temp_test _temp_date ;
if missing(_temp_test) then _temp_test=test;
if missing(_temp_date) then _temp_date=date;
if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>_temp_date+7;
date=_temp_date;
test=_temp_test;
output;
call missing(of _temp:);
run;
This assumes the data are sorted by ID/DATE.
The _temp... variables are the prospective final values as you scan for data within a week. Notice this program looks for dates within a week of the FIRST date of the period. So if you have observations on the 1st, 4th, and 8th, only the 1st and 4th will be considered. The 8th would start a new "week". If, on the other hand you want to consolidate all observations until there is a gap of over a week, then change
if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>_temp_date+7;
to
if last.id=0 and nxt_date<=date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>date+7;
The SET statement reads only the ID variable. In tandem with the BY statement, it provides accurate indication of the beginning of an ID (first.id=1) and the end of an id (last.id=1).
The MERGE statement merges each obs with the next obs, providing a way to look ahead at the upcoming date, and if necessary, the upcoming test.
Are we assuming a week is 7 consecutive days and not one of 52 periods or some number of Sun-Sat periods in a given year or leap year?
What do you want done with cases where an individual has for example a Jan 1, a Jan 5, and a Jan 10 test date? Jan 5 is within one "week" of Jan 1, Jan 10 is within one "week" of Jan 5, but Jan 10 is not within one week of Jan 1.
LAG is not the way to go for this problem, because - among other reasons - it won't tell you whether the upcoming observation is more than seven days in the future. And you need to know that in order to determine whether you have exhausted your seven-day period. In particular, you need
You'll have to test this below, in the absence of sample data in the form of a working data step:
data want (drop=_: nxt_:);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=date test rename=(date=nxt_date test=nxt_test));
retain _temp_test _temp_date ;
if missing(_temp_test) then _temp_test=test;
if missing(_temp_date) then _temp_date=date;
if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>_temp_date+7;
date=_temp_date;
test=_temp_test;
output;
call missing(of _temp:);
run;
This assumes the data are sorted by ID/DATE.
The _temp... variables are the prospective final values as you scan for data within a week. Notice this program looks for dates within a week of the FIRST date of the period. So if you have observations on the 1st, 4th, and 8th, only the 1st and 4th will be considered. The 8th would start a new "week". If, on the other hand you want to consolidate all observations until there is a gap of over a week, then change
if last.id=0 and nxt_date<=_temp_date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>_temp_date+7;
to
if last.id=0 and nxt_date<=date+7 then _temp_test=max(_temp_test,nxt_test);
if last.id=1 or nxt_date>date+7;
The SET statement reads only the ID variable. In tandem with the BY statement, it provides accurate indication of the beginning of an ID (first.id=1) and the end of an id (last.id=1).
The MERGE statement merges each obs with the next obs, providing a way to look ahead at the upcoming date, and if necessary, the upcoming test.
Not sure if you want 7-days-intervals or weeks. If you actually want weeks, try:
proc summary data=have nway;
class Id Date;
var Test;
format Date weekv5.;
output out= want(drop= _:) max=;
run;
proc datasets library= work nolist;
modify want;
format Date date9.;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.