I have the following data:
data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
;
I would like to keep only all observations with 10 or more days difference by ID, but if within 10-days then type should be with smallest value's row (see id=1 and type=A-1; also id=4 and type=A-1) even the date was higher than the previous date, see wanted data:
ID | TYPE | DATE |
1 | A-1 | 1/4/2021 |
1 | A-1 | 1/20/2021 |
2 | A-1 | 1/1/2021 |
3 | A-1 | 1/1/2021 |
4 | A-1 | 1/5/2021 |
5 | A-2 | 2/1/2021 |
How can I accomplish this? Thank you.
I have two approaches, one using two outputs, the other a look-ahead:
data want;
set have;
by id date;
retain _date _type;
if first.id
then do;
_type = type;
_date = date;
end;
_type = ifc(type < _type,type,_type);
if date - _date gt 10
then do;
_t = type;
_d = date;
type = _type;
date = _date;
output;
_type = _t;
date = _d;
end;
if last.id
then do;
type = _type;
output;
end;
_date = date;
drop _:;
run;
data want2;
set have;
by id;
set
have (
firstobs=2
keep=id date
rename=(
id = _id
date = _date
)
)
have (
obs=1
keep=id date
rename=(
id = _id
date = _date
)
)
;
retain _start _type;
if first.id
then do;
_start = date;
_type = type;
end;
_type = ifc(type < _type,type,_type);
if _id ne id or _date gt _start + 10
then do;
type = _type;
output;
end;
drop _:;
run;
The code does not work for below data (when the same type and within 10 days it shifts the date, but I do not want to shift any date):
data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
1 A-1 1/23/2021
1 A-1 1/25/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 08/11/2021
6 1-B 08/18/2021
6 1-B 01/26/2022
;
data wanted;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-1 1/4/2021
1 A-0 1/25/2021
2 A-1 1/1/2021
3 A-1 1/1/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
6 0-B 07/22/2020
6 1-B 08/05/2021
6 1-B 01/26/2022
;
Your WANT data is inconsistent.
For the first "window" for ID 1 and 4, you take the last date, but for the windows of ID 2 and 3 you take the first.
Please restate your requirements in a complete and consistent way for all variables in the WANT dataset.
Assuming the data are sorted by ID/DATE, I believe your criterion can be restated as "keep the first record for each ID, and all other records that are more than 10 days after their predecessor".
If so, then it's easy:
data have;
input id type $ date :mmddyy10.;
format type $3. date mmddyy10.;
cards;
1 A-2 1/1/2021
1 A-1 1/4/2021
1 A-1 1/20/2021
2 A-1 1/1/2021
2 A-1 1/4/2021
3 A-1 1/1/2021
3 A-2 1/4/2021
4 A-2 1/1/2021
4 A-3 1/4/2021
4 A-1 1/5/2021
5 A-2 2/1/2021
run;
data want;
set have ;
by id;
if first.id=1 or date-10>lag(date);
run;
Simply telling me "it does not work" ... does not work. Please explain what you got that was unexpected or undesired/ And show the log of the failing program.
Also earlier you said
@Emma2021 wrote:
Thank you! I should sort by id type and date and I think this should work!
Why are you sorting by id type and date. You initial posting asked only for 10-day separation within ID, implying no role for type, and therefore no apparent need for type to be a sort key. Are you now saying you want 10-day separation within each ID/TYPE? Is there some other role of the variable type that has not yet been explained (such as choosing between two types within an ID if they have tied dates)?
Then the task is to read through a time span - i.e. all obs until there is a trailing 10day+ gap. Over the course of that span determine what is the lowest type. Then re-read the same obs, and output the first one that has type equal to lowest type.
data want (drop=_: );
/* Read and count obs until a ten-day gap follows, determine lowest type */
_lowest_type=' ';
do _i=1 by 1 until (last.id=1 or date+10<_nxt_date);
set have (keep=id);
by id;
merge have have (firstobs=2 keep=date rename=(date=_nxt_date));
if _lowest_type=' ' then _lowest_type=type;
else if type<_lowest_type then _lowest_type=type;
end;
/* Reread obs and output first instance with lowest type */
do _i=_i to 1 by -1;
set have;
if type=_lowest_type then do;
output;
call missing(_lowest_type); /*Prevent multiple hits */
end;
end;
run;
Note the data need to be in chronological order, (sort by ID/DATE). You don't need to include TYPE in the sort keys.
I'm still baffled. Do you want to calculate over a window of 10 days, or keep calculating until there is a 10-day span between dates?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.