BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

I have the data set have. How can I delete the rows where same ID has same Test_Name and same Test_Result within next 7 days of previous Test_Result date (example, ID=001). 

data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star
Which record do you want to delete when you have a match within 7 days?  The latter or the earlier?
 
If you want to delete the latter, and you don't mind generating a resulting dataset sorted by ID/DATE, then:
 
data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
run;

proc sort data=have out=need;
by id date;
run;

data want;
set need;
by id;
if first.id=0 and dif(date)<=7 then delete;
run;
If you want to preserve the original data order, you can use a hash object to record the "_most_recent_date" keyed on ID/TEST_NAME/TEST_RESULT, to facilitate comparison to the current date:
 
data want (drop=_:);
set have;
_most_recent_date=.;

if _n_=1 then do;
declare hash h ();
h.definekey('id','test_name','test_result');
h.definedata('id','test_name','test_result','_most_recent_date');
h.definedone();
end;

if h.find()=0 and _most_recent_date>=date-7 then delete;
h.replace(key:id,key:test_name,key:test_result,data:id,data:test_name,data:test_result,data:date);
run;
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star
Which record do you want to delete when you have a match within 7 days?  The latter or the earlier?
 
If you want to delete the latter, and you don't mind generating a resulting dataset sorted by ID/DATE, then:
 
data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
run;

proc sort data=have out=need;
by id date;
run;

data want;
set need;
by id;
if first.id=0 and dif(date)<=7 then delete;
run;
If you want to preserve the original data order, you can use a hash object to record the "_most_recent_date" keyed on ID/TEST_NAME/TEST_RESULT, to facilitate comparison to the current date:
 
data want (drop=_:);
set have;
_most_recent_date=.;

if _n_=1 then do;
declare hash h ();
h.definekey('id','test_name','test_result');
h.definedata('id','test_name','test_result','_most_recent_date');
h.definedone();
end;

if h.find()=0 and _most_recent_date>=date-7 then delete;
h.replace(key:id,key:test_name,key:test_result,data:id,data:test_name,data:test_result,data:date);
run;
 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Barkat
Pyrite | Level 9
Thanks for providing solutions in both way. This is super helpful for me.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 728 views
  • 0 likes
  • 2 in conversation