Solved: Re: Delete rows where same person has same test and test results in ne...

Barkat · Posted 12-28-2022 04:15 PM

I have the data set have. How can I delete the rows where same ID has same Test_Name and same Test_Result within next 7 days of previous Test_Result date (example, ID=001).

data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
;
run;

mkeintz · Posted 12-28-2022 11:21 PM

Which record do you want to delete when you have a match within 7 days?  The latter or the earlier?
 
If you want to delete the latter, and you don't mind generating a resulting dataset sorted by ID/DATE, then:
 
data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
run;

proc sort data=have out=need;
by id date;
run;

data want;
set need;
by id;
if first.id=0 and dif(date)<=7 then delete;
run;
If you want to preserve the original data order, you can use a hash object to record the "_most_recent_date" keyed on ID/TEST_NAME/TEST_RESULT, to facilitate comparison to the current date:
 
data want (drop=_:);
set have;
_most_recent_date=.;

if _n_=1 then do;
declare hash h ();
h.definekey('id','test_name','test_result');
h.definedata('id','test_name','test_result','_most_recent_date');
h.definedone();
end;

if h.find()=0 and _most_recent_date>=date-7 then delete;
h.replace(key:id,key:test_name,key:test_result,data:id,data:test_name,data:test_result,data:date);
run;
 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 12-28-2022 11:21 PM

Which record do you want to delete when you have a match within 7 days?  The latter or the earlier?
 
If you want to delete the latter, and you don't mind generating a resulting dataset sorted by ID/DATE, then:
 
data have;
input id $ Test_name $ Test_Result $ date :mmddyy10.;
format date mmddyy10.;
cards;
001 AB Pos 1/5/2021
002 CD Neg 1/6/2021
003 CD Pos 1/7/2021
004 AB Neg 1/12/2021
005 CD Pos 1/21/2021
006 CD Neg 1/23/2021
007 CD Neg 1/5/2021
001 AB Pos 1/9/2021
008 AB Pos 1/6/2021
009 AB Neg 1/7/2021
010 CD Pos 1/12/2021
011 CD Neg 1/21/2021
012 CD Neg 1/23/2021
013 AB Pos 2/5/2021
014 CD Neg 2/6/2021
003 CD Pos 2/7/2021
015 AB Neg 2/12/2021
016 CD Pos 2/21/2021
006 CD Neg 2/23/2021
017 CD Neg 2/5/2021
018 AB Pos 2/9/2021
008 AB Pos 2/6/2021
019 AB Neg 2/7/2021
020 CD Pos 2/14/2021
021 CD Neg 2/25/2021
022 CD Neg 2/27/2021
run;

proc sort data=have out=need;
by id date;
run;

data want;
set need;
by id;
if first.id=0 and dif(date)<=7 then delete;
run;
If you want to preserve the original data order, you can use a hash object to record the "_most_recent_date" keyed on ID/TEST_NAME/TEST_RESULT, to facilitate comparison to the current date:
 
data want (drop=_:);
set have;
_most_recent_date=.;

if _n_=1 then do;
declare hash h ();
h.definekey('id','test_name','test_result');
h.definedata('id','test_name','test_result','_most_recent_date');
h.definedone();
end;

if h.find()=0 and _most_recent_date>=date-7 then delete;
h.replace(key:id,key:test_name,key:test_result,data:id,data:test_name,data:test_result,data:date);
run;
 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Barkat · Posted 12-29-2022 12:07 AM

Thanks for providing solutions in both way. This is super helpful for me.

Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Re: Delete rows where same person has same test and test results in next 7 days

Registration is open