SAS Data Management

sjarvis847 · Posted 01-30-2019 02:56 PM

Hello everyone,

I recently received data from the CDC's national death index. They provide death data for subjects and if duplicate records are found they provide back all of the data found and let you feed through it. There is one variable they return called exact match; however if there is an exact match and there are multiple records pulled up, they still send all the matching files. I would like to keep the exact match if found and delete any subsequent records for the same patient, without deleting duplicates for remaining records without exact matches. Below is an example of some data (not the real data of course). See patient ID #1 has an exact match, so I would want to keep that match and delete the following two. Patient ID #2 does not have an exact match, so I have to look at more information to see what row I want to keep, therefore I do not want to delete any records for patient ID #2.

ID DOB State DeathDate ExactMatch

1 9/15/2010 CO 10/26/2016 *

1 9/15/2010 IL 10/28/2016

1 9/15/2010 MO 1/6/2016

2 10/2/2013 KY 3/16/2016

2 10/2/2013 NY 2/26/2016

2 10/2/2013 CO 11/46/2016

Thanks for any help you can provide!

Stephanie

novinosrin · Posted 01-30-2019 04:22 PM

Assuming i understand your req

data have;
infile cards truncover;
input ID   DOB  :mmddyy10.            State  $ DeathDate   :mmddyy10.       ExactMatch $; 
format dob DeathDate mmddyy10.;
cards;
1     9/15/2010      CO     10/26/2016         *
1     9/15/2010      IL        10/28/2016
1     9/15/2010      MO     1/6/2016
2     10/2/2013      KY       3/16/2016           
2     10/2/2013      NY       2/26/2016          
2     10/2/2013      CO      11/26/2016
;

/*assuming your datasets are sorted by id as shown*/

data want;
merge have have(in=b where=(ExactMatch='*'));
by id;
if b then do;
if ExactMatch='*' then output;
end;
else output;
run;

mkeintz · Posted 02-03-2019 11:21 AM

@novinosrin. The merge of a sorted dataset with a specific subset of same is a very useful approach. But don't overlook the additional power of a well-constructed subetting if statement.

Consider replacing

if b then do;
if ExactMatch='*' then output;
end;
else output;

with:

if b=0 or exactmatch='*';

Similarly for the hash offering for non-sorted data. You can replace

rc= h.check();
if rc=0 then do;
if ExactMatch='*' then output;
end;
else output;
drop rc;

with

if h.check()^=0 or exactmatch='*';

I knew study of set unions, intersections, and complements would be useful one day.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 02-03-2019 12:02 PM

@mkeintz Thank you mr elegance & diligence combined as always.

novinosrin · Posted 01-30-2019 04:40 PM

Assuming i understand your req



data have;
infile cards truncover;
input ID   DOB  :mmddyy10.            State  $ DeathDate   :mmddyy10.       ExactMatch $; 
format dob DeathDate mmddyy10.;
cards;
1     9/15/2010      CO     10/26/2016         *
1     9/15/2010      IL        10/28/2016
1     9/15/2010      MO     1/6/2016
2     10/2/2013      KY       3/16/2016           
2     10/2/2013      NY       2/26/2016          
2     10/2/2013      CO      11/26/2016
;
/*sorting doesn't matter*/
data want;
if _n_=1 then do;
  declare hash H (dataset:'have(where=(ExactMatch="*"))') ;
   h.definekey  ('id') ;
   h.definedata ('id') ;
    h.definedone () ;
end;
set have;
rc= h.check();
if rc=0 then do;
if ExactMatch='*' then output;
end;
else output;
drop rc;
run;

SAS Data Management

deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Follow Us

What is...

SAS Data Management

deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Re: deleting duplicate subject records by a variable

Register Today!

Follow Us

What is...