- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone,
I recently received data from the CDC's national death index. They provide death data for subjects and if duplicate records are found they provide back all of the data found and let you feed through it. There is one variable they return called exact match; however if there is an exact match and there are multiple records pulled up, they still send all the matching files. I would like to keep the exact match if found and delete any subsequent records for the same patient, without deleting duplicates for remaining records without exact matches. Below is an example of some data (not the real data of course). See patient ID #1 has an exact match, so I would want to keep that match and delete the following two. Patient ID #2 does not have an exact match, so I have to look at more information to see what row I want to keep, therefore I do not want to delete any records for patient ID #2.
ID DOB State DeathDate ExactMatch
1 9/15/2010 CO 10/26/2016 *
1 9/15/2010 IL 10/28/2016
1 9/15/2010 MO 1/6/2016
2 10/2/2013 KY 3/16/2016
2 10/2/2013 NY 2/26/2016
2 10/2/2013 CO 11/46/2016
Thanks for any help you can provide!
Stephanie
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming i understand your req
data have;
infile cards truncover;
input ID DOB :mmddyy10. State $ DeathDate :mmddyy10. ExactMatch $;
format dob DeathDate mmddyy10.;
cards;
1 9/15/2010 CO 10/26/2016 *
1 9/15/2010 IL 10/28/2016
1 9/15/2010 MO 1/6/2016
2 10/2/2013 KY 3/16/2016
2 10/2/2013 NY 2/26/2016
2 10/2/2013 CO 11/26/2016
;
/*assuming your datasets are sorted by id as shown*/
data want;
merge have have(in=b where=(ExactMatch='*'));
by id;
if b then do;
if ExactMatch='*' then output;
end;
else output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@novinosrin. The merge of a sorted dataset with a specific subset of same is a very useful approach. But don't overlook the additional power of a well-constructed subetting if statement.
Consider replacing
if b then do;
if ExactMatch='*' then output;
end;
else output;
with:
if b=0 or exactmatch='*';
Similarly for the hash offering for non-sorted data. You can replace
rc= h.check();
if rc=0 then do;
if ExactMatch='*' then output;
end;
else output;
drop rc;
with
if h.check()^=0 or exactmatch='*';
I knew study of set unions, intersections, and complements would be useful one day.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@mkeintz Thank you mr elegance & diligence combined as always.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Assuming i understand your req
data have;
infile cards truncover;
input ID DOB :mmddyy10. State $ DeathDate :mmddyy10. ExactMatch $;
format dob DeathDate mmddyy10.;
cards;
1 9/15/2010 CO 10/26/2016 *
1 9/15/2010 IL 10/28/2016
1 9/15/2010 MO 1/6/2016
2 10/2/2013 KY 3/16/2016
2 10/2/2013 NY 2/26/2016
2 10/2/2013 CO 11/26/2016
;
/*sorting doesn't matter*/
data want;
if _n_=1 then do;
declare hash H (dataset:'have(where=(ExactMatch="*"))') ;
h.definekey ('id') ;
h.definedata ('id') ;
h.definedone () ;
end;
set have;
rc= h.check();
if rc=0 then do;
if ExactMatch='*' then output;
end;
else output;
drop rc;
run;