I have duplicates like CT001 and ct001 under "Pepid" but SAS does not recognize them as duplicates, how do I remove one and keep the most recent of the two?
Pepid | patient_id | Sex | LastPickupDate |
CT08662 | 8480 | Male | 05-Apr-22 |
ct08662 | 8480 | Male | 11-Jan-22 |
IHSD/CT00071 | 4883 | Male | 27-Apr-22 |
IHSD/CT00073 | 4892 | Male | 10-May-22 |
IHSD/CT00079 | 4859 | Male | 27-Apr-22 |
ihsd/ct00071 | 4883 | Male | 26-Jan-22 |
ihsd/ct00073 | 4892 | Male | 09-Feb-22 |
ihsd/ct00079 | 4859 | Male | 26-Jan-22 |
Like this?
data have;
infile datalines truncover dlm='|' dsd;
input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
format LastPickupDate date9.;
datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;
proc sql;
create table want as
select *
from have
group by patient_id, sex, upcase(Pepid)
having max(LastPickupDate)=LastPickupDate
;
quit;
Why don't you use patient_id as your duplicate finder?
That would have worked fine, however, the data set has some facilities mapped under another (a hub and spoke approach) and running an independent database but having the same facility ID, therefore, using that erroneously deletes the records of patients from the 'spoke' see below: 'VHCT00001' from 'spoke' facility has the same Patient_id with TB143 from the 'hub' facility
FacilityCode | Pepid_fac | patient_id | Pepid | Sex | LastPickupDate | AgeAtStartofART |
i6Rcojd7d9S | TB143-i6Rcojd7d9S | 2 | TB143 | Male | 12-Apr-22 | 40 |
i6Rcojd7d9S | VHCT00001-i6Rcojd7d9S | 2 | VHCT00001 | Female | 30-Jun-22 | 25 |
i6Rcojd7d9S | OP00002-i6Rcojd7d9S | 3 | OP00002 | Female | 22-Feb-18 | 3 |
i6Rcojd7d9S | PI1825-i6Rcojd7d9S | 3 | PI1825 | Female | 13-May-22 | 38 |
i6Rcojd7d9S | PI0964-i6Rcojd7d9S | 4 | PI0964 | Female | 03-Feb-22 | 28 |
i6Rcojd7d9S | VHCT00002-i6Rcojd7d9S | 4 | VHCT00002 | Male | 30-May-22 | 31 |
i6Rcojd7d9S | CT138-i6Rcojd7d9S | 5 | CT138 | Female | 06-Apr-22 | 48 |
i6Rcojd7d9S | VHCT00003B-i6Rcojd7d9S | 5 | VHCT00003B | Female | 20-Jun-22 | 31 |
i6Rcojd7d9S | CT06237-i6Rcojd7d9S | 6 | CT06237 | Female | 11-Feb-22 | 52 |
i6Rcojd7d9S | VHOPI00009B-i6Rcojd7d9S | 6 | VHOPI00009B | Female | 08-Aug-19 | 10 |
i6Rcojd7d9S | CT04218-i6Rcojd7d9S | 7 | CT04218 | Male | 25-Jan-22 | 64 |
i6Rcojd7d9S | VHOPI00004B-i6Rcojd7d9S | 7 | VHOPI00004B | Female | 11-Apr-22 | 3 |
i6Rcojd7d9S | CT144-i6Rcojd7d9S | 8 | CT144 | Female | 11-Feb-19 | 32 |
i6Rcojd7d9S | VHOPI00005-i6Rcojd7d9S | 8 | VHOPI00005 | Male | 13-Jan-18 | 2 |
i6Rcojd7d9S | CT00272-i6Rcojd7d9S | 9 | CT00272 | Female | 25-Mar-22 | 39 |
i6Rcojd7d9S | VHOP00007-i6Rcojd7d9S | 9 | VHOP00007 | Female | 28-Jun-22 | 7 |
i6Rcojd7d9S | PM01485-i6Rcojd7d9S | 10 | PM01485 | Male | 04-Feb-22 | 40 |
i6Rcojd7d9S | VHCT00009A-i6Rcojd7d9S | 10 | VHCT00009A | Female | 05-May-22 | 37 |
Like this?
data have;
infile datalines truncover dlm='|' dsd;
input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
format LastPickupDate date9.;
datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;
proc sql;
create table want as
select *
from have
group by patient_id, sex, upcase(Pepid)
having max(LastPickupDate)=LastPickupDate
;
quit;
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.