I have duplicates like CT001 and ct001 under "Pepid" but SAS does not recognize them as duplicates, how do I remove one and keep the most recent of the two?
Pepid | patient_id | Sex | LastPickupDate |
CT08662 | 8480 | Male | 05-Apr-22 |
ct08662 | 8480 | Male | 11-Jan-22 |
IHSD/CT00071 | 4883 | Male | 27-Apr-22 |
IHSD/CT00073 | 4892 | Male | 10-May-22 |
IHSD/CT00079 | 4859 | Male | 27-Apr-22 |
ihsd/ct00071 | 4883 | Male | 26-Jan-22 |
ihsd/ct00073 | 4892 | Male | 09-Feb-22 |
ihsd/ct00079 | 4859 | Male | 26-Jan-22 |
Like this?
data have;
infile datalines truncover dlm='|' dsd;
input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
format LastPickupDate date9.;
datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;
proc sql;
create table want as
select *
from have
group by patient_id, sex, upcase(Pepid)
having max(LastPickupDate)=LastPickupDate
;
quit;
Why don't you use patient_id as your duplicate finder?
That would have worked fine, however, the data set has some facilities mapped under another (a hub and spoke approach) and running an independent database but having the same facility ID, therefore, using that erroneously deletes the records of patients from the 'spoke' see below: 'VHCT00001' from 'spoke' facility has the same Patient_id with TB143 from the 'hub' facility
FacilityCode | Pepid_fac | patient_id | Pepid | Sex | LastPickupDate | AgeAtStartofART |
i6Rcojd7d9S | TB143-i6Rcojd7d9S | 2 | TB143 | Male | 12-Apr-22 | 40 |
i6Rcojd7d9S | VHCT00001-i6Rcojd7d9S | 2 | VHCT00001 | Female | 30-Jun-22 | 25 |
i6Rcojd7d9S | OP00002-i6Rcojd7d9S | 3 | OP00002 | Female | 22-Feb-18 | 3 |
i6Rcojd7d9S | PI1825-i6Rcojd7d9S | 3 | PI1825 | Female | 13-May-22 | 38 |
i6Rcojd7d9S | PI0964-i6Rcojd7d9S | 4 | PI0964 | Female | 03-Feb-22 | 28 |
i6Rcojd7d9S | VHCT00002-i6Rcojd7d9S | 4 | VHCT00002 | Male | 30-May-22 | 31 |
i6Rcojd7d9S | CT138-i6Rcojd7d9S | 5 | CT138 | Female | 06-Apr-22 | 48 |
i6Rcojd7d9S | VHCT00003B-i6Rcojd7d9S | 5 | VHCT00003B | Female | 20-Jun-22 | 31 |
i6Rcojd7d9S | CT06237-i6Rcojd7d9S | 6 | CT06237 | Female | 11-Feb-22 | 52 |
i6Rcojd7d9S | VHOPI00009B-i6Rcojd7d9S | 6 | VHOPI00009B | Female | 08-Aug-19 | 10 |
i6Rcojd7d9S | CT04218-i6Rcojd7d9S | 7 | CT04218 | Male | 25-Jan-22 | 64 |
i6Rcojd7d9S | VHOPI00004B-i6Rcojd7d9S | 7 | VHOPI00004B | Female | 11-Apr-22 | 3 |
i6Rcojd7d9S | CT144-i6Rcojd7d9S | 8 | CT144 | Female | 11-Feb-19 | 32 |
i6Rcojd7d9S | VHOPI00005-i6Rcojd7d9S | 8 | VHOPI00005 | Male | 13-Jan-18 | 2 |
i6Rcojd7d9S | CT00272-i6Rcojd7d9S | 9 | CT00272 | Female | 25-Mar-22 | 39 |
i6Rcojd7d9S | VHOP00007-i6Rcojd7d9S | 9 | VHOP00007 | Female | 28-Jun-22 | 7 |
i6Rcojd7d9S | PM01485-i6Rcojd7d9S | 10 | PM01485 | Male | 04-Feb-22 | 40 |
i6Rcojd7d9S | VHCT00009A-i6Rcojd7d9S | 10 | VHCT00009A | Female | 05-May-22 | 37 |
Like this?
data have;
infile datalines truncover dlm='|' dsd;
input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
format LastPickupDate date9.;
datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;
proc sql;
create table want as
select *
from have
group by patient_id, sex, upcase(Pepid)
having max(LastPickupDate)=LastPickupDate
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.