BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ken_nuga
Fluorite | Level 6

I have duplicates like CT001 and ct001 under "Pepid" but SAS does not recognize them as duplicates, how do I remove one and keep the most recent of the two?

Pepidpatient_idSexLastPickupDate
CT086628480Male05-Apr-22
ct086628480Male11-Jan-22
IHSD/CT000714883Male27-Apr-22
IHSD/CT000734892Male10-May-22
IHSD/CT000794859Male27-Apr-22
ihsd/ct000714883Male26-Jan-22
ihsd/ct000734892Male09-Feb-22
ihsd/ct000794859Male26-Jan-22
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Like this?

data have;
  infile datalines truncover dlm='|' dsd;
  input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
  format LastPickupDate date9.;
  datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;

proc sql;
  create table want as
  select *
  from have
  group by patient_id, sex, upcase(Pepid)
  having max(LastPickupDate)=LastPickupDate
  ;
quit;

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

Why don't you use patient_id as your duplicate finder?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ken_nuga
Fluorite | Level 6

That would have worked fine, however, the data set has some facilities mapped under another (a hub and spoke approach) and running an independent database but having the same facility ID, therefore, using that erroneously deletes the records of patients from the 'spoke' see below: 'VHCT00001' from 'spoke' facility has the same Patient_id with TB143 from the 'hub' facility

FacilityCodePepid_facpatient_idPepidSexLastPickupDateAgeAtStartofART
i6Rcojd7d9STB143-i6Rcojd7d9S2TB143Male12-Apr-2240
i6Rcojd7d9SVHCT00001-i6Rcojd7d9S2VHCT00001Female30-Jun-2225
i6Rcojd7d9SOP00002-i6Rcojd7d9S3OP00002Female22-Feb-183
i6Rcojd7d9SPI1825-i6Rcojd7d9S3PI1825Female13-May-2238
i6Rcojd7d9SPI0964-i6Rcojd7d9S4PI0964Female03-Feb-2228
i6Rcojd7d9SVHCT00002-i6Rcojd7d9S4VHCT00002Male30-May-2231
i6Rcojd7d9SCT138-i6Rcojd7d9S5CT138Female06-Apr-2248
i6Rcojd7d9SVHCT00003B-i6Rcojd7d9S5VHCT00003BFemale20-Jun-2231
i6Rcojd7d9SCT06237-i6Rcojd7d9S6CT06237Female11-Feb-2252
i6Rcojd7d9SVHOPI00009B-i6Rcojd7d9S6VHOPI00009BFemale08-Aug-1910
i6Rcojd7d9SCT04218-i6Rcojd7d9S7CT04218Male25-Jan-2264
i6Rcojd7d9SVHOPI00004B-i6Rcojd7d9S7VHOPI00004BFemale11-Apr-223
i6Rcojd7d9SCT144-i6Rcojd7d9S8CT144Female11-Feb-1932
i6Rcojd7d9SVHOPI00005-i6Rcojd7d9S8VHOPI00005Male13-Jan-182
i6Rcojd7d9SCT00272-i6Rcojd7d9S9CT00272Female25-Mar-2239
i6Rcojd7d9SVHOP00007-i6Rcojd7d9S9VHOP00007Female28-Jun-227
i6Rcojd7d9SPM01485-i6Rcojd7d9S10PM01485Male04-Feb-2240
i6Rcojd7d9SVHCT00009A-i6Rcojd7d9S10VHCT00009AFemale05-May-2237
Patrick
Opal | Level 21

Like this?

data have;
  infile datalines truncover dlm='|' dsd;
  input Pepid :$20. patient_id:best32. Sex:$10. LastPickupDate :date9.;
  format LastPickupDate date9.;
  datalines;
CT08662|8480|Male|05-Apr-22
ct08662|8480|Male|11-Jan-22
IHSD/CT00071|4883|Male|27-Apr-22
IHSD/CT00073|4892|Male|10-May-22
IHSD/CT00079|4859|Male|27-Apr-22
ihsd/ct00071|4883|Male|26-Jan-22
ihsd/ct00073|4892|Male|09-Feb-22
ihsd/ct00079|4859|Male|26-Jan-22
;

proc sql;
  create table want as
  select *
  from have
  group by patient_id, sex, upcase(Pepid)
  having max(LastPickupDate)=LastPickupDate
  ;
quit;
Ken_nuga
Fluorite | Level 6
Many thanks, but I had to do as shown below:

DATA have;
set have;
Pepid2=upcase(Pepid);
Pepid_fac2=catx('-',Pepid2,FacilityCode);
run;

proc sql;
create table want1 as
select *
from have
group by Pepid_fac2, patient_id
having max(LastPickupDate)=LastPickupDate
;
quit;

DATA want2;
set want1;
by Pepid_fac2;
if first.Pepid_fac2;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 753 views
  • 1 like
  • 3 in conversation