topic Re: How to remove all the duplicated ids from one dataset according to unique ids in another data in SAS Programming

How to remove all the duplicated ids from one dataset according to unique ids in another data

SeaMoon_168 — Tue, 21 Jan 2025 00:01:58 GMT

Here are two simple sample datasets. Data 1 has duplicated IDs A B and C.

id	weight
A	30
A	36
B	21
B	25
B	30
C	40
C	41
C	43
C	40

Data 2 has unique id

I want to remove all duplicated ids (A) in Data 1 based on the unique ids in Data 2 (A). So the result is

id	weight
B	21
B	25
B	30
C	40
C	41
C	43
C	40

This is a very simple example. However, my Data 1 may have thousands of duplicated IDs and Data 2 may have hundreds of unique IDs. Many thanks!

Re: How to remove all the duplicated ids from one dataset according to unique ids in another data

Tom — Tue, 21 Jan 2025 00:12:17 GMT

Not sure how the duplicates play into this. Sounds like you just want the observations from the first dataset where the ID is NOT in the second dataset.

If they are sorted it is a simple merge.

data want;
  merge data1 data2(in=in2);
  by id;
  if not in2;
run;

If not then it might be simpler to use PROC SQL (since it will sort for you).

proc sql;
create table want as 
select * from data1
where id not in (select id from data2)
;
quit;

Re: How to remove all the duplicated ids from one dataset according to unique ids in another data

mkeintz — Tue, 21 Jan 2025 17:58:34 GMT

I dataset1 is not sorted, and you want to preserve the original order of dataset1, then you can code a single-step solution by storing dataset2 in a hash object.

data want;
  set dataset1;
  if _n_=1 then do;
    declare hash h (dataset:'dataset2');
      h.definekey('id');
      h.definedone();
  end;
  if h.check()^=0 then output;  /*If not found in the hash object*/
run;