BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SeaMoon_168
Quartz | Level 8

Here are two simple sample datasets. Data 1 has duplicated IDs A B and C. 

id weight
A 30
A 36
B 21
B 25
B 30
C 40
C 41
C 43
C 40

Data 2 has unique id

id
A

I want to remove all duplicated ids (A) in Data 1 based on the unique ids in Data 2 (A). So the result is

id weight
B 21
B 25
B 30
C 40
C 41
C 43
C 40

This is a very simple example. However, my Data 1 may have thousands of duplicated IDs and Data 2 may have hundreds of unique IDs. Many thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Not sure how the duplicates play into this.  Sounds like you just want the observations from the first dataset where the ID is NOT in the second dataset.

 

If they are sorted it is a simple merge.

data want;
  merge data1 data2(in=in2);
  by id;
  if not in2;
run;

If not then it might be simpler to use PROC SQL (since it will sort for you).

proc sql;
create table want as 
select * from data1
where id not in (select id from data2)
;
quit;

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

Not sure how the duplicates play into this.  Sounds like you just want the observations from the first dataset where the ID is NOT in the second dataset.

 

If they are sorted it is a simple merge.

data want;
  merge data1 data2(in=in2);
  by id;
  if not in2;
run;

If not then it might be simpler to use PROC SQL (since it will sort for you).

proc sql;
create table want as 
select * from data1
where id not in (select id from data2)
;
quit;
mkeintz
PROC Star

I dataset1 is not sorted, and you want to preserve the original order of dataset1, then you can code a single-step solution by storing dataset2 in a hash object.

 

data want;
  set dataset1;
  if _n_=1 then do;
    declare hash h (dataset:'dataset2');
      h.definekey('id');
      h.definedone();
  end;
  if h.check()^=0 then output;  /*If not found in the hash object*/
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 593 views
  • 2 likes
  • 3 in conversation