Here are two simple sample datasets. Data 1 has duplicated IDs A B and C.
id | weight |
A | 30 |
A | 36 |
B | 21 |
B | 25 |
B | 30 |
C | 40 |
C | 41 |
C | 43 |
C | 40 |
Data 2 has unique id
id |
A |
I want to remove all duplicated ids (A) in Data 1 based on the unique ids in Data 2 (A). So the result is
id | weight |
B | 21 |
B | 25 |
B | 30 |
C | 40 |
C | 41 |
C | 43 |
C | 40 |
This is a very simple example. However, my Data 1 may have thousands of duplicated IDs and Data 2 may have hundreds of unique IDs. Many thanks!
Not sure how the duplicates play into this. Sounds like you just want the observations from the first dataset where the ID is NOT in the second dataset.
If they are sorted it is a simple merge.
data want;
merge data1 data2(in=in2);
by id;
if not in2;
run;
If not then it might be simpler to use PROC SQL (since it will sort for you).
proc sql;
create table want as
select * from data1
where id not in (select id from data2)
;
quit;
Not sure how the duplicates play into this. Sounds like you just want the observations from the first dataset where the ID is NOT in the second dataset.
If they are sorted it is a simple merge.
data want;
merge data1 data2(in=in2);
by id;
if not in2;
run;
If not then it might be simpler to use PROC SQL (since it will sort for you).
proc sql;
create table want as
select * from data1
where id not in (select id from data2)
;
quit;
I dataset1 is not sorted, and you want to preserve the original order of dataset1, then you can code a single-step solution by storing dataset2 in a hash object.
data want;
set dataset1;
if _n_=1 then do;
declare hash h (dataset:'dataset2');
h.definekey('id');
h.definedone();
end;
if h.check()^=0 then output; /*If not found in the hash object*/
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.