Hello Everyone,
I have two datasets. One dataset is the subset of another dataset. I want to match both of the IDs . I want to find IDs from the subset dataset that wont match with the master dataset. For example:
Master dataset:
| ID |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Subset dataset
| ID |
| 2 |
| 4 |
| 5 |
| 9 |
When we match both datasets, My result should look like this:
| 9 |
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
data a;
input ID;
datalines;
1
2
3
4
5
;
data b;
input ID;
datalines;
2
4
5
9
;
proc sql;
create table want as
select ID
from b
where ID not in (select ID from a);
quit;
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
A data step will be the fastest since the data is sorted.
data WANT;
merge MASTER(in=M) SUBSET(in=S);
by ID;
if S and not M;
run;
if you have duplicates and want to avoid them, the if clause becomes :
if first.ID and S and not M;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.