Hello Everyone,
I have two datasets. One dataset is the subset of another dataset. I want to match both of the IDs . I want to find IDs from the subset dataset that wont match with the master dataset. For example:
Master dataset:
| ID |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
Subset dataset
| ID |
| 2 |
| 4 |
| 5 |
| 9 |
When we match both datasets, My result should look like this:
| 9 |
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
data a;
input ID;
datalines;
1
2
3
4
5
;
data b;
input ID;
datalines;
2
4
5
9
;
proc sql;
create table want as
select ID
from b
where ID not in (select ID from a);
quit;
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
A data step will be the fastest since the data is sorted.
data WANT;
merge MASTER(in=M) SUBSET(in=S);
by ID;
if S and not M;
run;
if you have duplicates and want to avoid them, the if clause becomes :
if first.ID and S and not M;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.