Hello Everyone,
I have two datasets. One dataset is the subset of another dataset. I want to match both of the IDs . I want to find IDs from the subset dataset that wont match with the master dataset. For example:
Master dataset:
ID |
1 |
2 |
3 |
4 |
5 |
Subset dataset
ID |
2 |
4 |
5 |
9 |
When we match both datasets, My result should look like this:
9 |
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
data a;
input ID;
datalines;
1
2
3
4
5
;
data b;
input ID;
datalines;
2
4
5
9
;
proc sql;
create table want as
select ID
from b
where ID not in (select ID from a);
quit;
Use the EXCEPT operator in SQL:
proc sql;
select id from Subset
except
select id from Master;
quit;
or the NOT IN operator
proc sql;
select id from Subset
where id not in (select id from Master);
quit;
A data step will be the fastest since the data is sorted.
data WANT;
merge MASTER(in=M) SUBSET(in=S);
by ID;
if S and not M;
run;
if you have duplicates and want to avoid them, the if clause becomes :
if first.ID and S and not M;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.