Hi guys, is there a way to remove rows in a file that are not present in a second file by matching the column ID?
Thank you in advance
For example suppose to have:
data set1
ID AGE Role
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
and data set2
ID Hours Vacancies
002 900 21
003 456 54
005 45 0
I would like the data set1 to become:
ID AGE Role
002 55-60 B
003 35-50 C
005 35-50 E
Use a hash:
data want;
set ds1;
if _n_ = 1
then do;
declare hash ds2 (dataset:"ds2");
ds2.definekey("id");
ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;
Untested; for tested code, provide example data in data steps with datalines.
Yes there is. Please show us some data. Makes it much easier to provide a usable code answer.
Use a hash:
data want;
set ds1;
if _n_ = 1
then do;
declare hash ds2 (dataset:"ds2");
ds2.definekey("id");
ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;
Untested; for tested code, provide example data in data steps with datalines.
Here is two different approaches with the same result
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
;
data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21
003 456 54
005 45 0
;
/* Proc SQL Approach */
proc sql;
create table want as
select set1.*
from set1, set2
where set1.ID = set2.ID
;
quit;
/* Data Step Hash Approach */
data want;
if _N_ = 1 then do;
dcl hash h(dataset : 'set2');
h.definekey('ID');
h.definedone();
end;
set set1;
if h.check() = 0;
run;
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
;
data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21
003 456 54
005 45 0
;
/* Proc SQL Approach */
proc sql;
create table want as
select set1.*
from set1
where ID in (select ID from set2)
;
quit;
Are the data sorted? If so, and if data set 2 never has more than one observation per id, then:
data want;
merge ds1 (in=in1) ds2 (keep=id in=in2);
by id;
if in1=in2;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.