- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi guys, is there a way to remove rows in a file that are not present in a second file by matching the column ID?
Thank you in advance
For example suppose to have:
data set1
ID AGE Role
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
and data set2
ID Hours Vacancies
002 900 21
003 456 54
005 45 0
I would like the data set1 to become:
ID AGE Role
002 55-60 B
003 35-50 C
005 35-50 E
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use a hash:
data want;
set ds1;
if _n_ = 1
then do;
declare hash ds2 (dataset:"ds2");
ds2.definekey("id");
ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;
Untested; for tested code, provide example data in data steps with datalines.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes there is. Please show us some data. Makes it much easier to provide a usable code answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use a hash:
data want;
set ds1;
if _n_ = 1
then do;
declare hash ds2 (dataset:"ds2");
ds2.definekey("id");
ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;
Untested; for tested code, provide example data in data steps with datalines.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is two different approaches with the same result
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
;
data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21
003 456 54
005 45 0
;
/* Proc SQL Approach */
proc sql;
create table want as
select set1.*
from set1, set2
where set1.ID = set2.ID
;
quit;
/* Data Step Hash Approach */
data want;
if _N_ = 1 then do;
dcl hash h(dataset : 'set2');
h.definekey('ID');
h.definedone();
end;
set set1;
if h.check() = 0;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A
002 55-60 B
003 35-50 C
004 35-50 D
005 35-50 E
;
data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21
003 456 54
005 45 0
;
/* Proc SQL Approach */
proc sql;
create table want as
select set1.*
from set1
where ID in (select ID from set2)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Are the data sorted? If so, and if data set 2 never has more than one observation per id, then:
data want;
merge ds1 (in=in1) ds2 (keep=id in=in2);
by id;
if in1=in2;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------