BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, is there a way to remove rows in a file that are not present in a second file by matching the column ID? 

 

Thank you in advance

 

For example suppose to have: 

data set1

 

 ID            AGE               Role    

001          35-50                A    

002          55-60                B    

003          35-50                C    

004          35-50                D    

005          35-50                E    

 

and data set2

 ID            Hours            Vacancies    

 002           900                    21   

 003           456                    54    

 005            45                      0   

 

I would like the data set1 to become: 

 

ID           AGE               Role    

002        55-60                B    

003        35-50                C    

005        35-50                E    

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Yes there is. Please show us some data. Makes it much easier to provide a usable code answer. 

Kurt_Bremser
Super User

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

PeterClemmensen
Tourmaline | Level 20

Here is two different approaches with the same result

 

/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1, set2
   where set1.ID = set2.ID
   ;
quit;

/* Data Step Hash Approach */
data want;
   if _N_ = 1 then do;
      dcl hash h(dataset : 'set2');
      h.definekey('ID');
      h.definedone();
   end;

   set set1;

   if h.check() = 0;
run;
Ksharp
Super User
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1
   where ID in (select ID from set2)
   ;
quit;
mkeintz
PROC Star

Are the data sorted?  If so, and if data set 2 never has more than one observation per id, then:

 

data want;
  merge ds1 (in=in1)    ds2 (keep=id  in=in2);
  by id;
  if in1=in2;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------