Solved: Re: Match and remove rows

NewUsrStat · Posted 08-12-2022 04:45 AM

Hi guys, is there a way to remove rows in a file that are not present in a second file by matching the column ID?

Thank you in advance

For example suppose to have:

data set1

ID AGE Role

001 35-50 A

002 55-60 B

003 35-50 C

004 35-50 D

005 35-50 E

and data set2

ID Hours Vacancies

002 900 21

003 456 54

005 45 0

I would like the data set1 to become:

ID AGE Role

002 55-60 B

003 35-50 C

005 35-50 E

Kurt_Bremser · Posted 08-12-2022 05:14 AM

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

PeterClemmensen · Posted 08-12-2022 04:52 AM

Yes there is. Please show us some data. Makes it much easier to provide a usable code answer.

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 08-12-2022 05:14 AM

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 08-12-2022 06:27 AM

Here is two different approaches with the same result

/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1, set2
   where set1.ID = set2.ID
   ;
quit;

/* Data Step Hash Approach */
data want;
   if _N_ = 1 then do;
      dcl hash h(dataset : 'set2');
      h.definekey('ID');
      h.definedone();
   end;

   set set1;

   if h.check() = 0;
run;

The DATA to DATA Step Macro
Blog: SASnrd

Ksharp · Posted 08-12-2022 08:06 AM

/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1
   where ID in (select ID from set2)
   ;
quit;

mkeintz · Posted 08-12-2022 08:23 AM

Are the data sorted? If so, and if data set 2 never has more than one observation per id, then:

data want;
  merge ds1 (in=in1)    ds2 (keep=id  in=in2);
  by id;
  if in1=in2;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------