BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Pyrite | Level 9

Hi guys, is there a way to remove rows in a file that are not present in a second file by matching the column ID? 

 

Thank you in advance

 

For example suppose to have: 

data set1

 

 ID            AGE               Role    

001          35-50                A    

002          55-60                B    

003          35-50                C    

004          35-50                D    

005          35-50                E    

 

and data set2

 ID            Hours            Vacancies    

 002           900                    21   

 003           456                    54    

 005            45                      0   

 

I would like the data set1 to become: 

 

ID           AGE               Role    

002        55-60                B    

003        35-50                C    

005        35-50                E    

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Yes there is. Please show us some data. Makes it much easier to provide a usable code answer. 

Kurt_Bremser
Super User

Use a hash:

data want;
set ds1;
if _n_ = 1
then do;
  declare hash ds2 (dataset:"ds2");
  ds2.definekey("id");
  ds2.definedone();
end;
if ds2.check() = 0; /* CHECK method returns zero if successful */
run;

Untested; for tested code, provide example data in data steps with datalines.

PeterClemmensen
Tourmaline | Level 20

Here is two different approaches with the same result

 

/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1, set2
   where set1.ID = set2.ID
   ;
quit;

/* Data Step Hash Approach */
data want;
   if _N_ = 1 then do;
      dcl hash h(dataset : 'set2');
      h.definekey('ID');
      h.definedone();
   end;

   set set1;

   if h.check() = 0;
run;
Ksharp
Super User
/* Example Data */
data set1;
input ID $ AGE $ Role $;
datalines;
001 35-50 A 
002 55-60 B 
003 35-50 C 
004 35-50 D 
005 35-50 E 
;

data set2;
input ID $ Hours Vacancies;
datalines;
002 900 21 
003 456 54 
005 45  0  
;

/* Proc SQL Approach */
proc sql;
   create table want as
   select set1.*
   from set1
   where ID in (select ID from set2)
   ;
quit;
mkeintz
PROC Star

Are the data sorted?  If so, and if data set 2 never has more than one observation per id, then:

 

data want;
  merge ds1 (in=in1)    ds2 (keep=id  in=in2);
  by id;
  if in1=in2;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 745 views
  • 1 like
  • 5 in conversation