BookmarkSubscribeRSS Feed
ybz12003
Rhodochrosite | Level 12

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.

5 REPLIES 5
mbuchecker
Quartz | Level 8

proc sql; 

select name, birthdate, test

  from a

except

select name, birthdate, test

  from b;

 

should do it.

Michelle
ballardw
Super User

@ybz12003 wrote:

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.


Getting just the variables you mention:

 

proc sql;
   create table want as
   select name, birthdate, tests from dataseta
   except
   select name, birthdate, tests from datasetb
   ;
quit;

If there are other variables in set A that you need you need to joint that back to set A

ybz12003
Rhodochrosite | Level 12

Dataset A contains 50 columns, and dataset B contains 3 columns.  My final result is dataset A contains 50 columns WITHOUT dataset B.

Patrick
Opal | Level 21

Here two ways for doing this.

If you don't create datasetA within your program then I wouldn't overwrite this table but would create a copy of it (another table) with the rows removed.

/* create sample data */
data datasetA datasetB(keep=name birthdate tests);
  set sashelp.class;
  birthdate=today(); 
  format birthdate date9.;
  tests=1;
  output datasetA;
  if mod(_n_,2)=0 then output datasetB;
run;

/* delete rows from datasetA in place
   - deletes rows LOGICALLY only
   - Does NOT reduce table size
   - maintains table attributes like indexes
*/
proc sql;
  delete from datasetA as A
  where exists
    ( select * from datasetB as B
      where 
        a.name=b.name
        and a.birthdate=b.birthdate
        and a.tests=b.tests
    )
   ;
quit;

proc contents data=datasetA;
run;quit;

/* re-create the table with matching rows removed 
   - fully re-creates the table
   - reduces table size
   - will not maintain other table attributes like indexes
*/
data datasetA;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'datasetB');
      h1.defineKey('name','birthdate','tests');
      h1.defineDone();
    end;
  set datasetA;
  if h1.check()=0 then delete;
run;

proc contents data=datasetA;
run;quit;

 

Satish_Parida
Lapis Lazuli | Level 10

Here is how to do in hash.

 

data have1;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
CC 11Jan2010 X
DD 12Jan2010 X
;
run;

data have2;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
;
run;

data want;
set have1;
if _n_=1 then do;
	declare hash rc (dataset: 'work.have2');
	rc.DefineKey ( 'names','birthdates','tests' ) ;
	rc.DefineDone () ; 
end;
if rc.find() =0 then delete;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1350 views
  • 1 like
  • 5 in conversation