BookmarkSubscribeRSS Feed
ybz12003
Rhodochrosite | Level 12

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.

5 REPLIES 5
mbuchecker
Quartz | Level 8

proc sql; 

select name, birthdate, test

  from a

except

select name, birthdate, test

  from b;

 

should do it.

Michelle
ballardw
Super User

@ybz12003 wrote:

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.


Getting just the variables you mention:

 

proc sql;
   create table want as
   select name, birthdate, tests from dataseta
   except
   select name, birthdate, tests from datasetb
   ;
quit;

If there are other variables in set A that you need you need to joint that back to set A

ybz12003
Rhodochrosite | Level 12

Dataset A contains 50 columns, and dataset B contains 3 columns.  My final result is dataset A contains 50 columns WITHOUT dataset B.

Patrick
Opal | Level 21

Here two ways for doing this.

If you don't create datasetA within your program then I wouldn't overwrite this table but would create a copy of it (another table) with the rows removed.

/* create sample data */
data datasetA datasetB(keep=name birthdate tests);
  set sashelp.class;
  birthdate=today(); 
  format birthdate date9.;
  tests=1;
  output datasetA;
  if mod(_n_,2)=0 then output datasetB;
run;

/* delete rows from datasetA in place
   - deletes rows LOGICALLY only
   - Does NOT reduce table size
   - maintains table attributes like indexes
*/
proc sql;
  delete from datasetA as A
  where exists
    ( select * from datasetB as B
      where 
        a.name=b.name
        and a.birthdate=b.birthdate
        and a.tests=b.tests
    )
   ;
quit;

proc contents data=datasetA;
run;quit;

/* re-create the table with matching rows removed 
   - fully re-creates the table
   - reduces table size
   - will not maintain other table attributes like indexes
*/
data datasetA;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'datasetB');
      h1.defineKey('name','birthdate','tests');
      h1.defineDone();
    end;
  set datasetA;
  if h1.check()=0 then delete;
run;

proc contents data=datasetA;
run;quit;

 

Satish_Parida
Lapis Lazuli | Level 10

Here is how to do in hash.

 

data have1;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
CC 11Jan2010 X
DD 12Jan2010 X
;
run;

data have2;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
;
run;

data want;
set have1;
if _n_=1 then do;
	declare hash rc (dataset: 'work.have2');
	rc.DefineKey ( 'names','birthdates','tests' ) ;
	rc.DefineDone () ; 
end;
if rc.find() =0 then delete;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 835 views
  • 1 like
  • 5 in conversation