BookmarkSubscribeRSS Feed
ybz12003
Rhodochrosite | Level 12

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.

5 REPLIES 5
mbuchecker
Quartz | Level 8

proc sql; 

select name, birthdate, test

  from a

except

select name, birthdate, test

  from b;

 

should do it.

Michelle
ballardw
Super User

@ybz12003 wrote:

Hi Experts,

 

I have one large dataset A, and one small dataset B.   Both dataset contain some same sample names, birthdates, tests.    As long as both sets have same names / dates / tests,   I would like to remove those from set A ( the bigger dataset).   Please advice how.   Thank you.


Getting just the variables you mention:

 

proc sql;
   create table want as
   select name, birthdate, tests from dataseta
   except
   select name, birthdate, tests from datasetb
   ;
quit;

If there are other variables in set A that you need you need to joint that back to set A

ybz12003
Rhodochrosite | Level 12

Dataset A contains 50 columns, and dataset B contains 3 columns.  My final result is dataset A contains 50 columns WITHOUT dataset B.

Patrick
Opal | Level 21

Here two ways for doing this.

If you don't create datasetA within your program then I wouldn't overwrite this table but would create a copy of it (another table) with the rows removed.

/* create sample data */
data datasetA datasetB(keep=name birthdate tests);
  set sashelp.class;
  birthdate=today(); 
  format birthdate date9.;
  tests=1;
  output datasetA;
  if mod(_n_,2)=0 then output datasetB;
run;

/* delete rows from datasetA in place
   - deletes rows LOGICALLY only
   - Does NOT reduce table size
   - maintains table attributes like indexes
*/
proc sql;
  delete from datasetA as A
  where exists
    ( select * from datasetB as B
      where 
        a.name=b.name
        and a.birthdate=b.birthdate
        and a.tests=b.tests
    )
   ;
quit;

proc contents data=datasetA;
run;quit;

/* re-create the table with matching rows removed 
   - fully re-creates the table
   - reduces table size
   - will not maintain other table attributes like indexes
*/
data datasetA;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'datasetB');
      h1.defineKey('name','birthdate','tests');
      h1.defineDone();
    end;
  set datasetA;
  if h1.check()=0 then delete;
run;

proc contents data=datasetA;
run;quit;

 

Satish_Parida
Lapis Lazuli | Level 10

Here is how to do in hash.

 

data have1;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
CC 11Jan2010 X
DD 12Jan2010 X
;
run;

data have2;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
;
run;

data want;
set have1;
if _n_=1 then do;
	declare hash rc (dataset: 'work.have2');
	rc.DefineKey ( 'names','birthdates','tests' ) ;
	rc.DefineDone () ; 
end;
if rc.find() =0 then delete;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 553 views
  • 1 like
  • 5 in conversation