Hi Experts,
I have one large dataset A, and one small dataset B. Both dataset contain some same sample names, birthdates, tests. As long as both sets have same names / dates / tests, I would like to remove those from set A ( the bigger dataset). Please advice how. Thank you.
proc sql;
select name, birthdate, test
from a
except
select name, birthdate, test
from b;
should do it.
@ybz12003 wrote:
Hi Experts,
I have one large dataset A, and one small dataset B. Both dataset contain some same sample names, birthdates, tests. As long as both sets have same names / dates / tests, I would like to remove those from set A ( the bigger dataset). Please advice how. Thank you.
Getting just the variables you mention:
proc sql; create table want as select name, birthdate, tests from dataseta except select name, birthdate, tests from datasetb ; quit;
If there are other variables in set A that you need you need to joint that back to set A
Dataset A contains 50 columns, and dataset B contains 3 columns. My final result is dataset A contains 50 columns WITHOUT dataset B.
Here two ways for doing this.
If you don't create datasetA within your program then I wouldn't overwrite this table but would create a copy of it (another table) with the rows removed.
/* create sample data */
data datasetA datasetB(keep=name birthdate tests);
set sashelp.class;
birthdate=today();
format birthdate date9.;
tests=1;
output datasetA;
if mod(_n_,2)=0 then output datasetB;
run;
/* delete rows from datasetA in place
- deletes rows LOGICALLY only
- Does NOT reduce table size
- maintains table attributes like indexes
*/
proc sql;
delete from datasetA as A
where exists
( select * from datasetB as B
where
a.name=b.name
and a.birthdate=b.birthdate
and a.tests=b.tests
)
;
quit;
proc contents data=datasetA;
run;quit;
/* re-create the table with matching rows removed
- fully re-creates the table
- reduces table size
- will not maintain other table attributes like indexes
*/
data datasetA;
if _n_=1 then
do;
dcl hash h1(dataset:'datasetB');
h1.defineKey('name','birthdate','tests');
h1.defineDone();
end;
set datasetA;
if h1.check()=0 then delete;
run;
proc contents data=datasetA;
run;quit;
Here is how to do in hash.
data have1;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
CC 11Jan2010 X
DD 12Jan2010 X
;
run;
data have2;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
;
run;
data want;
set have1;
if _n_=1 then do;
declare hash rc (dataset: 'work.have2');
rc.DefineKey ( 'names','birthdates','tests' ) ;
rc.DefineDone () ;
end;
if rc.find() =0 then delete;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.