Hi Experts,
I have one large dataset A, and one small dataset B. Both dataset contain some same sample names, birthdates, tests. As long as both sets have same names / dates / tests, I would like to remove those from set A ( the bigger dataset). Please advice how. Thank you.
proc sql;
select name, birthdate, test
from a
except
select name, birthdate, test
from b;
should do it.
@ybz12003 wrote:
Hi Experts,
I have one large dataset A, and one small dataset B. Both dataset contain some same sample names, birthdates, tests. As long as both sets have same names / dates / tests, I would like to remove those from set A ( the bigger dataset). Please advice how. Thank you.
Getting just the variables you mention:
proc sql; create table want as select name, birthdate, tests from dataseta except select name, birthdate, tests from datasetb ; quit;
If there are other variables in set A that you need you need to joint that back to set A
Dataset A contains 50 columns, and dataset B contains 3 columns. My final result is dataset A contains 50 columns WITHOUT dataset B.
Here two ways for doing this.
If you don't create datasetA within your program then I wouldn't overwrite this table but would create a copy of it (another table) with the rows removed.
/* create sample data */
data datasetA datasetB(keep=name birthdate tests);
set sashelp.class;
birthdate=today();
format birthdate date9.;
tests=1;
output datasetA;
if mod(_n_,2)=0 then output datasetB;
run;
/* delete rows from datasetA in place
- deletes rows LOGICALLY only
- Does NOT reduce table size
- maintains table attributes like indexes
*/
proc sql;
delete from datasetA as A
where exists
( select * from datasetB as B
where
a.name=b.name
and a.birthdate=b.birthdate
and a.tests=b.tests
)
;
quit;
proc contents data=datasetA;
run;quit;
/* re-create the table with matching rows removed
- fully re-creates the table
- reduces table size
- will not maintain other table attributes like indexes
*/
data datasetA;
if _n_=1 then
do;
dcl hash h1(dataset:'datasetB');
h1.defineKey('name','birthdate','tests');
h1.defineDone();
end;
set datasetA;
if h1.check()=0 then delete;
run;
proc contents data=datasetA;
run;quit;
Here is how to do in hash.
data have1;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
CC 11Jan2010 X
DD 12Jan2010 X
;
run;
data have2;
input names $ birthdates:date9. tests $;
format birthdates date9.;
cards;
AA 09Jan2010 X
BB 10Jan2010 Y
;
run;
data want;
set have1;
if _n_=1 then do;
declare hash rc (dataset: 'work.have2');
rc.DefineKey ( 'names','birthdates','tests' ) ;
rc.DefineDone () ;
end;
if rc.find() =0 then delete;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.