I have a problem I'm trying to solve, and not really sure where to start.
I have two tables Table A and Table B. I'm trying to delete ALL of the customer IDs from Table A if they appear in Table B, but Table A sometimes has multiple entries for the same customer ID. My end result should be Table A with only records that are not in Table B.
Example:
Table A
Customer ID | Date |
A | 5/7/2020 |
A | 5/11/2020 |
B | 5/8/2020 |
B | 5/9/2020 |
B | 5/10/2020 |
C | 5/11/2020 |
Table B
Customer ID | Delete Flag |
B | Y |
B | Y |
A | Y |
In this example I would only want Customer ID C to show up in my final table.
I have a few thousand rows of data in Table A that I'm checking against. Any help is greatly appreciated!
This is relatively straightforward with a subquery like this:
(I have altered the name of the variable "Customer ID". It can work as a named literal like this: 'Customer ID'n
proc sql;
create table want as
select * from A
where customer_ID not in (select customer_ID from B);
quit;
This is relatively straightforward with a subquery like this:
(I have altered the name of the variable "Customer ID". It can work as a named literal like this: 'Customer ID'n
proc sql;
create table want as
select * from A
where customer_ID not in (select customer_ID from B);
quit;
Thank you! Super easy way to do this, not sure how I didn't think of this.
If data size and performance is an issue, use a hash object:
data want;
set a;
if _n_ 01
then do;
declare hash b (dataset:"b");
b.definekey("customer_id");
b.definedone();
end;
if b.check() ne 0;
run;
proc sort data=tableB out=b_unique nodupkey;
by customerID;
run;
proc sort data=tableA;
by customerID;
data tableC;
merge tableA(in=ina) b_unique(in=inb drop=deleteflag);
by customerID;
if not inb;
run;
A SQL way that will delete records from the table - test thoroughly before using something like this:
data tableA;
infile cards dlm='09'x;
input CustomerID $ Date : ddmmyy10.;
format date date9.;
cards;
A 5/7/2020
A 5/11/2020
B 5/8/2020
B 5/9/2020
B 5/10/2020
C 5/11/2020
;;;;
run;
data tableB;
infile cards dlm='09'x;
input CustomerID $ DeleteFlag $;
cards;
B Y
B Y
A Y
;;;;
run;
proc sql;
delete * from tableA where customerID in (select distinct(CustomerID) from TableB);
quit;
Thank you!
This step below worked as well.
proc sort data=tableB out=b_unique nodupkey;
by customerID;
run;
proc sort data=tableA;
by customerID;
data tableC;
merge tableA(in=ina) b_unique(in=inb drop=deleteflag);
by customerID;
if not inb;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.