Hi,
I have a table A:
User_ID | Customer_ID | Email_add | Invoice_no | Invoice_account |
u1 | c1 | example1@gmail.com | v1 | 100 |
u2 | c2 | example2@gmail.com | v2 | 120 |
u2 | c2 | example2b@gmail.com | v3 | 130 |
u3 | c3 | example3@gmail.com | v4 | 150 |
u4 | c4 | example4@gmail.com | v5 | 165 |
u5 | c5 | example5@gmail.com | v6 | 180 |
And table B:
User_ID | Customer_ID | Email_add |
c1 | example1@gmail.com | |
example2@gmail.com | ||
u3 |
I wish to delete the whole entry when either User_ID, Customer_ID or Email_add appeared in table B.
So the result table C should look like this:
User_ID | Customer_ID | Email_add | Invoice_no | Invoice_account |
u4 | c4 | example4@gmail.com | v5 | 165 |
u5 | c5 | example5@gmail.com | v6 | 180 |
I've tried using where clause A.User_ID=B.User_ID or A.Customer_ID=B.Customer_ID or A.Email_add=B.Email_add to delete the rows (failed T.T.). This operation is very slow when the datasets for table A and Table B is large.
Please advise, thanks!
Try this
data a;
input User_ID $ Customer_ID $ Email_add :$20. Invoice_no $ Invoice_account;
datalines;
u1 c1 example1@gmail.com v1 100
u2 c2 example2@gmail.com v2 120
u2 c2 example2b@gmail.com v3 130
u3 c3 example3@gmail.com v4 150
u4 c4 example4@gmail.com v5 165
u5 c5 example5@gmail.com v6 180
;
data b;
input User_ID $ Customer_ID $ Email_add :$20.;
infile datalines missover dlm = '|';
datalines;
| c1 | example1@gmail.com
| | example2@gmail.com
u3 | |
;
proc sql;
create table want as
select * from a
where User_ID not in (select User_ID from b)
and Customer_ID not in (select Customer_ID from b)
and Email_add not in (select Email_add from b)
;
quit;
Try this
data a;
input User_ID $ Customer_ID $ Email_add :$20. Invoice_no $ Invoice_account;
datalines;
u1 c1 example1@gmail.com v1 100
u2 c2 example2@gmail.com v2 120
u2 c2 example2b@gmail.com v3 130
u3 c3 example3@gmail.com v4 150
u4 c4 example4@gmail.com v5 165
u5 c5 example5@gmail.com v6 180
;
data b;
input User_ID $ Customer_ID $ Email_add :$20.;
infile datalines missover dlm = '|';
datalines;
| c1 | example1@gmail.com
| | example2@gmail.com
u3 | |
;
proc sql;
create table want as
select * from a
where User_ID not in (select User_ID from b)
and Customer_ID not in (select Customer_ID from b)
and Email_add not in (select Email_add from b)
;
quit;
@PeterClemmensen , thank you!
Also, the 3rd obs should be in the desired result as well, correct?
You could use the EXIST clause.
/* create a new table */
proc sql;
create table want as
select * from a
where not exists
( select * from b
where
a.user_id=b.user_id or
a.customer_id=b.customer_id or
a.email_add=b.email_add
)
;
quit;
/* delete from existing table in-place */
proc sql;
delete from a
where not exists
( select * from b
where
a.user_id=b.user_id or
a.customer_id=b.customer_id or
a.email_add=b.email_add
)
;
quit;
If you're dealing with SAS tables then it's normally better to create a new table and not to directly delete rows in an existing table ....unless you have a big source table and need only to delete a minor percentage of rows. Reason: SAS deleted the rows only logically but no physically. The deleted rows still remain in the table and add to the tables volume (file size).
proc contents data=a;
quit;
If it's a table in a database then using DELETE is likely the "correct" approach. Having said that: Delete is also in a database a slow process so if you need to delete a high percentage of rows then creating a new table will likely execute faster.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.