Hi,
I have a table A:
User_ID | Customer_ID | Email_add | Invoice_no | Invoice_account |
u1 | c1 | example1@gmail.com | v1 | 100 |
u2 | c2 | example2@gmail.com | v2 | 120 |
u2 | c2 | example2b@gmail.com | v3 | 130 |
u3 | c3 | example3@gmail.com | v4 | 150 |
u4 | c4 | example4@gmail.com | v5 | 165 |
u5 | c5 | example5@gmail.com | v6 | 180 |
And table B:
User_ID | Customer_ID | Email_add |
c1 | example1@gmail.com | |
example2@gmail.com | ||
u3 |
I wish to delete the whole entry when either User_ID, Customer_ID or Email_add appeared in table B.
So the result table C should look like this:
User_ID | Customer_ID | Email_add | Invoice_no | Invoice_account |
u4 | c4 | example4@gmail.com | v5 | 165 |
u5 | c5 | example5@gmail.com | v6 | 180 |
I've tried using where clause A.User_ID=B.User_ID or A.Customer_ID=B.Customer_ID or A.Email_add=B.Email_add to delete the rows (failed T.T.). This operation is very slow when the datasets for table A and Table B is large.
Please advise, thanks!
Try this
data a;
input User_ID $ Customer_ID $ Email_add :$20. Invoice_no $ Invoice_account;
datalines;
u1 c1 example1@gmail.com v1 100
u2 c2 example2@gmail.com v2 120
u2 c2 example2b@gmail.com v3 130
u3 c3 example3@gmail.com v4 150
u4 c4 example4@gmail.com v5 165
u5 c5 example5@gmail.com v6 180
;
data b;
input User_ID $ Customer_ID $ Email_add :$20.;
infile datalines missover dlm = '|';
datalines;
| c1 | example1@gmail.com
| | example2@gmail.com
u3 | |
;
proc sql;
create table want as
select * from a
where User_ID not in (select User_ID from b)
and Customer_ID not in (select Customer_ID from b)
and Email_add not in (select Email_add from b)
;
quit;
Try this
data a;
input User_ID $ Customer_ID $ Email_add :$20. Invoice_no $ Invoice_account;
datalines;
u1 c1 example1@gmail.com v1 100
u2 c2 example2@gmail.com v2 120
u2 c2 example2b@gmail.com v3 130
u3 c3 example3@gmail.com v4 150
u4 c4 example4@gmail.com v5 165
u5 c5 example5@gmail.com v6 180
;
data b;
input User_ID $ Customer_ID $ Email_add :$20.;
infile datalines missover dlm = '|';
datalines;
| c1 | example1@gmail.com
| | example2@gmail.com
u3 | |
;
proc sql;
create table want as
select * from a
where User_ID not in (select User_ID from b)
and Customer_ID not in (select Customer_ID from b)
and Email_add not in (select Email_add from b)
;
quit;
@PeterClemmensen , thank you!
Also, the 3rd obs should be in the desired result as well, correct?
You could use the EXIST clause.
/* create a new table */
proc sql;
create table want as
select * from a
where not exists
( select * from b
where
a.user_id=b.user_id or
a.customer_id=b.customer_id or
a.email_add=b.email_add
)
;
quit;
/* delete from existing table in-place */
proc sql;
delete from a
where not exists
( select * from b
where
a.user_id=b.user_id or
a.customer_id=b.customer_id or
a.email_add=b.email_add
)
;
quit;
If you're dealing with SAS tables then it's normally better to create a new table and not to directly delete rows in an existing table ....unless you have a big source table and need only to delete a minor percentage of rows. Reason: SAS deleted the rows only logically but no physically. The deleted rows still remain in the table and add to the tables volume (file size).
proc contents data=a;
quit;
If it's a table in a database then using DELETE is likely the "correct" approach. Having said that: Delete is also in a database a slow process so if you need to delete a high percentage of rows then creating a new table will likely execute faster.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.