Solved: Re: Delete an entry if one of the column values appeared in other tabl...

sarahzhou · Posted 10-12-2022 02:58 AM

Hi,

I have a table A:

User_ID	Customer_ID	Email_add	Invoice_no	Invoice_account
u1	c1	example1@gmail.com	v1	100
u2	c2	example2@gmail.com	v2	120
u2	c2	example2b@gmail.com	v3	130
u3	c3	example3@gmail.com	v4	150
u4	c4	example4@gmail.com	v5	165
u5	c5	example5@gmail.com	v6	180

And table B:

User_ID	Customer_ID	Email_add
	c1	example1@gmail.com
		example2@gmail.com
u3

I want to delete the whole row from table A when any of User_ID, Customer_ID or Email_add appears in table B.

So the result table C should look like this:

User_ID	Customer_ID	Email_add	Invoice_no	Invoice_account
u4	c4	example4@gmail.com	v5	165
u5	c5	example5@gmail.com	v6	180

I've tried a WHERE clause with A.User_ID=B.User_ID or A.Customer_ID=B.Customer_ID or A.Email_add=B.Email_add to delete the rows, but it failed (T.T). The operation is also very slow when tables A and B are large.

Please advise, thanks!

PeterClemmensen · Posted 10-12-2022 03:15 AM

Try this

data a;
input User_ID $ Customer_ID $ Email_add :$20. Invoice_no $ Invoice_account;
datalines;
u1 c1 example1@gmail.com  v1 100 
u2 c2 example2@gmail.com  v2 120 
u2 c2 example2b@gmail.com v3 130 
u3 c3 example3@gmail.com  v4 150 
u4 c4 example4@gmail.com  v5 165 
u5 c5 example5@gmail.com  v6 180 
;
 
data b;
infile datalines missover dlm = '|';   /* set the delimiter before reading */
input User_ID $ Customer_ID $ Email_add :$20.;
datalines;
   | c1 | example1@gmail.com 
   |    | example2@gmail.com 
u3 |    |                    
;   


proc sql;
   create table want as
   select * from a
   where User_ID     not in (select User_ID     from b)
     and Customer_ID not in (select Customer_ID from b)
     and Email_add   not in (select Email_add   from b)
   ;
quit;
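
One thing to watch with NOT IN: SAS treats a blank as an ordinary value, so a blank compares equal to another blank. Table B has blanks in every column, which means a row in A with a blank User_ID, Customer_ID or Email_add would also be dropped. A minimal sketch that ignores the blank keys in B, assuming that is the intended behavior (the output name want_nonblank is made up for illustration):

```sas
proc sql;
   create table want_nonblank as
   select * from a
   /* skip blank keys in B so they cannot match blank keys in A */
   where User_ID     not in (select User_ID     from b where User_ID     is not missing)
     and Customer_ID not in (select Customer_ID from b where Customer_ID is not missing)
     and Email_add   not in (select Email_add   from b where Email_add   is not missing)
   ;
quit;
```

For the sample data above, table A has no blank keys, so this should return the same rows as want.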

The DATA to DATA Step Macro
Blog: SASnrd

sarahzhou · Posted 10-12-2022 10:54 PM

@PeterClemmensen , thank you!

PeterClemmensen · Posted 10-12-2022 03:19 AM

Also, the 3rd observation (u2, c2, example2b@gmail.com) should be in the desired result as well, correct? None of its values appear in table B.

Patrick · Posted 10-12-2022 04:02 AM

You could use the EXISTS condition.

/* create a new table */
proc sql;
  create table want as
    select * from a
    where not exists
      ( select * from b 
        where 
          a.user_id=b.user_id or
          a.customer_id=b.customer_id or
          a.email_add=b.email_add
      )
    ;
quit;

/* delete from existing table in-place */
proc sql;
  delete from a 
    where exists
      ( select * from b 
        where 
          a.user_id=b.user_id or
          a.customer_id=b.customer_id or
          a.email_add=b.email_add
      )
    ;
quit;

If you're dealing with SAS tables, it's normally better to create a new table than to delete rows directly in an existing table, unless you have a big source table and only need to delete a small percentage of rows. Reason: SAS deletes the rows only logically, not physically. The deleted rows remain in the table and still count toward the table's file size.

/* the "Deleted Observations" attribute shows the logically deleted rows */
proc contents data=a;
run;
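
To see the logical deletion as numbers, the row counts can also be read from DICTIONARY.TABLES. This is a small sketch, assuming table A lives in the WORK library; rewriting the table afterwards is one way to drop the deleted rows physically.

```sas
/* physical (nobs) vs. logical (nlobs) row counts, plus deleted rows (delobs) */
proc sql;
   select memname, nobs, nlobs, delobs
   from dictionary.tables
   where libname = 'WORK' and memname = 'A';
quit;

/* rewriting the table keeps only the rows that are not marked as deleted */
data a;
   set a;
run;
```

After the rewrite, delobs should be back to 0 and the file size reflects only the remaining rows.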

If it's a table in a database, then using DELETE is likely the "correct" approach. Having said that, DELETE is a slow process in a database as well, so if you need to delete a high percentage of rows, creating a new table will likely execute faster.
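
Since the original question mentions that the OR comparison is slow on large tables, a DATA step with hash lookups is another option for SAS tables. The following is only a sketch, assuming the distinct key values of B fit in memory and that the key variables have the same type and length in A and B; the names want_hash, h_user, h_cust and h_mail are made up for illustration.

```sas
data want_hash;
   if _n_ = 1 then do;
      /* one lookup table per key column, loaded from B without blank keys */
      declare hash h_user(dataset: "b(where=(not missing(User_ID)))");
      h_user.defineKey("User_ID");
      h_user.defineDone();

      declare hash h_cust(dataset: "b(where=(not missing(Customer_ID)))");
      h_cust.defineKey("Customer_ID");
      h_cust.defineDone();

      declare hash h_mail(dataset: "b(where=(not missing(Email_add)))");
      h_mail.defineKey("Email_add");
      h_mail.defineDone();
   end;

   set a;

   /* check() returns 0 when the current key value is found in the hash */
   if h_user.check() = 0
      or h_cust.check() = 0
      or h_mail.check() = 0 then delete;
run;
```

This reads A once and does three in-memory lookups per row instead of comparing every row of A against every row of B. Blank keys in B are skipped on load, so they never match.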
