BookmarkSubscribeRSS Feed
helloSAS
Obsidian | Level 7

hi,

I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?

proc sql;
create table home as
select a.tlnbr,  b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.home;
quit;

proc sql;
create table busi as
select a.tlnbr,  b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.busi;
quit;

proc sql;
create table alt as
select a.tlnbr,  b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.alt;
quit;

data new;
set home busi alt;
run;

10 REPLIES 10
Reeza
Super User

You have three different joins. Can you post sample input data and sample output data?

Fugue
Quartz | Level 8

I believe the basic syntax would be like this. Recursively referencing the target may create data integrity problems; hence, an intermediate table or view.

proc sql ;

     create table temp as

     select distinct t1.tlnbr 

          from acct_dtl t1,

               cons3 t2

          where t1.tlnbr = t2.home

               or t1.tlnbr = t2.busi

               or t1.tlnbr = t2.alt

;

quit;

proc sql;

  delete from acct_dtl a

  where a.tlnbr in ( select t.tlnbr from temp t)

;

Scott_Mitchell
Quartz | Level 8

DATA ACCT_DTL ;

INFILE DATALINES;

INPUT TLNBR;

DATALINES;

1

2

3

4

5

6

7

8

9

;

RUN;

DATA CONS3;

INFILE DATALINES DELIMITER="," TRUNCOVER;

INPUT HOME BUSI ALT;

DATALINES;

1,1,1

1,1,

3,3,

5, ,5

7,,

9,9,9

9,9,9

15,12,37

10,10,7

11,9,11

12,12,12

20,9,30

20,20,20

;

RUN;

/*CREATE MACRO VARIABLE NUMVARS CONTAINNIG ALL NUMERIC VARIABLES FOR THE SPECIFYING THE ARRAY*/

PROC SQL NOPRINT;

  SELECT NAME INTO : NUMVARS SEPARATED BY " "

  FROM DICTIONARY.COLUMNS

  WHERE LIBNAME = "WORK" AND MEMNAME = "CONS3" AND UPCASE(TYPE) = "NUM";

RUN;

/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE AT LEAST ONE VALUE THAT DOESN'T APPEAR IN ACCT_DTL*/

DATA WANT1;

  IF _N_ = 1 THEN DO;

  DECLARE HASH H(DATASET:"ACCT_DTL");

  H.DEFINEKEY("TLNBR");

  H.DEFINEDATA("TLNBR");

  H.DEFINEDONE();

  END;

  ARRAY VARS {*} &NUMVARS.;

  DO UNTIL (EOF);

  SET CONS3 END=EOF;

  DO I = LBOUND(VARS) TO HBOUND(VARS);

  TLNBR = VARS{I};

  IF H.FIND() ~= 0 THEN DO;

  RETURN;

  END;

  END;

  END;

  RC=H.CLEAR();

  DROP I RC TLNBR;

RUN;

/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE NO VALUES CONTAINED IN  ACCT_DTL*/

DATA WANT2;

  IF _N_ = 1 THEN DO;

  DECLARE HASH H(DATASET:"ACCT_DTL");

  H.DEFINEKEY("TLNBR");

  H.DEFINEDATA("TLNBR");

  H.DEFINEDONE();

  END;

  ARRAY VARS {*} &NUMVARS.;

  DO UNTIL (EOF);

  SET CONS3 END = EOF;

  DO I = 1 TO DIM(VARS);

  TLNBR = VARS{I};

  IF H.FIND() = 0 THEN DO;

  I = HBOUND(VARS);

  END;

  ELSE IF H.FIND() ~= 0  AND HBOUND(VARS) = I THEN DO;

  OUTPUT;

  END;

  END;

  END;

  RC=H.CLEAR();

  DROP I RC TLNBR;

RUN;

Vince28_Statcan
Quartz | Level 8

I'm not sure to fully understand your requirement between the statement below and your current code orientation:

I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?

Is your desired result a new table with all the records in A that did not exist in B?

Is your desired result a new table with the joined records AND an update to the table A to remove records that were in B?

Because conceptually if you join A and B and then remove all records that were in B, you don't need a join to achieve this result since all of the "outer" portion of the join will have no values in the columns provided by table B. Anyway, if your objective is simply to reduce table A by removing all records such that a given key (acct_dtl) were in neither of the 3 varibles (home, busi, alt) on table B, it can be achieved without a join as follow:

data updated_A;

     length home $20. busi $20. alt $20.; /* You will need to adjust these lengths to be representative of the 3 variables in table B */

     if _n_=1 then do;

     declare hash homehash(dataset: 'cons3');

     homehash.defineKey('home');

     homehash.defineDone();

     declare hash busihash(dataset: 'cons3');

     homehash.defineKey('busi');

     homehash.defineDone();

     declare hash althash(dataset: 'cons3');

     homehash.defineKey('alt');

     homehash.defineDone();

     end;

     set acct_dtl;

     if homehash.find(key: tlnbr) = 0 or busihash.find(key: tlnbr) = 0 or althash.find(key: tlnbr) = 0 then ; /* If any return code from find method is 0, then tlnbr was found in at least one of the lookup tables and                                                                       we want it not to be output thus the empty then statement */

     else output;

run;

If this is not your desired output and that scott's solution does not help either, you should consider posting sample data and your desired final output(s).

Vince

DBailey
Lapis Lazuli | Level 10

If your goal is to ultimately delete records from A where matching records exist in B (based on one of three matches)..

proc sql;

delete from acct_dtl t1

where exists (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=tlnbr);

quit;

helloSAS
Obsidian | Level 7

This looks right, but, why does it take longer time to process? It takes 10 times more CPU time than the multiple proc sql inner join and delete.

Can you please shed some light on that?

Vince28_Statcan
Quartz | Level 8

If you are talking about DBailey's solution, it is because for each row of acct_dtl, the full subquery (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=t1.tlnbr) is recomputed so you are more or less doing a full query on table B for each row in table A.

If your desired output was what he provided, my hash table solution should achieve the desired results in significantly less time. It's less pretty because of the long/dense hash object syntax but its of great use for efficiency. The only issue is that since all hash objects are in memory, if table B has millions of records, you might face an insufficient memory error.

Vince

helloSAS
Obsidian | Level 7


Thank you! I will definitely try your code too.

I really thank everyone for your valuable inputs.

DBailey
Lapis Lazuli | Level 10

Vince28 is right.  Hashing is fastest..but..the larger the reference table (cons3 in your case), the larger the memory requirements are to hold the hash tables.

If you're deleting from sas datasets, I find it sometimes faster to create an entirely new dataset.  This also eliminates the problem with the deleted records still taking up disk space.  You might can speed up processing by indexing cons3:

proc sql;

create index home on cons3.home;

create index busi on cons3.busi;

create index alt on cons3.alt;

create table want as

select t1.*

from

     acct_dtl t1

     left outer join cons3 t2

          on t1.tlnbr=t2.home

    left outer join cons3 t3

          on t1.tlnbr=t3.busi

     left outer join cons3 t4

          on t1.tlnbr=t4.alt

where

     t2.home is null

     and t3.busi is null

     and t4.alt is null;

quit;

helloSAS
Obsidian | Level 7

Thanks for the reply

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 6669 views
  • 1 like
  • 6 in conversation