Re: join on 3 columns

helloSAS · Posted 08-13-2013 06:04 PM

hi,

I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?

proc sql;
create table home as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.home;
quit;

proc sql;
create table busi as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.busi;
quit;

proc sql;
create table alt as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.alt;
quit;

data new;
set home busi alt;
run;

Reeza · Posted 08-13-2013 06:18 PM

You have three different joins. Can you post sample input data and sample output data?

Fugue · Posted 08-13-2013 06:22 PM

I believe the basic syntax would be like this. Recursively referencing the target may create data integrity problems; hence, an intermediate table or view.

proc sql ;

create table temp as

select distinct t1.tlnbr

from acct_dtl t1,

cons3 t2

where t1.tlnbr = t2.home

or t1.tlnbr = t2.busi

or t1.tlnbr = t2.alt

;

quit;

proc sql;

delete from acct_dtl a

where a.tlnbr in ( select t.tlnbr from temp t)

;

Scott_Mitchell · Posted 08-14-2013 07:32 AM

DATA ACCT_DTL ;

INFILE DATALINES;

INPUT TLNBR;

DATALINES;

1

2

3

4

5

6

7

8

9

;

RUN;

DATA CONS3;

INFILE DATALINES DELIMITER="," TRUNCOVER;

INPUT HOME BUSI ALT;

DATALINES;

1,1,1

1,1,

3,3,

5, ,5

7,,

9,9,9

15,12,37

10,10,7

11,9,11

12,12,12

20,9,30

20,20,20

;

RUN;

/*CREATE MACRO VARIABLE NUMVARS CONTAINNIG ALL NUMERIC VARIABLES FOR THE SPECIFYING THE ARRAY*/

PROC SQL NOPRINT;

SELECT NAME INTO : NUMVARS SEPARATED BY " "

FROM DICTIONARY.COLUMNS

WHERE LIBNAME = "WORK" AND MEMNAME = "CONS3" AND UPCASE(TYPE) = "NUM";

RUN;

/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE AT LEAST ONE VALUE THAT DOESN'T APPEAR IN ACCT_DTL*/

DATA WANT1;

IF _N_ = 1 THEN DO;

DECLARE HASH H(DATASET:"ACCT_DTL");

H.DEFINEKEY("TLNBR");

H.DEFINEDATA("TLNBR");

H.DEFINEDONE();

END;

ARRAY VARS {*} &NUMVARS.;

DO UNTIL (EOF);

SET CONS3 END=EOF;

DO I = LBOUND(VARS) TO HBOUND(VARS);

TLNBR = VARS{I};

IF H.FIND() ~= 0 THEN DO;

RETURN;

END;

RC=H.CLEAR();

DROP I RC TLNBR;

RUN;

/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE NO VALUES CONTAINED IN ACCT_DTL*/

DATA WANT2;

IF _N_ = 1 THEN DO;

DECLARE HASH H(DATASET:"ACCT_DTL");

H.DEFINEKEY("TLNBR");

H.DEFINEDATA("TLNBR");

H.DEFINEDONE();

END;

ARRAY VARS {*} &NUMVARS.;

DO UNTIL (EOF);

SET CONS3 END = EOF;

DO I = 1 TO DIM(VARS);

TLNBR = VARS{I};

IF H.FIND() = 0 THEN DO;

I = HBOUND(VARS);

END;

ELSE IF H.FIND() ~= 0 AND HBOUND(VARS) = I THEN DO;

OUTPUT;

END;

RC=H.CLEAR();

DROP I RC TLNBR;

RUN;

Vince28_Statcan · Posted 08-14-2013 08:25 AM

I'm not sure to fully understand your requirement between the statement below and your current code orientation:

I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?

Is your desired result a new table with all the records in A that did not exist in B?

Is your desired result a new table with the joined records AND an update to the table A to remove records that were in B?

Because conceptually if you join A and B and then remove all records that were in B, you don't need a join to achieve this result since all of the "outer" portion of the join will have no values in the columns provided by table B. Anyway, if your objective is simply to reduce table A by removing all records such that a given key (acct_dtl) were in neither of the 3 varibles (home, busi, alt) on table B, it can be achieved without a join as follow:

data updated_A;

length home $20. busi $20. alt $20.; /* You will need to adjust these lengths to be representative of the 3 variables in table B */

if _n_=1 then do;

declare hash homehash(dataset: 'cons3');

homehash.defineKey('home');

homehash.defineDone();

declare hash busihash(dataset: 'cons3');

homehash.defineKey('busi');

homehash.defineDone();

declare hash althash(dataset: 'cons3');

homehash.defineKey('alt');

homehash.defineDone();

end;

set acct_dtl;

if homehash.find(key: tlnbr) = 0 or busihash.find(key: tlnbr) = 0 or althash.find(key: tlnbr) = 0 then ; /* If any return code from find method is 0, then tlnbr was found in at least one of the lookup tables and we want it not to be output thus the empty then statement */

else output;

run;

If this is not your desired output and that scott's solution does not help either, you should consider posting sample data and your desired final output(s).

Vince

DBailey · Posted 08-14-2013 08:47 AM

If your goal is to ultimately delete records from A where matching records exist in B (based on one of three matches)..

proc sql;

delete from acct_dtl t1

where exists (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=tlnbr);

quit;

helloSAS · Posted 08-14-2013 09:26 AM

This looks right, but, why does it take longer time to process? It takes 10 times more CPU time than the multiple proc sql inner join and delete.

Can you please shed some light on that?

Vince28_Statcan · Posted 08-14-2013 09:34 AM

If you are talking about DBailey's solution, it is because for each row of acct_dtl, the full subquery (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=t1.tlnbr) is recomputed so you are more or less doing a full query on table B for each row in table A.

If your desired output was what he provided, my hash table solution should achieve the desired results in significantly less time. It's less pretty because of the long/dense hash object syntax but its of great use for efficiency. The only issue is that since all hash objects are in memory, if table B has millions of records, you might face an insufficient memory error.

Vince

helloSAS · Posted 08-14-2013 09:41 AM

Thank you! I will definitely try your code too.

I really thank everyone for your valuable inputs.

DBailey · Posted 08-14-2013 09:42 AM

Vince28 is right. Hashing is fastest..but..the larger the reference table (cons3 in your case), the larger the memory requirements are to hold the hash tables.

If you're deleting from sas datasets, I find it sometimes faster to create an entirely new dataset. This also eliminates the problem with the deleted records still taking up disk space. You might can speed up processing by indexing cons3:

proc sql;

create index home on cons3.home;

create index busi on cons3.busi;

create index alt on cons3.alt;

create table want as

select t1.*

from

acct_dtl t1

left outer join cons3 t2

on t1.tlnbr=t2.home

left outer join cons3 t3

on t1.tlnbr=t3.busi

left outer join cons3 t4

on t1.tlnbr=t4.alt

where

t2.home is null

and t3.busi is null

and t4.alt is null;

quit;

helloSAS · Posted 08-14-2013 09:24 AM

Thanks for the reply

SAS Training: Just a Click Away