hi,
I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?
proc sql;
create table home as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.home;
quit;
proc sql;
create table busi as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.busi;
quit;
proc sql;
create table alt as
select a.tlnbr, b.home, b.busi, b.alt, b.ACCT_ID
from acct_dtl as a
inner join cons3 as b on a.tlnbr = b.alt;
quit;
data new;
set home busi alt;
run;
You have three different joins. Can you post sample input data and sample output data?
I believe the basic syntax would be like this. Recursively referencing the target may create data integrity problems; hence, an intermediate table or view.
proc sql ;
create table temp as
select distinct t1.tlnbr
from acct_dtl t1,
cons3 t2
where t1.tlnbr = t2.home
or t1.tlnbr = t2.busi
or t1.tlnbr = t2.alt
;
quit;
proc sql;
delete from acct_dtl a
where a.tlnbr in ( select t.tlnbr from temp t)
;
DATA ACCT_DTL ;
INFILE DATALINES;
INPUT TLNBR;
DATALINES;
1
2
3
4
5
6
7
8
9
;
RUN;
DATA CONS3;
INFILE DATALINES DELIMITER="," TRUNCOVER;
INPUT HOME BUSI ALT;
DATALINES;
1,1,1
1,1,
3,3,
5, ,5
7,,
9,9,9
9,9,9
15,12,37
10,10,7
11,9,11
12,12,12
20,9,30
20,20,20
;
RUN;
/*CREATE MACRO VARIABLE NUMVARS CONTAINNIG ALL NUMERIC VARIABLES FOR THE SPECIFYING THE ARRAY*/
PROC SQL NOPRINT;
SELECT NAME INTO : NUMVARS SEPARATED BY " "
FROM DICTIONARY.COLUMNS
WHERE LIBNAME = "WORK" AND MEMNAME = "CONS3" AND UPCASE(TYPE) = "NUM";
RUN;
/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE AT LEAST ONE VALUE THAT DOESN'T APPEAR IN ACCT_DTL*/
DATA WANT1;
IF _N_ = 1 THEN DO;
DECLARE HASH H(DATASET:"ACCT_DTL");
H.DEFINEKEY("TLNBR");
H.DEFINEDATA("TLNBR");
H.DEFINEDONE();
END;
ARRAY VARS {*} &NUMVARS.;
DO UNTIL (EOF);
SET CONS3 END=EOF;
DO I = LBOUND(VARS) TO HBOUND(VARS);
TLNBR = VARS{I};
IF H.FIND() ~= 0 THEN DO;
RETURN;
END;
END;
END;
RC=H.CLEAR();
DROP I RC TLNBR;
RUN;
/*IF YOU WANT A LIST OF OBSERVATIONS THAT HAVE NO VALUES CONTAINED IN ACCT_DTL*/
DATA WANT2;
IF _N_ = 1 THEN DO;
DECLARE HASH H(DATASET:"ACCT_DTL");
H.DEFINEKEY("TLNBR");
H.DEFINEDATA("TLNBR");
H.DEFINEDONE();
END;
ARRAY VARS {*} &NUMVARS.;
DO UNTIL (EOF);
SET CONS3 END = EOF;
DO I = 1 TO DIM(VARS);
TLNBR = VARS{I};
IF H.FIND() = 0 THEN DO;
I = HBOUND(VARS);
END;
ELSE IF H.FIND() ~= 0 AND HBOUND(VARS) = I THEN DO;
OUTPUT;
END;
END;
END;
RC=H.CLEAR();
DROP I RC TLNBR;
RUN;
I'm not sure to fully understand your requirement between the statement below and your current code orientation:
I would like to join table A with Table B on three different columns and any matching records from Table B should be deleted in table A. In the below code I've written the records in data set NEW should be removed from Cons3. how can i do it?
Is your desired result a new table with all the records in A that did not exist in B?
Is your desired result a new table with the joined records AND an update to the table A to remove records that were in B?
Because conceptually if you join A and B and then remove all records that were in B, you don't need a join to achieve this result since all of the "outer" portion of the join will have no values in the columns provided by table B. Anyway, if your objective is simply to reduce table A by removing all records such that a given key (acct_dtl) were in neither of the 3 varibles (home, busi, alt) on table B, it can be achieved without a join as follow:
data updated_A;
length home $20. busi $20. alt $20.; /* You will need to adjust these lengths to be representative of the 3 variables in table B */
if _n_=1 then do;
declare hash homehash(dataset: 'cons3');
homehash.defineKey('home');
homehash.defineDone();
declare hash busihash(dataset: 'cons3');
homehash.defineKey('busi');
homehash.defineDone();
declare hash althash(dataset: 'cons3');
homehash.defineKey('alt');
homehash.defineDone();
end;
set acct_dtl;
if homehash.find(key: tlnbr) = 0 or busihash.find(key: tlnbr) = 0 or althash.find(key: tlnbr) = 0 then ; /* If any return code from find method is 0, then tlnbr was found in at least one of the lookup tables and we want it not to be output thus the empty then statement */
else output;
run;
If this is not your desired output and that scott's solution does not help either, you should consider posting sample data and your desired final output(s).
Vince
If your goal is to ultimately delete records from A where matching records exist in B (based on one of three matches)..
proc sql;
delete from acct_dtl t1
where exists (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=tlnbr);
quit;
This looks right, but, why does it take longer time to process? It takes 10 times more CPU time than the multiple proc sql inner join and delete.
Can you please shed some light on that?
If you are talking about DBailey's solution, it is because for each row of acct_dtl, the full subquery (select * from cons3 where home=t1.tlnbr or busi=t1.tlnbr or alt=t1.tlnbr) is recomputed so you are more or less doing a full query on table B for each row in table A.
If your desired output was what he provided, my hash table solution should achieve the desired results in significantly less time. It's less pretty because of the long/dense hash object syntax but its of great use for efficiency. The only issue is that since all hash objects are in memory, if table B has millions of records, you might face an insufficient memory error.
Vince
Thank you! I will definitely try your code too.
I really thank everyone for your valuable inputs.
Vince28 is right. Hashing is fastest..but..the larger the reference table (cons3 in your case), the larger the memory requirements are to hold the hash tables.
If you're deleting from sas datasets, I find it sometimes faster to create an entirely new dataset. This also eliminates the problem with the deleted records still taking up disk space. You might can speed up processing by indexing cons3:
proc sql;
create index home on cons3.home;
create index busi on cons3.busi;
create index alt on cons3.alt;
create table want as
select t1.*
from
acct_dtl t1
left outer join cons3 t2
on t1.tlnbr=t2.home
left outer join cons3 t3
on t1.tlnbr=t3.busi
left outer join cons3 t4
on t1.tlnbr=t4.alt
where
t2.home is null
and t3.busi is null
and t4.alt is null;
quit;
Thanks for the reply
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.