I never really trusted DBKEY as it seldom delivered on its promise. Don't forget it generates one query per lookup value.
Here is a (not so quick it turns out) benchmark.
Done on our Teradata server; As usual, only valid for our data on our platform etc....
4 methods used,
- DBKEY is the worst (2 hours!!, and there is an index),
- macro list or MULTI_DATASRC_OPT are the fastest,
- full download in between
I am *not* allowed to upload, so can't test this.
data LOOKUP; do I=1 to 2e3; ACCOUNT_ID=put(I,z10.); output; end ; run;
%****** Macro list ;
proc sql noprint;
select unique quote(trim(ACCOUNT_ID)) into :values separated by ',' from LOOKUP;
create table T1 as
select ACCOUNT_ID
from TERALIB.TAB b
where ACCOUNT_ID in(&values.)
and b.MONTH_KEY = '201712';
quit;
/*
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE ( ( ("ACCOUNT_ID" IN ( '0000000001' , '0000000002' ,
... 2000 values .....
'0000001995' , '0000001996' , '0000001997' , '0000001998' , '0000001999' , '0000002000' ) ) ) AND ("MONTH_KEY" = '201712' ) )
NOTE: PROCEDURE SQL used (Total process time):
real time 24.00 seconds
*/
%****** Full download;
proc sql;
create table T2 as select b.ACCOUNT_ID
from LOOKUP a, TERALIB.TAB b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712';
quit;
/*
SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE ("MONTH_KEY" = '201712' )
NOTE: PROCEDURE SQL used (Total process time):
real time 2:57.98
*/
%****** DBKEY= option;
proc sql;
create table T3 as select b.ACCOUNT_ID
from LOOKUP a, TERALIB.TAB(dbkey=ACCOUNT_ID) b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712'; *2s;
quit;
/*
2000 queries: USING ("ACCOUNT_ID" VARCHAR (22))SELECT "ACCOUNT_ID","MONTH_KEY" FROM TERA."TAB" WHERE "ACCOUNT_ID"=:"ACCOUNT_ID"
NOTE: PROCEDURE SQL used (Total process time):
real time 02:03:47.07
*/
%****** MULTI_DATASRC_OPT= option;
proc sql;
create view _V as
select b.ACCOUNT_ID
from LOOKUP a, DBLIB.TAB b
where a.ACCOUNT_ID = b.ACCOUNT_ID
and b.MONTH_KEY = '201712'
using libname DBLIB teradata user=xxx .... multi_datasrc_opt=in_clause;
create table T4 as select * from _V;
quit;
/*
NOTE: PROCEDURE SQL used (Total process time):
real time 19.67 seconds
*/
Yes there is a primary index on ACCOUNT_ID:
proc sql;
connect using TERALIB;
select * from connection to TERALIB (
select count(*)
from dbc.indices
where tablename = 'TAB'
and columnname in('MONTH_KEY', 'ACCOUNT_ID')
and IndexType in('P', 'Q')
);
returns 1.
It's also worth noting that the full download is automatically optimised by SAS to a query containing an in() clause if the LOOKUP table is smaller.
... View more