Hi,
My below query is taking around 40 minutes to run for 1million records in All_account. my subacc1 has 80 records, subacc2 has 47 records, subacc3 has 68 records and subacc4 has 75 records. I want to reduce the processing time. Any help would be much appreciated. Thanks for your help.
proc sql;
create table account_new as
select a.*,
(case when exists (select 1 from subacc1 b where a.AccountNo = b.account_no)
then 1 else 0
end) as subacc1,
(case when exists (select 1 from subacc2 c where a.AccountNo = c.account_no)
then 1 else 0
end) as subacc2,
(case when exists (select 1 from subacc3 d where a.AccountNo = d.account_no)
then 1 else 0
end) as subacc3,
(case when exists (select 1 from subacc4 e where a.AccountNo = e.account_no)
then 1 else 0
end) as subacc4
from All_account a;
quit;
this is good case for using hash technique. I made data using Sashelp.class so that you can try out. here we are making flags based on name present in particular dataset
data class_female;
set sashelp.class(where=(sex='F'));
run;
data class_male;
set sashelp.class(where=(sex='M'));
run;
data want;
if _n_=1 then do;
if _n_ = 0 then set sashelp.class;
dcl hash h1(dataset: 'class_female');
h1.definekey("Name");
h1.definedone();
dcl hash h2(dataset: 'class_male');
h2.definekey("Name");
h2.definedone();
end;
set sashelp.class;
call missing(female_flag);
if h1.find() = 0 then female_flag =1;
else female_flag = 0 ;
if h2.find() = 0 then male_flag =1;
else male_flag = 0 ;
run;
this is good case for using hash technique. I made data using Sashelp.class so that you can try out. here we are making flags based on name present in particular dataset
data class_female;
set sashelp.class(where=(sex='F'));
run;
data class_male;
set sashelp.class(where=(sex='M'));
run;
data want;
if _n_=1 then do;
if _n_ = 0 then set sashelp.class;
dcl hash h1(dataset: 'class_female');
h1.definekey("Name");
h1.definedone();
dcl hash h2(dataset: 'class_male');
h2.definekey("Name");
h2.definedone();
end;
set sashelp.class;
call missing(female_flag);
if h1.find() = 0 then female_flag =1;
else female_flag = 0 ;
if h2.find() = 0 then male_flag =1;
else male_flag = 0 ;
run;
Thanks kiranv. I think I can't use the hash table in one dataset like your code. Because I'm joining with four different datasets.
I have used the hash table logic but calling the macro 4 times as below. Please let me know if there is any better solution.
Thanks in advance!
%macro hash_macro (hash_table, in_table, out_table, keyvar, renamevar);
data &out_table. (drop=&renamevar.);
set &in_table.;
if _n_ = 0 then set &hash_table.;
if _n_ = 1 then do;
declare hash ht (dataset: "&hash_table. (rename = (&renamevar. = &keyvar. ))",
ordered: "A");
ht.definekey ("&keyvar.");
ht.definedata ("&keyvar.");
ht.definedone ( );
end;
if ht.find() = 0 then do; &hash_table. = 1; output; end;
else do; &hash_table. = 0; output; end;
run;
%mend;
/* call hash_macro for account table*/
%hash_macro(subacc1, all_account, account1, AccountNo, account_no);
%hash_macro(subacc2, account1, account2, AccountNo, account_no);
%hash_macro(subacc3, account2, account3, AccountNo, account_no);
%hash_macro(subacc4, account3, account_new, AccountNo, account_no);
if you see, I am using 2 datasets(lookup/small datasets) and you can use 4. you are creating four tables in your macro code, which is unlike your first table/intial code.
Thanks for correcting me. The below code is working fine.
data account_new;
if _n_=1 then do;
if _n_ = 0 then set All_account;
dcl hash h1(dataset: "subacc1 (rename = (account_no = AccountNo )");
h1.definekey("AccountNo");
h1.definedone();
dcl hash h2(dataset: "subacc2 (rename = (account_no = AccountNo )");
h2.definekey("AccountNo");
h2.definedone();
dcl hash h3(dataset: "subacc3 (rename = (account_no = AccountNo )");
h3.definekey("AccountNo");
h3.definedone();
dcl hash h4(dataset: "subacc4 (rename = (account_no = AccountNo )");
h4.definekey("AccountNo");
h4.definedone();
end;
set All_account;
if h1.find() = 0 then subacc1 =1;
else subacc1 = 0 ;
if h2.find() = 0 then subacc2 =1;
else subacc2 = 0 ;
if h3.find() = 0 then subacc3 =1;
else subacc3 = 0 ;
if h4.find() = 0 then subacc4 =1;
else subacc4 = 0 ;
run;
Just curious to know, how much time the hash solution took
My old query took 47 minutes and hash table took less than a minute (3.16 seconds).
I don't think you need 2 hash tables nor is hash the best solution?I'm not certain but can't confirm until I pplay around
Would it be possible to provide some representative samples and the required output for us to work ?
Let me admit, I'm too lazy to assume, simulate to be totally honest.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.