Hi,
My below query is taking around 40 minutes to run for 1million records in All_account. my subacc1 has 80 records, subacc2 has 47 records, subacc3 has 68 records and subacc4 has 75 records. I want to reduce the processing time. Any help would be much appreciated. Thanks for your help.
proc sql;
create table account_new as
select a.*,
(case when exists (select 1 from subacc1 b where a.AccountNo = b.account_no)
then 1 else 0
end) as subacc1,
(case when exists (select 1 from subacc2 c where a.AccountNo = c.account_no)
then 1 else 0
end) as subacc2,
(case when exists (select 1 from subacc3 d where a.AccountNo = d.account_no)
then 1 else 0
end) as subacc3,
(case when exists (select 1 from subacc4 e where a.AccountNo = e.account_no)
then 1 else 0
end) as subacc4
from All_account a;
quit;
this is good case for using hash technique. I made data using Sashelp.class so that you can try out. here we are making flags based on name present in particular dataset
data class_female;
set sashelp.class(where=(sex='F'));
run;
data class_male;
set sashelp.class(where=(sex='M'));
run;
data want;
if _n_=1 then do;
if _n_ = 0 then set sashelp.class;
dcl hash h1(dataset: 'class_female');
h1.definekey("Name");
h1.definedone();
dcl hash h2(dataset: 'class_male');
h2.definekey("Name");
h2.definedone();
end;
set sashelp.class;
call missing(female_flag);
if h1.find() = 0 then female_flag =1;
else female_flag = 0 ;
if h2.find() = 0 then male_flag =1;
else male_flag = 0 ;
run;
this is good case for using hash technique. I made data using Sashelp.class so that you can try out. here we are making flags based on name present in particular dataset
data class_female;
set sashelp.class(where=(sex='F'));
run;
data class_male;
set sashelp.class(where=(sex='M'));
run;
data want;
if _n_=1 then do;
if _n_ = 0 then set sashelp.class;
dcl hash h1(dataset: 'class_female');
h1.definekey("Name");
h1.definedone();
dcl hash h2(dataset: 'class_male');
h2.definekey("Name");
h2.definedone();
end;
set sashelp.class;
call missing(female_flag);
if h1.find() = 0 then female_flag =1;
else female_flag = 0 ;
if h2.find() = 0 then male_flag =1;
else male_flag = 0 ;
run;
Thanks kiranv. I think I can't use the hash table in one dataset like your code. Because I'm joining with four different datasets.
I have used the hash table logic but calling the macro 4 times as below. Please let me know if there is any better solution.
Thanks in advance!
%macro hash_macro (hash_table, in_table, out_table, keyvar, renamevar);
data &out_table. (drop=&renamevar.);
set &in_table.;
if _n_ = 0 then set &hash_table.;
if _n_ = 1 then do;
declare hash ht (dataset: "&hash_table. (rename = (&renamevar. = &keyvar. ))",
ordered: "A");
ht.definekey ("&keyvar.");
ht.definedata ("&keyvar.");
ht.definedone ( );
end;
if ht.find() = 0 then do; &hash_table. = 1; output; end;
else do; &hash_table. = 0; output; end;
run;
%mend;
/* call hash_macro for account table*/
%hash_macro(subacc1, all_account, account1, AccountNo, account_no);
%hash_macro(subacc2, account1, account2, AccountNo, account_no);
%hash_macro(subacc3, account2, account3, AccountNo, account_no);
%hash_macro(subacc4, account3, account_new, AccountNo, account_no);
if you see, I am using 2 datasets(lookup/small datasets) and you can use 4. you are creating four tables in your macro code, which is unlike your first table/intial code.
Thanks for correcting me. The below code is working fine.
data account_new;
if _n_=1 then do;
if _n_ = 0 then set All_account;
dcl hash h1(dataset: "subacc1 (rename = (account_no = AccountNo )");
h1.definekey("AccountNo");
h1.definedone();
dcl hash h2(dataset: "subacc2 (rename = (account_no = AccountNo )");
h2.definekey("AccountNo");
h2.definedone();
dcl hash h3(dataset: "subacc3 (rename = (account_no = AccountNo )");
h3.definekey("AccountNo");
h3.definedone();
dcl hash h4(dataset: "subacc4 (rename = (account_no = AccountNo )");
h4.definekey("AccountNo");
h4.definedone();
end;
set All_account;
if h1.find() = 0 then subacc1 =1;
else subacc1 = 0 ;
if h2.find() = 0 then subacc2 =1;
else subacc2 = 0 ;
if h3.find() = 0 then subacc3 =1;
else subacc3 = 0 ;
if h4.find() = 0 then subacc4 =1;
else subacc4 = 0 ;
run;
Just curious to know, how much time the hash solution took
My old query took 47 minutes and hash table took less than a minute (3.16 seconds).
I don't think you need 2 hash tables nor is hash the best solution?I'm not certain but can't confirm until I pplay around
Would it be possible to provide some representative samples and the required output for us to work ?
Let me admit, I'm too lazy to assume, simulate to be totally honest.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.