Hi all,
When a DATA step that uses a hash table runs multithreaded in CAS, how many times does the hash table get loaded into memory?
Once per thread? Once per worker? Or even only once in total, with the controller doing something clever?
My current understanding is that it must be loaded once per thread, but I'd be happy to be wrong and eager to learn how this actually works.
Here is some self-contained working code I used for testing. In my environment, the data step with the hash lookup runs multithreaded on multiple workers and returns the expected result.
options msglevel=i;
cas mysess cassessopts=(caslib="casuser");
libname casuser cas;

/* Base data: each sashelp.class row written 300 times, plus one extra copy of the last row */
data casuser.class(copies=0);
   set sashelp.class end=last;
   do i=1 to 300;
      output;
   end;
   if last then output;
   drop i;
run;

/* Tag each row with the thread and host that wrote it */
data casuser.base_table(copies=0 replace=yes);
   length row_id_1 $20;
   row_id_1=catx('_',_threadid_,_n_);
   threadid_1=_threadid_;
   hostname_1=_hostname_;
   set casuser.class;
run;

/* Small lookup table, duplicated on every worker */
data casuser.lookup_table(duplicate=yes replace=yes);
   set sashelp.class;
   if name in ('Alfred','Judy','William');
run;

/* Hash lookup: keep only base rows whose name exists in the lookup table */
data casuser.result(replace=yes);
   if _n_=1 then
   do;
      dcl hash h1(dataset:'casuser.lookup_table');
      h1.defineKey('name');
      h1.defineDone();
   end;
   length row_id_2 $20 threadid_2 8 hostname_2 $20;
   set casuser.base_table;
   if h1.check()=0;
   row_id_2=catx('_',_threadid_,_n_);
   threadid_2=_threadid_;
   hostname_2=_hostname_;
run;

proc freq data=casuser.result;
   table threadid_2*hostname_2 / nocol norow nocum nopercent;
   table threadid_2*name / nocol norow nocum nopercent;
   table hostname_2*name / nocol norow nocum nopercent;
   table name / nocol norow nocum nopercent;
run;

/* proc print data=casuser.result; */
/* run; */

cas mysess terminate;
And here is the first PROC FREQ table from the code above, which shows that in my environment the data step with the hash lookup runs on 4 workers with 3 threads per worker.
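One way to probe the question empirically, as a sketch rather than a definitive answer: assuming (as the row_id values built from _threadid_ and _n_ above suggest) that _n_ counts iterations separately in each thread, the _n_=1 block runs once per thread. Writing one row from inside that block therefore records every thread that declares and loads its own hash object. The table casuser.hash_loads and the variables threadid, hostname and hash_items are hypothetical names chosen for this illustration.

```sas
/* Hypothetical diagnostic: one output row per thread that builds the hash */
data casuser.hash_loads(replace=yes);
   length hostname $64;
   keep threadid hostname hash_items;
   /* Placed before SET so it runs even in a thread that receives no rows */
   if _n_=1 then
   do;
      dcl hash h1(dataset:'casuser.lookup_table');
      h1.defineKey('name');
      h1.defineDone();
      threadid=_threadid_;
      hostname=_hostname_;
      hash_items=h1.num_items; /* rows loaded into this thread's hash */
      output;                  /* explicit OUTPUT suppresses implicit output */
   end;
   set casuser.base_table;
run;

proc print data=casuser.hash_loads;
run;
```

If the row count matches the thread count from the first PROC FREQ (12 in the environment described above), that suggests each thread declared and filled its own hash object. It counts hash definitions per thread only; it does not by itself show whether CAS shares any of the underlying lookup data in memory.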
@DerylHollick , @hashman