Patrick
Opal | Level 21

Hi all,

When a DATA step that uses a hash table runs multithreaded in CAS, how many times does the hash table get loaded into memory?

Once per thread? Once per worker? Or only once overall, with the controller doing something smart?

My current understanding is that it would need to be once per thread, but I'd be happy to be wrong and am eager to learn how this actually works.

Here is some self-contained, working code I used for testing. It shows that the data step with the hash lookup runs multithreaded on multiple workers (in my environment) and returns the expected result.

options msglevel=i;
cas mysess cassessopts=(caslib="casuser");

libname casuser cas;

data casuser.class(copies=0);
  set sashelp.class end=last;
  do i=1 to 300;
    output;
  end;
  if last then output;
  drop i;
run;

data casuser.base_table(copies=0 replace=yes);
  length row_id_1 $20;
  row_id_1=catx('_',_threadid_,_n_);
  threadid_1=_threadid_;
  hostname_1=_hostname_;
  set casuser.class;
run;

data casuser.lookup_table(duplicate=yes replace=yes);
  set sashelp.class;
  if name in ('Alfred','Judy','William');
run;

data casuser.result(replace=yes);
  if _n_=1 then
    do;
      dcl hash h1(dataset:'casuser.lookup_table');
      h1.defineKey('name');
      h1.defineDone();
    end;
  length row_id_2 $20 threadid_2 8 hostname_2 $20;
  set casuser.base_table;
  if h1.check() =0;
  row_id_2=catx('_',_threadid_,_n_);
  threadid_2=_threadid_;
  hostname_2=_hostname_;
run;

proc freq data=casuser.result;
  table threadid_2*hostname_2 /nocol norow nocum nopercent;
  table threadid_2*name /nocol norow nocum nopercent;
  table hostname_2*name/nocol norow nocum nopercent;
  table name/nocol norow nocum nopercent;
run;

/* proc print data=casuser.result; */
/* run; */

cas mysess terminate;

And here is the first PROC FREQ table from the code above. It shows that, in my environment, the data step with the hash lookup runs on 4 workers with 3 threads per worker.

[Screenshot: PROC FREQ cross-tabulation of threadid_2 by hostname_2]

@DerylHollick , @hashman 

 

 

Accepted Solution
Patrick
Opal | Level 21

I've received an answer via another channel, so I'm adding the information here as well:

  • If the hash is "read only", a single instance of the hash table is created for the whole process (even with multiple worker nodes, it is still a single instance).
  • If the hash is "read/write", there is one copy per thread, which can be a lot. In the environment I'm currently "playing" in, processes run with up to 192 threads.

Whether a hash is read-only or read/write is decided at compile time. Any hash method in the code that can modify the data, such as add(), leads to one instance per thread.
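
To make the distinction concrete, here is a minimal sketch that reuses the tables from the test code in the question. The table names result_ro and result_rw and the hash name h2 are made up for illustration, and both steps assume the mysess CAS session is still active (run them before cas mysess terminate). The first step only calls check(), so by the description above it should count as read-only. The second step calls add(), so it should count as read/write. How CAS actually allocates the hash instances is not visible in the code; that part is taken from the answer above and is not verified here.

```sas
/* Read-only use: after defineDone() only check() is called. */
data casuser.result_ro(replace=yes);
  if _n_=1 then
    do;
      dcl hash h1(dataset:'casuser.lookup_table');
      h1.defineKey('name');
      h1.defineDone();
    end;
  set casuser.base_table;
  if h1.check()=0;
run;

/* Read/write use: add() modifies the hash at run time. */
data casuser.result_rw(replace=yes);
  if _n_=1 then
    do;
      dcl hash h2();            /* starts empty, filled by add() below */
      h2.defineKey('name');
      h2.defineDone();
    end;
  set casuser.base_table;
  threadid_2=_threadid_;
  /* keep a row only the first time this thread sees the name */
  if h2.check() ne 0 then
    do;
      h2.add();
      output;
    end;
run;

/* rows kept per thread */
proc freq data=casuser.result_rw;
  table threadid_2 / nocum nopercent;
run;
```

Because every thread fills its own hash in the second step, the "first occurrence" filter works per thread: the same name can appear in result_rw once for each thread that read a row with that name, rather than once overall. That per-thread state is one reason a hash that is modified cannot simply be shared.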

 

 
