I am running a THREAD method within the RUN method of a DS2 DATA step. I have code that executes in the INIT method. The problem is that the threaded processes in the RUN method start before the code in the INIT method has completed. Is this by design? Is there a way to prevent it?
Here is some example code that demonstrates the issue. It is vastly simplified from the real use case.
data intyears;
   do LoopYear = 2001 to 2009;
      output;
   end;
run;

proc ds2;
thread MyThread / overwrite=yes;
   method init();
   end;
   method run();
      declare integer x y z;
      declare varchar(5000) tempstr;
      set intyears;
      by LoopYear;
      Do x = 1 to 100000;
         if mod(x,10000) = 0 then put 'Thread: ' LoopYear ':' x;
      end;
   end;
   method term();
   end;
endthread;

data _null_;
   dcl thread MyThread MyThread;
   /* myPackage is a user-defined package from the real use case; it is not defined in this snippet */
   declare package myPackage myPackage();
   declare integer x;
   method init();
      PUT 'Starting INIT';
      Do x = 1 to 1000000;
         if mod(x,100000) = 0 then put 'Package: ' x;
      end;
   end;
   method run();
      set from MyThread threads=5;
   end;
   method term();
      PUT 'In Term';
   end;
enddata;
run;
quit;
Here is the log I get. Notice that there are "Package: " lines well after the "Thread:" lines start.
Starting INIT
Package: 100000
Package: 200000
Package: 300000
Package: 400000
Package: 500000
Package: 600000
Package: 700000
Thread: 2003 : 10000
Thread: 2003 : 20000
Thread: 2003 : 30000
Thread: 2003 : 40000
Thread: 2002 : 10000
Thread: 2004 : 10000
Thread: 2003 : 50000
Thread: 2001 : 10000
Thread: 2005 : 10000
Thread: 2003 : 60000
Thread: 2003 : 70000
Thread: 2003 : 80000
Thread: 2003 : 90000
Thread: 2002 : 20000
Thread: 2004 : 20000
Thread: 2003 : 100000
Thread: 2005 : 20000
Thread: 2001 : 20000
Thread: 2006 : 10000
Thread: 2006 : 20000
Thread: 2006 : 30000
Thread: 2006 : 40000
Thread: 2002 : 30000
Thread: 2004 : 30000
Thread: 2006 : 50000
Thread: 2005 : 30000
Thread: 2001 : 30000
Thread: 2006 : 60000
Thread: 2006 : 70000
Thread: 2006 : 80000
Thread: 2006 : 90000
Thread: 2002 : 40000
Thread: 2004 : 40000
Thread: 2005 : 40000
Thread: 2006 : 100000
Thread: 2001 : 40000
Thread: 2007 : 10000
Thread: 2004 : 50000
Thread: 2007 : 20000
Thread: 2004 : 60000
Thread: 2002 : 50000
Thread: 2007 : 30000
Thread: 2004 : 70000
Thread: 2005 : 50000
Thread: 2007 : 40000
Thread: 2001 : 50000
Thread: 2004 : 80000
Thread: 2007 : 50000
Thread: 2004 : 90000
Thread: 2002 : 60000
Thread: 2007 : 60000
Thread: 2004 : 100000
Thread: 2005 : 60000
Thread: 2007 : 70000
Thread: 2008 : 10000
Thread: 2001 : 60000
Thread: 2007 : 80000
Thread: 2002 : 70000
Thread: 2008 : 20000
Package: 800000
Thread: 2007 : 90000
Thread: 2005 : 70000
Thread: 2008 : 30000
Thread: 2007 : 100000
Thread: 2001 : 70000
Thread: 2008 : 40000
Thread: 2002 : 80000
Thread: 2009 : 10000
Thread: 2008 : 50000
Thread: 2005 : 80000
Thread: 2009 : 20000
Thread: 2008 : 60000
Thread: 2001 : 80000
Thread: 2009 : 30000
Thread: 2002 : 90000
Thread: 2008 : 70000
Thread: 2009 : 40000
Thread: 2005 : 90000
Thread: 2008 : 80000
Thread: 2009 : 50000
Thread: 2001 : 90000
4 The SAS System 17:46 Wednesday, April 14, 2021
Thread: 2008 : 90000
Thread: 2002 : 100000
Thread: 2009 : 60000
Thread: 2008 : 100000
Thread: 2005 : 100000
Thread: 2009 : 70000
Thread: 2009 : 80000
Thread: 2001 : 100000
Thread: 2009 : 90000
Thread: 2009 : 100000
Package: 900000
Package: 1000000
In Term
I did try this without using threads and it doesn't do this. It seems to me quite contrary to the purpose of the INIT method.
Any ideas?
I'm not 100% sure how this works behind the scenes, but I suspect it could be an optimization.
In your example, it does not matter one bit to DS2 what order things run in: nothing in INIT affects the threads.
When I modify your code slightly to add a thread parameter and a SETPARMS call, it works as expected, because now INIT does something the threads depend on.
data intyears;
   do LoopYear = 2001 to 2009;
      output;
   end;
run;

proc ds2;
thread MyThread (int a) / overwrite=yes;
   method init();
   end;
   method run();
      declare integer x y z;
      declare varchar(5000) tempstr;
      set intyears;
      by LoopYear;
      Do x = 1 to 100000;
         if mod(x,10000) = 0 then put 'Thread: ' LoopYear ':' x ' a: ' a;
      end;
   end;
   method term();
   end;
endthread;
run;

data _null_;
   dcl thread MyThread MyThread;
   declare integer x;
   declare integer a;
   method init();
      PUT 'Starting INIT';
      Do x = 1 to 800000;
         if mod(x,100000) = 0 then put 'Package: ' x;
      end;
      a = 100;
      MyThread.setparms(a);
      PUT 'Ending INIT';
   end;
   method run();
      set from MyThread threads=5;
   end;
   method term();
      PUT 'In Term';
   end;
enddata;
run;
quit;
This produces the expected output: INIT ends before MyThread begins.
That is a great workaround! It even makes some sense why it works. Thanks!
I still think it is a very illogical way for things to work, however 🙂
I'm posting this to add to the knowledge base. Tech support asked for the workaround that Snoopy369 came up with. I gave them the slightly simplified version at the bottom of this post. Here is their explanation of why it works.
The 5 threads that run the DS2 thread block are started during startup before the DS2 data block’s INIT method is called by the 1 thread running the DS2 DATA block. If the DS2 thread block expects an argument value for a parameter, then the DS2 thread waits during its startup for the DATA block thread to set the thread’s parameter values. Eventually, the data block’s INIT method sets the thread’s parameter values, and then the DS2 thread block threads get the parameter values and finish their startup. Note that if the DATA block does not call the SETPARMS method for whatever reason (maybe the SETPARMS call is inside an IF statement block that is bypassed), the SET FROM statement will set the thread’s parameter values to NULL or MISSING.
data intyears;
   do LoopYear = 2001 to 2009;
      output;
   end;
run;

proc ds2;
/* "ready" is never used in the thread body; declaring it makes each   */
/* thread instance wait at startup until the DATA block calls SETPARMS */
thread MyThread (int ready) / overwrite=yes;
   method init();
   end;
   method run();
      declare integer x y z;
      declare varchar(5000) tempstr;
      set intyears;
      by LoopYear;
      Do x = 1 to 100000;
         if mod(x,10000) = 0 then put 'Thread: ' LoopYear ':' x ;
      end;
   end;
   method term();
   end;
endthread;
run;

data _null_;
   dcl thread MyThread MyThread;
   declare integer x;
   method init();
      PUT 'Starting INIT';
      Do x = 1 to 800000;
         if mod(x,100000) = 0 then put 'Package: ' x;
      end;
      /* Releases the waiting thread instances; the value itself is irrelevant */
      MyThread.setparms(1);
      PUT 'Ending INIT';
   end;
   method run();
      set from MyThread threads=5;
   end;
   method term();
      PUT 'In Term';
   end;
enddata;
run;
quit;
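Tech support's explanation of the workaround can be mimicked with a small toy model. This is Python, not DS2, and it is only an analogy under stated assumptions, not a description of SAS internals: each DS2 thread is a `threading.Thread`, SET FROM is a `queue.Queue`, and SETPARMS is a `threading.Event` plus a shared value. The function `run_program` and its arguments are hypothetical names. The point is the ordering: because every worker blocks on the event during startup, none of them can log anything until INIT has done its work and called the stand-in for `setparms`.

```python
import queue
import threading


def run_program(n_threads=5, years=range(2001, 2010), init_steps=8):
    """Toy model of a DS2 DATA block driving a thread block that takes a parameter.

    Assumptions (analogy only, not SAS internals): each DS2 thread is a Python
    thread, SET FROM is a queue, and SETPARMS is an Event plus a shared value.
    """
    log = []                         # stands in for the SAS log
    log_lock = threading.Lock()
    rows = queue.Queue()             # stands in for SET FROM
    parms_ready = threading.Event()  # stands in for the SETPARMS handshake
    parms = {}
    work = queue.Queue()             # input rows, like SET intyears
    for year in years:
        work.put(year)

    def put(msg):
        with log_lock:
            log.append(msg)

    def thread_block():
        # Startup: a thread block with a parameter waits for SETPARMS.
        parms_ready.wait()
        a = parms["a"]
        while True:
            try:
                year = work.get_nowait()
            except queue.Empty:
                return
            put(f"Thread: {year} a: {a}")
            rows.put(year)

    # The thread-block threads are started before the DATA block's INIT runs.
    workers = [threading.Thread(target=thread_block) for _ in range(n_threads)]
    for w in workers:
        w.start()

    # DATA block INIT, running concurrently with thread startup.
    put("Starting INIT")
    for step in range(1, init_steps + 1):
        put(f"Package: {step * 100000}")
    parms["a"] = 100
    parms_ready.set()                # MyThread.setparms(a)
    put("Ending INIT")

    # DATA block RUN: wait for the threads to output each row.
    received = [rows.get() for _ in years]

    for w in workers:
        w.join()
    put("In Term")
    return log, received
```

In this model every `Package:` line comes before any `Thread:` line. `Ending INIT` can still land after the first `Thread:` line, because the event is set before that message is logged, mirroring the order of `MyThread.setparms(1)` and `PUT 'Ending INIT'` in the DS2 code.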
I heard back from tech support on this. It is designed behavior. The thread block's RUN method never waits for the DATA block's INIT method to complete. It only appears to wait because the statements in INIT usually complete quickly. Here is the direct quote.
The 5 threads that run the DS2 thread block are started during startup of the DS2 program execution. The 5 threads running the DS2 thread block and the 1 thread running the DS2 data block all run concurrently after startup. The only synchronization between the threads running the DS2 thread block and the thread running DS2 data block is at the SET FROM statement. The DS2 data block thread will wait at the SET FROM statement for a DS2 thread block thread to output a row of data.
Some of the changes between 9.4M5 and 9.4M6 have resulted in the DS2 thread block threads starting quicker and thus the overlap between the threads is more apparent, but there has always been some overlap.
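The concurrency described in that quote (all threads started at program startup, with SET FROM as the only meeting point) can be sketched with a similar Python toy model of the original, parameterless program. Again this is an analogy under assumptions, not SAS internals: DS2 threads are `threading.Thread` objects, SET FROM is a `queue.Queue`, `time.sleep` stands in for the long INIT loop, and `run_program` and `init_seconds` are hypothetical names. Nothing holds the workers back, so they can emit rows while INIT is still running. The DATA block only blocks at `rows.get()`.

```python
import queue
import threading
import time


def run_program(n_threads=5, years=range(2001, 2010), init_seconds=0.2):
    """Toy model of the original program: a thread block with no parameters.

    Assumptions (analogy only, not SAS internals): DS2 threads are Python
    threads and SET FROM is a queue. There is no startup handshake, so the
    only synchronization point is rows.get() in the DATA block.
    """
    log = []                         # stands in for the SAS log
    log_lock = threading.Lock()
    rows = queue.Queue()             # stands in for SET FROM
    work = queue.Queue()             # input rows, like SET intyears
    for year in years:
        work.put(year)

    def put(msg):
        with log_lock:
            log.append(msg)

    def thread_block():
        # No parameter, so nothing to wait for: start producing immediately.
        while True:
            try:
                year = work.get_nowait()
            except queue.Empty:
                return
            put(f"Thread: {year}")
            rows.put(year)

    # Threads are started during program startup.
    workers = [threading.Thread(target=thread_block) for _ in range(n_threads)]
    for w in workers:
        w.start()

    # DATA block INIT runs concurrently with the threads.
    put("Starting INIT")
    time.sleep(init_seconds)         # a slow INIT, like the 1,000,000-step loop
    put("Ending INIT")

    # SET FROM: the DATA block waits here for the threads to output rows.
    received = [rows.get() for _ in years]

    for w in workers:
        w.join()
    put("In Term")
    return log, received
```

On a typical run, most or all of the `Thread:` lines land before `Ending INIT`, but the exact interleaving depends on the scheduler, much like the DS2 log in the question. Only the delivery of every row and `In Term` coming last are guaranteed.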