Dear @Rick_SAS, in a recent project I succeeded in achieving a considerable performance improvement by using the ParTasks in the iml action.
This time I do manage to get it working in parallel for a subset of the original data set but when I load the whole data set it takes a long time to probably finish (it hasn´t finished yet...).
I suppose that the call sortndx or the do loop take their share in running for so long time.
At a first glance on my code, can you detect what probably is the reason for taking so long time?
To put a little bit of context.
For our portfolio of customers and contracts, I want to identify renewals as those finishing contracts (having finished already within the last 12 months or scheduled for termination for the next 12 months) whose signing customers (customer id) intervene in a newly signed contract.
For commercial customers the matches can become quite numerous.
Once all couples of finishing and new contratcs are found, I calculate a rank order which can be interpreted as a similarity score between ending and new contract (same product, same channel, same brand, both dates close to each other...).
For the ParTasks to work out I order the table by the customer Id and make sure that each of the 16 buckets I send to the ParTasks has no overlapping customer ids.
I only attach the the iml action which performs the task, not the one that loads the data and calls the modules.
As I said, it works correctly for a 200.000 rows data table which makes around 20% of the original data table.
cas mysession sessopts=(caslib="casuser");
proc cas;
loadactionset "iml";
source DefineMods;
/* SAS/IML function that computes N random numbers and appends
node and thread information */
start acFunnel(LL) ;
x=LL$'vv';
cifnif=LL$'comb';
y=LL$'nw';
call sortndx (idx, cifnif, 1);
cifnif=cifnif[idx,];
y=y[idx,];
x=x[idx,];
b=uniqueby(cifnif, 1);
bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];
/* print bch; */
res=cifnif[loc(element(cifnif, cifnif[bch]))];
resy=y[loc(element(cifnif, cifnif[bch])),];
resx=x[loc(element(cifnif, cifnif[bch])),];
/* print bch res resy resx; */
ids=resy[, 6:7];
match=res[1,]||ids[1,2]||ids[1,2];
match=j(1, 3, BlankStr(30));
similar=j(1,8,.);
do i=1 to nrow(res);
from=res[i,];
idx=setdif(loc(res=from), i);
to_ids=ids[idx,];
to_resy=resy[idx,];
to_resx=resx[idx,];
idx2=loc(abs(intck('month', repeat(resx[i,1], ncol(idx), 1), resx[idx, 2])) <= 12);
if ncol(idx2) then do;
match= match // ( repeat(res[i,1], ncol(idx2), 1) || repeat(ids[i,2], ncol(idx2), 1) || to_ids[idx2,2]);
similar= similar // ( ( repeat(resy[i,1:5 || 8], ncol(idx2), 1) = to_resy[idx2, 1:5 || 8 ] ) ||
(repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2]) || ((repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2]) > 0) ) ;
end;
end;
/* print match similar; */
call sortndx(idx3, match, 2);
match=match[idx3,];
similar=similar[idx3,];
_match=match[,2];
b=uniqueby(_match[,1], 1);
ctr=loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1);
if ncol(ctr) then bch=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1)];
easy_win=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b = 1)];
if nrow(easy_win) then easy_res=match[loc(element(_match, _match[easy_win])), ];
if ncol(ctr) then res=match[loc(element(_match, _match[bch])), ];
similar[, 7]=abs(similar[, 7]);
similar[, 1:6 || 8]=similar[, 1:6 || 8] #-1;
similar_win=j(1,9,.);
_res=j(1, 3, BlankStr(30));
if ncol(ctr) then do;
do j=1 to nrow(bch);
idx4=loc(element(_match, _match[bch[j]]));
st_sim=similar[idx4, ] ;
st_res=match [idx4, ];
ord= {6 2 4 5 1 3 8 7};
desc_ord= {6 2 4 5 1 3 8};
call sortndx(idx5, st_sim, ord);
st_sim=st_sim[idx5,];
st_res=st_res[idx5,];
similar_win=similar_win // ( st_sim || t(1:nrow(idx5)));
_res=_res // st_res;
end;
end;
_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
return _res_red;
finish;
start acSimParTasks(labl, opt, L);
cuts = L$'ord'; vals = L$'xx'; idler=L$'idcomb'; dna=L$'adn';
namer="L1":"L16";
varnames=L$'varn';
L1= [#'vv'=vals[1:cuts[1]-1,], #'comb'=idler[1:cuts[1]-1,] , #'nw'=dna[1:cuts[1]-1,] ];
L2= [#'vv'=vals[cuts[1]:cuts[2]-1,], #'comb'=idler[cuts[1]:cuts[2]-1,] , #'nw'=dna[cuts[1]:cuts[2]-1,] ];
L3= [#'vv'=vals[cuts[2]:cuts[3]-1,], #'comb'=idler[cuts[2]:cuts[3]-1,] , #'nw'=dna[cuts[2]:cuts[3]-1,] ];
L4= [#'vv'=vals[cuts[3]:cuts[4]-1,], #'comb'=idler[cuts[3]:cuts[4]-1,], #'nw'=dna[cuts[3]:cuts[4]-1,] ];
L5= [#'vv'=vals[cuts[4]:cuts[5]-1,] , #'comb'=idler[cuts[4]:cuts[5]-1,], #'nw'=dna[cuts[4]:cuts[5]-1,] ];
L6= [#'vv'=vals[cuts[5]:cuts[6]-1,] , #'comb'=idler[cuts[5]:cuts[6]-1,], #'nw'=dna[cuts[5]:cuts[6]-1,] ];
L7= [#'vv'=vals[cuts[6]:cuts[7]-1,] , #'comb'=idler[cuts[6]:cuts[7]-1,], #'nw'=dna[cuts[6]:cuts[7]-1,] ];
L8= [#'vv'=vals[cuts[7]:cuts[8]-1,] , #'comb'=idler[cuts[7]:cuts[8]-1,], #'nw'=dna[cuts[7]:cuts[8]-1,] ];
L9= [#'vv'=vals[cuts[8]:cuts[9]-1,] , #'comb'=idler[cuts[8]:cuts[9]-1,], #'nw'=dna[cuts[8]:cuts[9]-1,] ];
L10=[#'vv'=vals[cuts[9]:cuts[10]-1,] , #'comb'=idler[cuts[9]:cuts[10]-1,], #'nw'=dna[cuts[9]:cuts[10]-1,] ];
L11=[#'vv'=vals[cuts[10]:cuts[11]-1,] , #'comb'=idler[cuts[10]:cuts[11]-1,], #'nw'=dna[cuts[10]:cuts[11]-1,] ];
L12=[#'vv'=vals[cuts[11]:cuts[12]-1,] , #'comb'=idler[cuts[11]:cuts[12]-1,], #'nw'=dna[cuts[11]:cuts[12]-1,] ];
L13=[#'vv'=vals[cuts[12]:cuts[13]-1,] , #'comb'=idler[cuts[12]:cuts[13]-1,], #'nw'=dna[cuts[12]:cuts[13]-1,] ];
L14=[#'vv'=vals[cuts[13]:cuts[14]-1,] , #'comb'=idler[cuts[13]:cuts[14]-1,], #'nw'=dna[cuts[13]:cuts[14]-1,] ];
L15=[#'vv'=vals[cuts[14]:cuts[15]-1,] , #'comb'=idler[cuts[14]:cuts[15]-1,], #'nw'=dna[cuts[14]:cuts[15]-1,] ];
L16=[#'vv'=vals[cuts[15]:cuts[16],] , #'comb'=idler[cuts[15]:cuts[16],], #'nw'=dna[cuts[15]:cuts[16], ] ];
Tasks = repeat('acFunnel', 1, 16);
Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];
Results = ParTasks(Tasks, Args, 2);
free M;
do i = 1 to ListLen(Results);
M = M // Results$i;
end;
idx9=loc( (missing(M) [, +])=0);
MM=M[idx9,];
varnames={'cif_nif' 'to_op' 'from_op' 'rank' };
call MatrixWriteToCAS(MM, '', '_crm_fun', varnames);
finish;
store module=(acFunnel acSimParTasks);
endsource;
iml / code=DefineMods;
run;
It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:
Tips: Compare the performance of algorithms in SAS - The DO Loop
Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html
For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)
Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)
A few other thoughts, which might or might not be helpful:
These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.
I can see at least 4 places where you are using syntax like x = x // y inside a loop. This 'growing' of a matrix by repeated concatenation is usually very inefficient as a new version of x needs be created for each iteration. It is much better to declare x at its final size and then fill it up as the loop iterates.
It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:
Tips: Compare the performance of algorithms in SAS - The DO Loop
Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html
For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)
Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)
A few other thoughts, which might or might not be helpful:
These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.
Thanks a lot.
I'll improve it at first place and move it to parallel when it's optimized.
Can you help me with the 3rd point?
Please explain in words the form of the input and output data and what you are trying to accomplish. Perhaps that will enable someone to help with Step 3. Currently, I don't understand why you want/need to return a character matrix.
When I exit the parallel task I return a result matrix.
This is made of customer id, new contract, finishing contract and a (naturally numeric) rank variable.
A new contract can be linked to more than one finishing contract if the customer ID is a promotor of our products.
But only one of these candidates is awarded and the car dealer gets a bonus for it. In order to determine the 'winner' a rank is formed, the more simiar both contracts are, the new and the 1:n finishing contracts, the better the rank.
A subset of new contracts only relates to one finishing contract, for those I assign 1 as rank score. These are stored in the matrix easy_res.
_res and similar_win have the same number of rows.
I don't know how to return _res and similar_win within the return syntax. And if I knew I would struggle to accumulate them as results.
_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
return _res_red;
finish;
Tasks = repeat('acFunnel', 1, 16);
Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];
Results = ParTasks(Tasks, Args, 2);
free M;
do i = 1 to ListLen(Results);
M = M // Results$i;
end;
I still don't understand which columns are character and why.
> I don't know how to return _res and similar_win within the return syntax.
Clearly you are familiar with lists, since you use them as input to the ACFUNNEL function on each thread. If you need each task to return multiple arguments, pack them into a list. The PARTASKS function will then return a list of lists. You can extract and work with the components as needed.
Just for sharing my findings.
meanwhile I have discovered what derails the code.
I send to the parallel tasks the customer ids in such manner that the inner join of these 16 tasks would be missing.
Inside the task I create cross products by customer id, this reads as create all possible combinations between contracts, new and finishing, and check for the matching criteria that both contracts are at maximum 12 months difference.
I have some commercial custumers that have participated in more than 10000 contracts so that the cross product becomes big. Furthermore I order the customer IDs by name and these commercial customers tend to have a similar naming pattern resulting in all big customers getting assigned to the same task.
If I restrict to customers with less than x participating contracts (< 10 i.e.), then the iml action runs successfully and fast.
So my choice will be to treat them apart or to balance their distribution to the parallel tasks.
That's wonderful news, Arne, and it makes perfect sense. Glad you were able to solve your own problem.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.