Solved: What of my used syntax is the culprit for slowing down the performance...

acordes · Posted 03-22-2023 07:23 AM

Dear @Rick_SAS, in a recent project I succeeded in achieving a considerable performance improvement by using the ParTasks in the iml action.

This time I do manage to get it working in parallel for a subset of the original data set but when I load the whole data set it takes a long time to probably finish (it hasn´t finished yet...).

I suppose that the call sortndx or the do loop take their share in running for so long time.

At a first glance on my code, can you detect what probably is the reason for taking so long time?

To put a little bit of context.

For our portfolio of customers and contracts, I want to identify renewals as those finishing contracts (having finished already within the last 12 months or scheduled for termination for the next 12 months) whose signing customers (customer id) intervene in a newly signed contract.

For commercial customers the matches can become quite numerous.

Once all couples of finishing and new contratcs are found, I calculate a rank order which can be interpreted as a similarity score between ending and new contract (same product, same channel, same brand, both dates close to each other...).

For the ParTasks to work out I order the table by the customer Id and make sure that each of the 16 buckets I send to the ParTasks has no overlapping customer ids.

I only attach the the iml action which performs the task, not the one that loads the data and calls the modules.

As I said, it works correctly for a 200.000 rows data table which makes around 20% of the original data table.


cas mysession sessopts=(caslib="casuser");

proc cas;
loadactionset "iml";
source DefineMods;
/* SAS/IML function that computes N random numbers and appends
   node and thread information */

start acFunnel(LL) ;
x=LL$'vv'; 
cifnif=LL$'comb';
y=LL$'nw';


call sortndx (idx, cifnif, 1);
cifnif=cifnif[idx,];
y=y[idx,];
x=x[idx,];
b=uniqueby(cifnif, 1);

bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];
/* print bch; */

res=cifnif[loc(element(cifnif, cifnif[bch]))];
resy=y[loc(element(cifnif, cifnif[bch])),];
resx=x[loc(element(cifnif, cifnif[bch])),];
/* print bch res resy resx; */


ids=resy[, 6:7];

match=res[1,]||ids[1,2]||ids[1,2];
match=j(1, 3, BlankStr(30));
similar=j(1,8,.);

do i=1 to nrow(res);
from=res[i,];
idx=setdif(loc(res=from), i);
to_ids=ids[idx,];
to_resy=resy[idx,];
to_resx=resx[idx,];
idx2=loc(abs(intck('month', repeat(resx[i,1], ncol(idx), 1), resx[idx, 2])) <= 12);
if ncol(idx2) then do;
match= match // ( repeat(res[i,1], ncol(idx2), 1) ||  repeat(ids[i,2], ncol(idx2), 1) || to_ids[idx2,2]);

similar= similar // ( ( repeat(resy[i,1:5 || 8], ncol(idx2), 1) =   to_resy[idx2, 1:5 || 8 ] ) || 
             (repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2])  || ((repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2]) > 0) ) ;
end;
end;

/* print match similar; */

call sortndx(idx3, match, 2);
match=match[idx3,];
similar=similar[idx3,];
_match=match[,2];

b=uniqueby(_match[,1], 1);
ctr=loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1);
if ncol(ctr) then bch=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1)];
easy_win=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b = 1)];

if nrow(easy_win) then easy_res=match[loc(element(_match, _match[easy_win])), ];
if ncol(ctr) then res=match[loc(element(_match, _match[bch])), ];

similar[, 7]=abs(similar[, 7]);
similar[, 1:6 || 8]=similar[, 1:6 || 8] #-1;

similar_win=j(1,9,.);
_res=j(1, 3, BlankStr(30));

if ncol(ctr) then do;

do j=1 to nrow(bch);
idx4=loc(element(_match, _match[bch[j]]));
st_sim=similar[idx4, ] ;
st_res=match [idx4, ];
ord=      {6 2 4 5 1 3 8 7};
desc_ord= {6 2 4 5 1 3 8};
call sortndx(idx5, st_sim, ord);
st_sim=st_sim[idx5,];
st_res=st_res[idx5,];
similar_win=similar_win // ( st_sim || t(1:nrow(idx5)));
_res=_res // st_res;
end;

end;

_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
   return _res_red;

finish;

start acSimParTasks(labl, opt, L);
   cuts = L$'ord';  vals = L$'xx'; idler=L$'idcomb';  dna=L$'adn';
namer="L1":"L16";
varnames=L$'varn';

L1= [#'vv'=vals[1:cuts[1]-1,],            #'comb'=idler[1:cuts[1]-1,] , #'nw'=dna[1:cuts[1]-1,] ];
L2= [#'vv'=vals[cuts[1]:cuts[2]-1,],      #'comb'=idler[cuts[1]:cuts[2]-1,] , #'nw'=dna[cuts[1]:cuts[2]-1,] ];
L3= [#'vv'=vals[cuts[2]:cuts[3]-1,],      #'comb'=idler[cuts[2]:cuts[3]-1,] , #'nw'=dna[cuts[2]:cuts[3]-1,] ];
L4= [#'vv'=vals[cuts[3]:cuts[4]-1,],      #'comb'=idler[cuts[3]:cuts[4]-1,], #'nw'=dna[cuts[3]:cuts[4]-1,] ];
L5= [#'vv'=vals[cuts[4]:cuts[5]-1,] ,     #'comb'=idler[cuts[4]:cuts[5]-1,], #'nw'=dna[cuts[4]:cuts[5]-1,] ];
L6= [#'vv'=vals[cuts[5]:cuts[6]-1,] ,     #'comb'=idler[cuts[5]:cuts[6]-1,], #'nw'=dna[cuts[5]:cuts[6]-1,] ];
L7= [#'vv'=vals[cuts[6]:cuts[7]-1,] ,     #'comb'=idler[cuts[6]:cuts[7]-1,], #'nw'=dna[cuts[6]:cuts[7]-1,] ];
L8= [#'vv'=vals[cuts[7]:cuts[8]-1,] ,     #'comb'=idler[cuts[7]:cuts[8]-1,], #'nw'=dna[cuts[7]:cuts[8]-1,] ];
L9= [#'vv'=vals[cuts[8]:cuts[9]-1,] ,     #'comb'=idler[cuts[8]:cuts[9]-1,], #'nw'=dna[cuts[8]:cuts[9]-1,] ];
L10=[#'vv'=vals[cuts[9]:cuts[10]-1,] ,    #'comb'=idler[cuts[9]:cuts[10]-1,], #'nw'=dna[cuts[9]:cuts[10]-1,] ];
L11=[#'vv'=vals[cuts[10]:cuts[11]-1,] ,   #'comb'=idler[cuts[10]:cuts[11]-1,], #'nw'=dna[cuts[10]:cuts[11]-1,] ];
L12=[#'vv'=vals[cuts[11]:cuts[12]-1,] ,   #'comb'=idler[cuts[11]:cuts[12]-1,], #'nw'=dna[cuts[11]:cuts[12]-1,] ];
L13=[#'vv'=vals[cuts[12]:cuts[13]-1,] ,   #'comb'=idler[cuts[12]:cuts[13]-1,], #'nw'=dna[cuts[12]:cuts[13]-1,] ];
L14=[#'vv'=vals[cuts[13]:cuts[14]-1,] ,   #'comb'=idler[cuts[13]:cuts[14]-1,], #'nw'=dna[cuts[13]:cuts[14]-1,] ];
L15=[#'vv'=vals[cuts[14]:cuts[15]-1,] ,   #'comb'=idler[cuts[14]:cuts[15]-1,], #'nw'=dna[cuts[14]:cuts[15]-1,] ];
L16=[#'vv'=vals[cuts[15]:cuts[16],] ,     #'comb'=idler[cuts[15]:cuts[16],], #'nw'=dna[cuts[15]:cuts[16], ] ];


   Tasks = repeat('acFunnel', 1, 16);  
   Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];      
   Results = ParTasks(Tasks, Args, 2);
   free M;
   do i = 1 to ListLen(Results);
      M = M // Results$i;
   end;

idx9=loc( (missing(M) [, +])=0);

MM=M[idx9,];

varnames={'cif_nif' 'to_op' 'from_op' 'rank' };


call MatrixWriteToCAS(MM, '', '_crm_fun', varnames);

finish;

store module=(acFunnel acSimParTasks);
endsource;
iml / code=DefineMods;
run;

Rick_SAS · Posted 03-22-2023 08:58 AM

It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:

Tips: Compare the performance of algorithms in SAS - The DO Loop

Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html

For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)

Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)

A few other thoughts, which might or might not be helpful:

Can you use the DIF function inside the LOC function in this call? bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];
It seems like you can compute
ndx = loc(element(cifnif, cifnif[bch]));
and reuse the index vector.
I can't tell, but it looks like you might be using character matrices. You have the statement match=j(1, 3, BlankStr(30)); and call the CHAR function to convert integers to strings (?). If so, don't do that. If you are using a grouping variable, can you recode the data so that the groups are represented by integer IDs? Processing numbers (sorting, LOC, copying data, ...) will much faster that processing 30-character strings.
At the end of the ACSimParTasks function, you remove any row that has a missing value and keep only the complete cases. Do this subset operation BEFORE you pass the data to ParTasks. You'll have less data to pass to each task, less data to process, and the result will be the same.

These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.

View solution in original post

IanWakeling · Posted 03-22-2023 08:23 AM

I can see at least 4 places where you are using syntax like x = x // y inside a loop. This 'growing' of a matrix by repeated concatenation is usually very inefficient as a new version of x needs be created for each iteration. It is much better to declare x at its final size and then fill it up as the loop iterates.

Rick_SAS · Posted 03-22-2023 08:58 AM

It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:

Tips: Compare the performance of algorithms in SAS - The DO Loop

Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html

For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)

Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)

A few other thoughts, which might or might not be helpful:

Can you use the DIF function inside the LOC function in this call? bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];
It seems like you can compute
ndx = loc(element(cifnif, cifnif[bch]));
and reuse the index vector.
I can't tell, but it looks like you might be using character matrices. You have the statement match=j(1, 3, BlankStr(30)); and call the CHAR function to convert integers to strings (?). If so, don't do that. If you are using a grouping variable, can you recode the data so that the groups are represented by integer IDs? Processing numbers (sorting, LOC, copying data, ...) will much faster that processing 30-character strings.
At the end of the ACSimParTasks function, you remove any row that has a missing value and keep only the complete cases. Do this subset operation BEFORE you pass the data to ParTasks. You'll have less data to pass to each task, less data to process, and the result will be the same.

These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.

acordes · Posted 03-22-2023 09:05 AM

Thanks a lot.

Yes, dif will work
yes, I can reuse it
The similar matrix has 0-1 in all but one column (difference in days between both operations) and I convert it to char because I don't know how to return 2 result matrices (one character and one numeric). Therefore I || it to the other matrix.
Yes, I can advance this cleansing operation

I'll improve it at first place and move it to parallel when it's optimized.

Can you help me with the 3rd point?

Rick_SAS · Posted 03-22-2023 09:12 AM

Please explain in words the form of the input and output data and what you are trying to accomplish. Perhaps that will enable someone to help with Step 3. Currently, I don't understand why you want/need to return a character matrix.

acordes · Posted 03-22-2023 09:34 AM

When I exit the parallel task I return a result matrix.

This is made of customer id, new contract, finishing contract and a (naturally numeric) rank variable.

A new contract can be linked to more than one finishing contract if the customer ID is a promotor of our products.

But only one of these candidates is awarded and the car dealer gets a bonus for it. In order to determine the 'winner' a rank is formed, the more simiar both contracts are, the new and the 1:n finishing contracts, the better the rank.

A subset of new contracts only relates to one finishing contract, for those I assign 1 as rank score. These are stored in the matrix easy_res.

_res and similar_win have the same number of rows.

I don't know how to return _res and similar_win within the return syntax. And if I knew I would struggle to accumulate them as results.

_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
   return _res_red;

finish;

   Tasks = repeat('acFunnel', 1, 16);  
   Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];      
   Results = ParTasks(Tasks, Args, 2);
   free M;
   do i = 1 to ListLen(Results);
      M = M // Results$i;
   end;

Rick_SAS · Posted 03-22-2023 09:58 AM

I still don't understand which columns are character and why.

> I don't know how to return _res and similar_win within the return syntax.

Clearly you are familiar with lists, since you use them as input to the ACFUNNEL function on each thread. If you need each task to return multiple arguments, pack them into a list. The PARTASKS function will then return a list of lists. You can extract and work with the components as needed.

acordes · Posted 03-22-2023 03:11 PM

Just for sharing my findings.

meanwhile I have discovered what derails the code.

I send to the parallel tasks the customer ids in such manner that the inner join of these 16 tasks would be missing.

Inside the task I create cross products by customer id, this reads as create all possible combinations between contracts, new and finishing, and check for the matching criteria that both contracts are at maximum 12 months difference.

I have some commercial custumers that have participated in more than 10000 contracts so that the cross product becomes big. Furthermore I order the customer IDs by name and these commercial customers tend to have a similar naming pattern resulting in all big customers getting assigned to the same task.

If I restrict to customers with less than x participating contracts (< 10 i.e.), then the iml action runs successfully and fast.

So my choice will be to treat them apart or to balance their distribution to the parallel tasks.

Rick_SAS · Posted 03-22-2023 03:15 PM

That's wonderful news, Arne, and it makes perfect sense. Glad you were able to solve your own problem.

What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Re: What of my used syntax is the culprit for slowing down the performance of the iml action?

Registration is open