Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-22-2023 07:23 AM
(405 views)

Dear @Rick_SAS, in a recent project I succeeded in achieving a considerable performance improvement by using the ParTasks in the iml action.

This time I do manage to get it working in parallel for a subset of the original data set but when I load the whole data set it takes a long time to probably finish (it hasn´t finished yet...).

I suppose that the call sortndx or the do loop take their share in running for so long time.

At a first glance on my code, can you detect what probably is the reason for taking so long time?

To put a little bit of context.

For our portfolio of customers and contracts, I want to identify renewals as those finishing contracts (having finished already within the last 12 months or scheduled for termination for the next 12 months) whose signing customers (customer id) intervene in a newly signed contract.

For commercial customers the matches can become quite numerous.

Once all couples of finishing and new contratcs are found, I calculate a rank order which can be interpreted as a similarity score between ending and new contract (same product, same channel, same brand, both dates close to each other...).

For the ParTasks to work out I order the table by the customer Id and make sure that each of the 16 buckets I send to the ParTasks has no overlapping customer ids.

I only attach the the iml action which performs the task, not the one that loads the data and calls the modules.

As I said, it works correctly for a 200.000 rows data table which makes around 20% of the original data table.

```
cas mysession sessopts=(caslib="casuser");
proc cas;
loadactionset "iml";
source DefineMods;
/* SAS/IML function that computes N random numbers and appends
node and thread information */
start acFunnel(LL) ;
x=LL$'vv';
cifnif=LL$'comb';
y=LL$'nw';
call sortndx (idx, cifnif, 1);
cifnif=cifnif[idx,];
y=y[idx,];
x=x[idx,];
b=uniqueby(cifnif, 1);
bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];
/* print bch; */
res=cifnif[loc(element(cifnif, cifnif[bch]))];
resy=y[loc(element(cifnif, cifnif[bch])),];
resx=x[loc(element(cifnif, cifnif[bch])),];
/* print bch res resy resx; */
ids=resy[, 6:7];
match=res[1,]||ids[1,2]||ids[1,2];
match=j(1, 3, BlankStr(30));
similar=j(1,8,.);
do i=1 to nrow(res);
from=res[i,];
idx=setdif(loc(res=from), i);
to_ids=ids[idx,];
to_resy=resy[idx,];
to_resx=resx[idx,];
idx2=loc(abs(intck('month', repeat(resx[i,1], ncol(idx), 1), resx[idx, 2])) <= 12);
if ncol(idx2) then do;
match= match // ( repeat(res[i,1], ncol(idx2), 1) || repeat(ids[i,2], ncol(idx2), 1) || to_ids[idx2,2]);
similar= similar // ( ( repeat(resy[i,1:5 || 8], ncol(idx2), 1) = to_resy[idx2, 1:5 || 8 ] ) ||
(repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2]) || ((repeat(resx[i,1], ncol(idx2), 1) - to_resx[idx2, 2]) > 0) ) ;
end;
end;
/* print match similar; */
call sortndx(idx3, match, 2);
match=match[idx3,];
similar=similar[idx3,];
_match=match[,2];
b=uniqueby(_match[,1], 1);
ctr=loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1);
if ncol(ctr) then bch=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b > 1)];
easy_win=b[loc((b[2:nrow(b),]//nrow(_match[,1])) - b = 1)];
if nrow(easy_win) then easy_res=match[loc(element(_match, _match[easy_win])), ];
if ncol(ctr) then res=match[loc(element(_match, _match[bch])), ];
similar[, 7]=abs(similar[, 7]);
similar[, 1:6 || 8]=similar[, 1:6 || 8] #-1;
similar_win=j(1,9,.);
_res=j(1, 3, BlankStr(30));
if ncol(ctr) then do;
do j=1 to nrow(bch);
idx4=loc(element(_match, _match[bch[j]]));
st_sim=similar[idx4, ] ;
st_res=match [idx4, ];
ord= {6 2 4 5 1 3 8 7};
desc_ord= {6 2 4 5 1 3 8};
call sortndx(idx5, st_sim, ord);
st_sim=st_sim[idx5,];
st_res=st_res[idx5,];
similar_win=similar_win // ( st_sim || t(1:nrow(idx5)));
_res=_res // st_res;
end;
end;
_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
return _res_red;
finish;
start acSimParTasks(labl, opt, L);
cuts = L$'ord'; vals = L$'xx'; idler=L$'idcomb'; dna=L$'adn';
namer="L1":"L16";
varnames=L$'varn';
L1= [#'vv'=vals[1:cuts[1]-1,], #'comb'=idler[1:cuts[1]-1,] , #'nw'=dna[1:cuts[1]-1,] ];
L2= [#'vv'=vals[cuts[1]:cuts[2]-1,], #'comb'=idler[cuts[1]:cuts[2]-1,] , #'nw'=dna[cuts[1]:cuts[2]-1,] ];
L3= [#'vv'=vals[cuts[2]:cuts[3]-1,], #'comb'=idler[cuts[2]:cuts[3]-1,] , #'nw'=dna[cuts[2]:cuts[3]-1,] ];
L4= [#'vv'=vals[cuts[3]:cuts[4]-1,], #'comb'=idler[cuts[3]:cuts[4]-1,], #'nw'=dna[cuts[3]:cuts[4]-1,] ];
L5= [#'vv'=vals[cuts[4]:cuts[5]-1,] , #'comb'=idler[cuts[4]:cuts[5]-1,], #'nw'=dna[cuts[4]:cuts[5]-1,] ];
L6= [#'vv'=vals[cuts[5]:cuts[6]-1,] , #'comb'=idler[cuts[5]:cuts[6]-1,], #'nw'=dna[cuts[5]:cuts[6]-1,] ];
L7= [#'vv'=vals[cuts[6]:cuts[7]-1,] , #'comb'=idler[cuts[6]:cuts[7]-1,], #'nw'=dna[cuts[6]:cuts[7]-1,] ];
L8= [#'vv'=vals[cuts[7]:cuts[8]-1,] , #'comb'=idler[cuts[7]:cuts[8]-1,], #'nw'=dna[cuts[7]:cuts[8]-1,] ];
L9= [#'vv'=vals[cuts[8]:cuts[9]-1,] , #'comb'=idler[cuts[8]:cuts[9]-1,], #'nw'=dna[cuts[8]:cuts[9]-1,] ];
L10=[#'vv'=vals[cuts[9]:cuts[10]-1,] , #'comb'=idler[cuts[9]:cuts[10]-1,], #'nw'=dna[cuts[9]:cuts[10]-1,] ];
L11=[#'vv'=vals[cuts[10]:cuts[11]-1,] , #'comb'=idler[cuts[10]:cuts[11]-1,], #'nw'=dna[cuts[10]:cuts[11]-1,] ];
L12=[#'vv'=vals[cuts[11]:cuts[12]-1,] , #'comb'=idler[cuts[11]:cuts[12]-1,], #'nw'=dna[cuts[11]:cuts[12]-1,] ];
L13=[#'vv'=vals[cuts[12]:cuts[13]-1,] , #'comb'=idler[cuts[12]:cuts[13]-1,], #'nw'=dna[cuts[12]:cuts[13]-1,] ];
L14=[#'vv'=vals[cuts[13]:cuts[14]-1,] , #'comb'=idler[cuts[13]:cuts[14]-1,], #'nw'=dna[cuts[13]:cuts[14]-1,] ];
L15=[#'vv'=vals[cuts[14]:cuts[15]-1,] , #'comb'=idler[cuts[14]:cuts[15]-1,], #'nw'=dna[cuts[14]:cuts[15]-1,] ];
L16=[#'vv'=vals[cuts[15]:cuts[16],] , #'comb'=idler[cuts[15]:cuts[16],], #'nw'=dna[cuts[15]:cuts[16], ] ];
Tasks = repeat('acFunnel', 1, 16);
Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];
Results = ParTasks(Tasks, Args, 2);
free M;
do i = 1 to ListLen(Results);
M = M // Results$i;
end;
idx9=loc( (missing(M) [, +])=0);
MM=M[idx9,];
varnames={'cif_nif' 'to_op' 'from_op' 'rank' };
call MatrixWriteToCAS(MM, '', '_crm_fun', varnames);
finish;
store module=(acFunnel acSimParTasks);
endsource;
iml / code=DefineMods;
run;
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:

Tips: Compare the performance of algorithms in SAS - The DO Loop

Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html

For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)

Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)

A few other thoughts, which might or might not be helpful:

- Can you use the DIF function inside the LOC function in this call?
**bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];** - It seems like you can compute
**ndx = loc(element(cifnif, cifnif[bch]));**

and reuse the index vector. - I can't tell, but it looks like you might be using character matrices. You have the statement
**match=j(1, 3, BlankStr(30));**and call the CHAR function to convert integers to strings (?). If so, don't do that. If you are using a grouping variable, can you recode the data so that the groups are represented by integer IDs? Processing numbers (sorting, LOC, copying data, ...) will much faster that processing 30-character strings. - At the end of the ACSimParTasks function, you remove any row that has a missing value and keep only the complete cases. Do this subset operation BEFORE you pass the data to ParTasks. You'll have less data to pass to each task, less data to process, and the result will be the same.

These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It would take me a long time to understand your code. A general technique to improve performance is to profile (=timing) the code to determine which parts of the code are taking the longest time. Articles on timing the performance:

Tips: Compare the performance of algorithms in SAS - The DO Loop

Example: https://blogs.sas.com/content/iml/2019/08/19/performance-sas-iml-functions-versus-base-sas.html

For this program, I suspect the main issue is the way that you are concatenating results in a loop. See the article Friends don't let friends concatenate results inside a loop - The DO Loop (sas.com)

Can you pre-compute the number of rows that will be in the MATCH and SIMILAR matrices? (Also, SIMILAR_WIN and _RES.)

A few other thoughts, which might or might not be helpful:

- Can you use the DIF function inside the LOC function in this call?
**bch=b[loc((b[2:nrow(b),]//nrow(cifnif)) - b > 1)];** - It seems like you can compute
**ndx = loc(element(cifnif, cifnif[bch]));**

and reuse the index vector. - I can't tell, but it looks like you might be using character matrices. You have the statement
**match=j(1, 3, BlankStr(30));**and call the CHAR function to convert integers to strings (?). If so, don't do that. If you are using a grouping variable, can you recode the data so that the groups are represented by integer IDs? Processing numbers (sorting, LOC, copying data, ...) will much faster that processing 30-character strings. - At the end of the ACSimParTasks function, you remove any row that has a missing value and keep only the complete cases. Do this subset operation BEFORE you pass the data to ParTasks. You'll have less data to pass to each task, less data to process, and the result will be the same.

These issues are general programming issues that are unrelated to parallel processing. Therefore, you can modify, debug, and time the functions in PROC IML. After you make them as fast as possible, you can push them to the action and use them for parallel processing.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thanks a lot.

- Yes, dif will work
- yes, I can reuse it
- The similar matrix has 0-1 in all but one column (difference in days between both operations) and I convert it to char because I don't know how to return 2 result matrices (one character and one numeric). Therefore I || it to the other matrix.
- Yes, I can advance this cleansing operation

I'll improve it at first place and move it to parallel when it's optimized.

Can you help me with the 3rd point?

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

When I exit the parallel task I return a result matrix.

This is made of customer id, new contract, finishing contract and a (naturally numeric) rank variable.

A new contract can be linked to more than one finishing contract if the customer ID is a promotor of our products.

But only one of these candidates is awarded and the car dealer gets a bonus for it. In order to determine the 'winner' a rank is formed, the more simiar both contracts are, the new and the 1:n finishing contracts, the better the rank.

A subset of new contracts only relates to one finishing contract, for those I assign 1 as rank score. These are stored in the matrix easy_res.

_res and similar_win have the same number of rows.

I don't know how to return _res and similar_win within the return syntax. And if I knew I would struggle to accumulate them as results.

```
_res_red= (easy_res || char(repeat(1, nrow(easy_res), 1)) ) // ( _res[loc(similar_win[, 9] < 5), ] || char(similar_win[loc(similar_win[, 9] < 5), 9 ]) ) ;
return _res_red;
finish;
```

```
Tasks = repeat('acFunnel', 1, 16);
Args = [L1, L2, L3, L4, L5, L6, L7, L8, L9, L10, L11, L12, L13, L14, L15, L16];
Results = ParTasks(Tasks, Args, 2);
free M;
do i = 1 to ListLen(Results);
M = M // Results$i;
end;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I still don't understand which columns are character and why.

*> I don't know how to return _res and similar_win within the return syntax.*

Clearly you are familiar with lists, since you use them as input to the ACFUNNEL function on each thread. If you need each task to return multiple arguments, pack them into a list. The PARTASKS function will then return a list of lists. You can extract and work with the components as needed.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Just for sharing my findings.

meanwhile I have discovered what derails the code.

I send to the parallel tasks the customer ids in such manner that the inner join of these 16 tasks would be missing.

Inside the task I create cross products by customer id, this reads as create all possible combinations between contracts, new and finishing, and check for the matching criteria that both contracts are at maximum 12 months difference.

I have some commercial custumers that have participated in more than 10000 contracts so that the cross product becomes big. Furthermore I order the customer IDs by name and these commercial customers tend to have a similar naming pattern resulting in all big customers getting assigned to the same task.

If I restrict to customers with less than x participating contracts (< 10 i.e.), then the iml action runs successfully and fast.

So my choice will be to treat them apart or to balance their distribution to the parallel tasks.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.