Rabelais
Obsidian | Level 7

I have a program that takes about half an hour to run, and I'm trying to optimize each part of the code to reduce the execution time. There is a loop that, among other things, computes the probit of a matrix many times (each time the matrix is different). The matrix is big but most of its values are duplicated, so I wonder if there is a hack to speed up the probit by computing it only on the unique values rather than on the many duplicated ones.

 

This is a small 10x7 example reproducing how the matrix is constructed and its structure:

proc iml;
call randseed(42);

min = 1E-10;
max = 1-min;
num_loans = {4 26 24 1 31 7 54};
num_loans_rep = repeat(num_loans, 10);
pd = { /*probability of default*/
0.026 0.008 0.008 0.038 0.037 0.005 0.003 ,
0.003 0.001 0.012 0.043 0.009 0.011 0.031 ,
0.039 0.010 0.019 0.036 0.002 0.040 0.015 ,
0.002 0.020 0.003 0.019 0.001 0.007 0.031 ,
0.040 0.001 0.049 0.045 0.013 0.044 0.002 ,
0.015 0.021 0.005 0.027 0.046 0.040 0.022 ,
0.031 0.017 0.041 0.016 0.005 0.024 0.026 ,
0.019 0.026 0.042 0.039 0.021 0.006 0.030 ,
0.005 0.011 0.012 0.002 0.017 0.002 0.035 ,
0.050 0.032 0.025 0.042 0.034 0.007 0.016 };

defaulted = randfun({10 7},'binomial', pd, num_loans_rep);
def_rates = defaulted/num_loans;
/* bound def_rates values between min and max */
def_rates = min <> def_rates >< max;
p = probit(def_rates);
print num_loans_rep, defaulted, def_rates, p;

call tabulate(value, freq, def_rates);
print (value`)[l='value'] (freq`)[l='freq'];
quit;

As you can see, about 70% of the matrix passed to the probit function (def_rates) equals min = 1E-10, and there are other duplicated values as well.

 

The following code contains a larger 10000x10000 example in which the execution time becomes significant. For simplicity, here the matrix def_rates is not constructed as before, but the idea is the same: many values are duplicated, so the probit function "wastes" a lot of time computing the same values. In the code I propose three alternative methods and compare them with the benchmark:

Method 2 initializes the output matrix by filling it everywhere with the probit of the most frequent value (p_min) and then overwrites the cells where def_rates differs from min with the correct values.

Method 3 builds the sparse representation of the pre-bounding input matrix (def_rates0, which is 70% zero), computes the probit of the stored nonzero values, converts back to the full matrix, and replaces the zeros with p_min.

Method 4 finds the unique values of the input matrix and loops over them, computing the probit of each value and assigning it to the corresponding locations in the output matrix.

%macro tic;
%global t_start;
%let t_start = %sysfunc(datetime());
%mend;

%macro toc;
%PUT WARNING- %sysevalf(%sysfunc(datetime()) - &t_start);
%mend;

proc iml;
call randseed(42);

N = 10000;
b = j(N,N,.);
t = 5;
/*70% of values in b are 0*/
call randgen(b,'binomial',0.068,t);
def_rates0 = b/(t+1);
min = 1E-10;
max = 1-min;
def_rates = min <> def_rates0 >< max;

/*method 1 - benchmark (6.5 sec)*/
%tic;
p1 = probit(def_rates);
%toc;

/*method 2 (3.4 sec)*/
%tic;
p_min = probit(min);
p2 = j(N,N,p_min);
idx = loc(def_rates ^= min);
p2[idx] = probit(def_rates[idx]);
%toc;

/*method 3 (5.3 sec)*/
%tic;
s = sparse(def_rates0);
p3 = probit(s[,1]) || s[,2:3];
p3 = full(p3,N,N);
p3[loc(def_rates0 = 0)] = p_min;
%toc;

/*method 4 (8 sec)*/
%tic;
call tabulate(value, freq, def_rates);
p4 = def_rates;
do i=1 to ncol(value);
	p4[loc(p4 = value[i])] = probit(min <> value[i]);
end;
%toc;

max_diff2 = max(abs(p2-p1));
max_diff3 = max(abs(p3-p1));
max_diff4 = max(abs(p4-p1));

print (value`)[l='value'] (freq`)[l='freq'];
if N < 20 then print def_rates, s, p1;
print max_diff2 max_diff3 max_diff4;
quit;

 

 

 

Method 2 is the best and takes almost half the time of the benchmark. However, since there are only 6 unique values in the input matrix, I dream of a much faster method to solve the problem. For example, probit(1E-10) is computed 70317776 times... I wonder if there is a way to give the probit function (and, more generally, any function) a "memory" of the already-computed values, so that it could simply recall them instead of computing them again and again. Actually, I don't know how vectorized operations work under the hood; I guess that multiple values are computed at the same time, so it may be that the concept of a "memory" doesn't make any sense.

 

My setup:

SAS EG 8.3 Update 3 (8.3.3.181) (64-bit)

SAS/IML 15.2

3 REPLIES
Rick_SAS
SAS Super FREQ

I looked closely at your program, which is very well written and documented. I tried a few alternatives, but was unable to beat "Method 2", which is both easy to understand and easy to implement. In SAS 9.4, I think that is as fast as I can get the computation.

Rabelais
Obsidian | Level 7

Many thanks for your analysis, Rick.

 

It would be a dream to have a magic function that would link the duplicate values in a matrix in such a way that, when calling a function, the evaluation on a single value would be "instantly" propagated to its duplicates, and those duplicates would then be skipped in the subsequent function evaluations. For example, by running

 

def_rates_linked = link_dup_values(def_rates);
p1 = probit(def_rates_linked);

probit would perform just 6 evaluations and 6 propagations (since def_rates contains 6 unique values), ideally requiring a fraction of a second.
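In the meantime, something close to that behavior can be sketched with a user-defined module (ProbitMemo is just an illustrative name, and I have not verified the timing on the 10000x10000 case): PROBIT is evaluated once per distinct value, and the BIN function maps every cell back to the index of its distinct value, so the results can be gathered without re-evaluating duplicates.

proc iml;
/* rough sketch: evaluate PROBIT once per distinct value and gather the
   results back by index (timing on a 10000x10000 matrix not verified) */
start ProbitMemo(x);
   u   = unique(x);                      /* sorted row vector of distinct values     */
   pu  = probit(u);                      /* one PROBIT evaluation per distinct value */
   cut = u || (u[ncol(u)] + 1);          /* cutpoints so that x = u[i] falls in bin i */
   v   = shape(x, 1);                    /* flatten to a row vector (row-major)      */
   b   = bin(v, cut);                    /* index of the distinct value for each cell */
   return( shape(pu[b], nrow(x), ncol(x)) );
finish;

/* small self-contained check; in the program above one would pass def_rates */
x = {1e-10 0.2  1e-10,
     0.2   0.4  1e-10};
print (ProbitMemo(x))[l='memo'] (probit(x))[l='exact'];
quit;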

IanWakeling
Barite | Level 11

Maybe you could find an approximation to the probit function that is faster. There is this stackexchange thread that might be useful; in particular, the 3rd answer has an approximation and a link to a paper that discusses more options. Or perhaps a hybrid approach: if a lot of your probability values are small, it might be possible to find a very good and fast approximation for values less than some threshold (1E-4, say), and then use the probit function for everything else.
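For reference, one classical rational approximation for the lower-tail normal quantile is Abramowitz and Stegun 26.2.23 (absolute error below about 4.5E-4). A rough, untimed sketch of the hybrid idea, with that formula standing in for whatever the linked thread recommends and 1E-4 as the threshold:

proc iml;
/* rational approximation to the lower-tail normal quantile for 0 < p <= 0.5
   (Abramowitz & Stegun 26.2.23, absolute error < 4.5E-4) */
start ProbitApproxLowerTail(p);
   t   = sqrt(-2*log(p));
   num = 2.515517 + 0.802853*t + 0.010328*t##2;
   den = 1 + 1.432788*t + 0.189269*t##2 + 0.001308*t##3;
   return( -(t - num/den) );             /* lower tail => negative quantile */
finish;

/* hybrid: approximate below a threshold, exact PROBIT elsewhere */
p = {1e-10 0.2 1e-3, 0.45 1e-6 0.03};    /* stand-in for def_rates */
threshold = 1e-4;
z = j(nrow(p), ncol(p), .);
small = loc(p <  threshold);
other = loc(p >= threshold);
if ncol(small) > 0 then z[small] = ProbitApproxLowerTail(p[small]);
if ncol(other) > 0 then z[other] = probit(p[other]);
print z[l='hybrid'] (probit(p))[l='exact'];
quit;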
