I have a program that runs in about half an hour, and I'm trying to optimize each part of the code to reduce the execution time. There is a loop that, among other things, computes the probit of a matrix many times (each time the matrix is different). The matrix is big but most of its values are duplicated, so I wonder if there is a hack to speed up the probit by computing it only on the unique values rather than on the many duplicated ones.
This is a small 10x7 example that reproduces how the matrix is constructed and its structure:
proc iml;
call randseed(42);
min = 1E-10;
max = 1-min;
num_loans = {4 26 24 1 31 7 54};
num_loans_rep = repeat(num_loans, 10);
pd = { /*probability of default*/
0.026 0.008 0.008 0.038 0.037 0.005 0.003 ,
0.003 0.001 0.012 0.043 0.009 0.011 0.031 ,
0.039 0.010 0.019 0.036 0.002 0.040 0.015 ,
0.002 0.020 0.003 0.019 0.001 0.007 0.031 ,
0.040 0.001 0.049 0.045 0.013 0.044 0.002 ,
0.015 0.021 0.005 0.027 0.046 0.040 0.022 ,
0.031 0.017 0.041 0.016 0.005 0.024 0.026 ,
0.019 0.026 0.042 0.039 0.021 0.006 0.030 ,
0.005 0.011 0.012 0.002 0.017 0.002 0.035 ,
0.050 0.032 0.025 0.042 0.034 0.007 0.016 };
defaulted = randfun({10 7},'binomial', pd, num_loans_rep);
def_rates = defaulted/num_loans;
def_rates = min <> def_rates >< max; /*bound def_rates values between min and max*/
p = probit(def_rates);
print num_loans_rep, defaulted, def_rates, p;
call tabulate(value, freq, def_rates);
print (value`)[l='value'] (freq`)[l='freq'];
quit;
As you can see, about 70% of the matrix passed to the probit function ( def_rates ) equals min = 1E-10, and there are other duplicated values as well.
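For reference, the fraction of cells equal to min can be checked directly with something like this (to be run before the QUIT statement above):
pct_min = ncol(loc(def_rates = min)) / (nrow(def_rates)*ncol(def_rates)); /* share of cells equal to min */
print pct_min;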
The following code shows a larger 10000x10000 example in which the execution time becomes relevant. For simplicity, here the matrix def_rates is not constructed as before, but the idea is the same: a lot of values are duplicated, so the probit function "wastes" a lot of time recomputing the same values. In the code I propose three alternative methods to compare against the benchmark. Method 2 initializes the output matrix by copying everywhere the probit of the most frequent value ( p_min ) and then replaces it with the correct values where def_rates differs from min. Method 3 computes the sparse representation of the pre-bounding input matrix ( def_rates0 , which is 70% zero), applies the probit to its stored nonzero values, converts back to the full matrix, and replaces the zeros with p_min. Method 4 finds the unique values of the input matrix and loops over them, computing the probit of each value and assigning it to the corresponding locations in the output matrix.
%macro tic;
%global t_start;
%let t_start = %sysfunc(datetime());
%mend;
%macro toc;
%PUT WARNING- %sysevalf(%sysfunc(datetime()) - &t_start);
%mend;
proc iml;
call randseed(42);
N = 10000;
b = j(N,N,.);
t = 5;
/*70% of values in b are 0*/
call randgen(b,'binomial',0.068,t);
def_rates0 = b/(t+1);
min = 1E-10;
max = 1-min;
def_rates = min <> def_rates0 >< max;
/*method 1 - benchmark (6.5 sec)*/
%tic;
p1 = probit(def_rates);
%toc;
/*method 2 (3.4 sec)*/
%tic;
p_min = probit(min);
p2 = j(N,N,p_min);
idx = loc(def_rates ^= min);
p2[idx] = probit(def_rates[idx]);
%toc;
/*method 3 (5.3 sec)*/
%tic;
s = sparse(def_rates0);
p3 = probit(s[,1]) || s[,2:3];
p3 = full(p3,N,N);
p3[loc(def_rates0 = 0)] = p_min;
%toc;
/*method 4 (8 sec)*/
%tic;
call tabulate(value, freq, def_rates);
p4 = def_rates;
do i=1 to ncol(value);
p4[loc(p4 = value[i])] = probit(min <> value[i]);
end;
%toc;
max_diff2 = max(abs(p2-p1));
max_diff3 = max(abs(p3-p1));
max_diff4 = max(abs(p4-p1));
print (value`)[l='value'] (freq`)[l='freq'];
if N < 20 then print def_rates, s, p1;
print max_diff2 max_diff3 max_diff4;
quit;
Method 2 is the best and takes almost half the time of the benchmark; however, since there are only 6 unique values in the input matrix, I dream of a much faster method to solve the problem. For example, probit(1E-10) is computed 70317776 times... I wonder if there is a way to let the probit function (and, more generally, any function) have a "memory" of the already computed values, so that it could simply recall them instead of computing them again and again. Actually, I don't know how vectorized operations work under the hood; I guess that multiple values are computed at the same time, so it may be that the concept of a "memory" doesn't make sense.
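To make the idea concrete, here is a minimal sketch of the kind of lookup table I have in mind, on a smaller N just to illustrate; the BIN-based mapping and the extra cutpoint are only one possible way I imagine the scatter step could be done:
proc iml;
call randseed(42);
N = 1000;                                  /* smaller N, just for the sketch */
b = j(N,N,.);
call randgen(b,'binomial',0.068,5);
min = 1E-10;
def_rates = min <> b/6 >< (1-min);
u   = unique(def_rates);                   /* sorted row vector of unique values         */
pu  = probit(u);                           /* probit evaluated once per unique value     */
cut = u || (u[ncol(u)] + 1);               /* extra cutpoint so each u[i] falls in bin i */
idx = bin(shape(def_rates, N*N, 1), cut);  /* bin number = position of each cell in u    */
p   = shape(pu[idx], N, N);                /* look up the precomputed probits            */
quit;
Here probit would be called on only 6 values and everything else would be indexing, but I don't know whether this kind of approach is actually faster than method 2 or whether there is a built-in way to do it.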
My setup:
SAS EG 8.3 Update 3 (8.3.3.181) (64-bit)
SAS/IML 15.2