Solved: Re: SAS/IML for a Maximum Likelihood Estimation

lxn1021 · Posted 08-09-2015 09:31 PM

Hi, I have a optimization problem regarding the Maximum Likelihood Estimation in IML:

I'm trying to acquire the constant values of parameters F, Mu, R, Q, and the starting values beta_0_0 and P_0_0 by using the MLE method. My codes are as follows:

%macro initial;

Proc iml;

start LL(param);

f11=param[1];

f12=param[2];

f13=param[3];

f21=param[4];

f22=param[5];

f23=param[6];

f31=param[7];

f32=param[8];

f33=param[9];

f1=f11||f12||f13;

f2=f21||f22||f23;

f3=f31||f32||f33;

f=f1//f2//f3;

q11=param[10];

q12=param[11];

q13=param[12];

q21=param[13];

q22=param[14];

q23=param[15];

q31=param[16];

q32=param[17];

q33=param[18];

q1=q11||q12||q13;

q2=q21||q22||q23;

q3=q31||q32||q33;

r=param[19];

mu1=param[20];

mu2=param[21];

mu3=param[22];

mu=mu1//mu2//mu3;

beta1=param[23];

beta2=param[24];

beta3=param[25];

beta_0_0=beta1//beta2//beta3;

p11=param[26];

p12=param[27];

p13=param[28];

p21=param[29];

p22=param[30];

p23=param[31];

p31=param[32];

p32=param[33];

p33=param[34];

p1=p11||p12||p13;

p2=p21||p22||p23;

p3=p31||p32||p33;

p_0_0=p1//p2//p3;

LL=0;

%do i=1 %to 400;

use origin;

read all var{y} into y_&i where(time=&i);

read all var{x1,x2,x3} into x_&i where(time=&i)

beta_&i._%eval(&i-1)=mu+f*beta_%eval(&i-1)_%eval(&i-1);

p_&i._%eval(&i-1)=f*p_%eval(&i-1)_%eval(&i-1)*f`+q;

aita_&i._%eval(&i-1)=y_&i-x_&i*beta_&i._%eval(&i-1);

f_&i._%eval(&i-1)=x_&i*p_&i._%eval(&i-1)*(x_&i)`+r;

k_&i=p_&i._%eval(&i-1)*(x_&i)`*inv(f_&i._%eval(&i-1));

beta_&i._&i=beta_&i._%eval(&i-1)+k_&i*aita_&i._%eval(&i-1)

p_&i._&i=p_&i._%eval(&i-1)-k_&i*x_&i*p_&i._%eval(&i-1)

if f_&i._%eval(&i-1)<=0 then

LL=.;

else

LL=LL-(1/2)*log(2*3.14*f_&i._%eval(&i-1))-(1/2)*(aita_&i._%eval(&i-1))`*inv(f_&i._%eval(&i-1))*aita_&i._%eval(&i-1)

%end;

return(LL);

finish;

initialj(1,34,0.1);

opt={1,2};

call nlpnra(rc,result,"LL",initial,opt);

quit;

%mend;

%initial

The program runs. However, very very slow. Looks like it takes tremendous time adding up the function LL. I'm wandering how to fix the codes to make it run much faster? Thanks so much!

Rick_SAS · Posted 08-10-2015 06:59 AM

The first part of your code can also be simplified by using matrices. Here's how to put the parameters into local variables:

f = shape( param[1:9], 3, 3 );

q = shape( param[10:18], 3, 3);

r = param[19];

mu = param[20:22];

beta00 = param[23:25];

p00 = shape( param[26:34], 3, 3);

To build on Ian's suggestions, you can eliminate the macro, which appears to be computing quantities based on y_i and x_i, and updating the p and beta matrices. You can use the LOC function to build these matrices and vectors on the fly during an ordinary DO loop in SAS/IML.Inside the loop you only need to keep the previous and version of each matrix. For example,

instead of defining p_&i_&i and p_&i._%eval(&i-1) for 400 values of i, you can use p for the current iteration and p00 for the previous. At the end of the loop, update the matrices that you'l need for the next loop.

The following (UNTESTED) code should give you an idea of what the loop will look like when you've made the conversion:

use origin;
read all var {time y};
read all var {x1 x2 x3} into x;
close;

do i = 1 to 400;
   idx = loc(time=i);
   yt = y[idx,];
   xt = x[idx,];
   beta = mu+f*beta00;
   p = f*p00*f`+q;
   aita =yt-xt*beta;
   f = xt*p*(xt)`+r;
   k=p*(xt)`*inv(f);
   /* update values of p00 and beta00 for next iteration */
   beta00=beta+k*aita;
   p00=p-k*xt*p;
end;

View solution in original post

IanWakeling · Posted 08-10-2015 06:07 AM

The most inefficient part of your code is the fact that you are accessing the stored SAS data set "origin" 400 times for every call to the log likelihood function, consequently there is going to be an IO bottle-neck. It should be possible to read the data variables, y, x1, x2, x3, & time just the once, and outside of the LL function, and then make them available within using syntax like:

start LL(param) global(y,x1,x2,x3,time);

Then try to eliminate the macro loop completely, I have not tried to understand what the code is attempting to do, but I think it should be possible to rewrite using pure IML code which references the 5 global matrices.

Rick_SAS · Posted 08-10-2015 06:59 AM

The first part of your code can also be simplified by using matrices. Here's how to put the parameters into local variables:

f = shape( param[1:9], 3, 3 );

q = shape( param[10:18], 3, 3);

r = param[19];

mu = param[20:22];

beta00 = param[23:25];

p00 = shape( param[26:34], 3, 3);

To build on Ian's suggestions, you can eliminate the macro, which appears to be computing quantities based on y_i and x_i, and updating the p and beta matrices. You can use the LOC function to build these matrices and vectors on the fly during an ordinary DO loop in SAS/IML.Inside the loop you only need to keep the previous and version of each matrix. For example,

instead of defining p_&i_&i and p_&i._%eval(&i-1) for 400 values of i, you can use p for the current iteration and p00 for the previous. At the end of the loop, update the matrices that you'l need for the next loop.

The following (UNTESTED) code should give you an idea of what the loop will look like when you've made the conversion:

use origin;
read all var {time y};
read all var {x1 x2 x3} into x;
close;

do i = 1 to 400;
   idx = loc(time=i);
   yt = y[idx,];
   xt = x[idx,];
   beta = mu+f*beta00;
   p = f*p00*f`+q;
   aita =yt-xt*beta;
   f = xt*p*(xt)`+r;
   k=p*(xt)`*inv(f);
   /* update values of p00 and beta00 for next iteration */
   beta00=beta+k*aita;
   p00=p-k*xt*p;
end;

Ksharp · Posted 08-10-2015 08:35 AM

OR using Genetic Algorithm .

I still remember in Rick'blog there is also a paper about how to Maximum Likelihood Function to get the best parameter of Normal Distribution, You can refer to it either .

Xia Keshan

Rick_SAS · Posted 08-10-2015 08:40 AM

The reference is

http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml.html

Rick_SAS · Posted 08-10-2015 08:51 AM

In the computation for LL, you have an expression that is equivalent to

LL=LL-(1/2)*log(2*3.14*FF)-(1/2)*aita`*inv(FF)*aita;

where FF(=f_&i._%eval(&i-1)) is some expression that is computed in the loop.

Is FF supposed to be a matrix or a scalar?

If FF is a matrix, then log(2*3.14*FF) is also a matrix and so LL is a matrix, which seems strange for a log-likelihood.

lxn1021 · Posted 08-11-2015 10:52 AM

Hi Rick,

Sorry about the confusion. Here FF(=f_&i._%eval(&i-1)) is a scalar. Since f_&i._%eval(&i-1)=x_&i*p_&i._%eval(&i-1)*(x_&i)`+r, x={x1 x2 x3}. Sorry I forgot to declare x~~

lxn1021 · Posted 08-11-2015 10:48 AM

Hi, the codes have been fixed!! Thanks so much!

The modified codes are as follows:

proc iml;

start LL(param);

f=shape(param[1:9],3,3);

q=shape(param[10:18],3,3);

r=param[19];

mu=shape(param[20:22],3,1);

beta_current=shape(param[23:25],3,1);

p_current=shape(param[26:34],3,3);

use origin;

read all var{time} into time;

read all var{y} into y;

read all var{x1,x2,x3} into x;

LL=0;

do i=1 to 405;

idx=loc(time=i);

y_t=y[idx,];

x_t=x[idx,];

beta_pred=mu+f*beta_current;

p_pred=f*p_current*f`+q;

aita_pred=y_t-x_t*beta_pred;

f_pred=x_t*p_pred*(x_t)`+r;

k=p_pred*(x_t)`*inv(f_pred);

beta_current=beta_pred+k*aita_pred;

p_current=p_pred-k*x_t*p_pred;

if f_pred<=0 then

LL=.;

else

LL=LL-(1/2)*log(2*3.14*f_pred)-(1/2)*(aita_pred)`*inv(f_pred)*aita_pred;

end;

return(LL);

finish;

con_f_1=j(1,9,-0.95);

con_q_1=j(1,9,0);

con_r_1=j(1,1,0);

con_mu_1=j(1,3,.);

con_beta_1=j(1,3,.);

con_p_1=j(1,9,0);

con_f_2=j(1,9,0.95);

con_q_2=j(1,9,.);

con_r_2=j(1,1,.);

con_mu_2=j(1,3,.);

con_beta_2=j(1,3,.);

con_p_2=j(1,9,.);

con_1=con_f_1||con_q_1||con_r_1||con_mu_1||con_beta_1||con_p_1;

con_2=con_f_2||con_q_2||con_r_2||con_mu_2||con_beta_2||con_p_2;

con=con_1//con_2;

initial=j(1,34,0.1);

opt={1,2};

call nlpnra(rc,result,"LL",initial,opt,con);

quit;

Here I set the constraints to some parameters. The whole program only runs 4 minutes! Much much faster~~

In the future I'll try to avoid the mixture of MACRO and IML~~~And use the LOC function as the index, rather than the macro variable. Thanks so much~~

Rick_SAS · Posted 08-11-2015 11:28 AM

Great news! Glad to hear.

If your input data set is sorted by time, you can probably skip the LOC function completely. For example, if every 3 observations represent the next time period, you could say:

idx=1:3; /* initialize rows for time period 1 */

do i=1 to 405;

...

idx = idx + 3; /* prepare for time period i+1 */

end;

lxn1021 · Posted 08-11-2015 12:53 PM

Hi Rick,

Got it! Thank you~~

And, I came across your article regarding SAS/IML and MACRO http://blogs.sas.com/content/iml/2013/06/19/macros-and-loops.html Before this moment, I've always been a big fan of MACRO, since I thought it's capable of doing almost everything. But from this example, I guess maybe sometimes 'abusing' MACRO would seriously and negatively affect the efficiency? So maybe I should try to avoid it, especially when the problem is able to be solved by IML? Thanks a lot~~~

Rick_SAS · Posted 08-11-2015 03:15 PM

That's my philosophy: mixing macro and IML is confusing and prone to error. And since SAS/IML already has variables, loops, conditionals, etc, it's usually unnecessary for my work.

lxn1021 · Posted 08-11-2015 03:22 PM

OK. Thank you~~

Registration is open