BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lxn1021
Obsidian | Level 7

Hi, I have a optimization problem regarding the Maximum Likelihood Estimation in IML:

I'm trying to acquire the constant values of parameters F, Mu, R, Q, and the starting values beta_0_0 and P_0_0 by using the MLE method. My codes are as follows:

%macro initial;

Proc iml;

    start LL(param);

          f11=param[1];

          f12=param[2];

          f13=param[3];

          f21=param[4];

          f22=param[5];

          f23=param[6];

          f31=param[7];

          f32=param[8];

          f33=param[9];

          f1=f11||f12||f13;

          f2=f21||f22||f23;

          f3=f31||f32||f33;

          f=f1//f2//f3;

          q11=param[10];

          q12=param[11];

          q13=param[12];

          q21=param[13];

          q22=param[14];

          q23=param[15];

          q31=param[16];

          q32=param[17];

          q33=param[18];

          q1=q11||q12||q13;

          q2=q21||q22||q23;

          q3=q31||q32||q33;

          r=param[19];

          mu1=param[20];

          mu2=param[21];

          mu3=param[22];

          mu=mu1//mu2//mu3;

          beta1=param[23];

          beta2=param[24];

          beta3=param[25];

          beta_0_0=beta1//beta2//beta3;

          p11=param[26];

          p12=param[27];

          p13=param[28];

          p21=param[29];

          p22=param[30];

          p23=param[31];

          p31=param[32];

          p32=param[33];

          p33=param[34];

          p1=p11||p12||p13;

          p2=p21||p22||p23;

          p3=p31||p32||p33;

          p_0_0=p1//p2//p3;   

          LL=0;

          %do i=1 %to 400;

              use origin;

              read all var{y} into y_&i where(time=&i);

              read all var{x1,x2,x3} into x_&i where(time=&i)

              beta_&i._%eval(&i-1)=mu+f*beta_%eval(&i-1)_%eval(&i-1);

              p_&i._%eval(&i-1)=f*p_%eval(&i-1)_%eval(&i-1)*f`+q;

              aita_&i._%eval(&i-1)=y_&i-x_&i*beta_&i._%eval(&i-1);

              f_&i._%eval(&i-1)=x_&i*p_&i._%eval(&i-1)*(x_&i)`+r;

              k_&i=p_&i._%eval(&i-1)*(x_&i)`*inv(f_&i._%eval(&i-1));

              beta_&i._&i=beta_&i._%eval(&i-1)+k_&i*aita_&i._%eval(&i-1)

              p_&i._&i=p_&i._%eval(&i-1)-k_&i*x_&i*p_&i._%eval(&i-1)

              if f_&i._%eval(&i-1)<=0 then

                    LL=.;   

              else

                    LL=LL-(1/2)*log(2*3.14*f_&i._%eval(&i-1))-(1/2)*(aita_&i._%eval(&i-1))`*inv(f_&i._%eval(&i-1))*aita_&i._%eval(&i-1)

          %end;   

          return(LL);

    finish;

    initialj(1,34,0.1);

    opt={1,2};

    call nlpnra(rc,result,"LL",initial,opt);

quit;

%mend;

%initial


The program runs. However, very very slow.  Looks like it takes tremendous time adding up the function LL.     I'm wandering how to fix the codes to make it run much faster? Thanks so much!


        


         

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

The first part of your code can also be simplified by using matrices. Here's how to put the parameters into local variables:

f = shape( param[1:9], 3, 3 );

q = shape( param[10:18], 3, 3);

r = param[19];

mu = param[20:22];

beta00 = param[23:25];

p00 = shape( param[26:34], 3, 3);

To build on Ian's suggestions,  you can eliminate the macro, which appears to be computing quantities based on y_i and x_i, and updating the p and beta matrices.  You can use the LOC function to build these matrices and vectors on the fly during an ordinary DO loop in SAS/IML.Inside the loop you only need to keep the previous and version of each matrix. For example,

instead of defining  p_&i_&i and p_&i._%eval(&i-1) for 400 values of i, you can use p for the current iteration and p00 for the previous. At the end of the loop, update the matrices that you'l need for the next loop.

The following (UNTESTED) code should give you an idea of what the loop will look like when you've made the conversion:

use origin;
read all var {time y};
read all var {x1 x2 x3} into x;
close;


do i = 1 to 400;
   idx = loc(time=i);
   yt = y[idx,];
   xt = x[idx,];
   beta = mu+f*beta00;
   p = f*p00*f`+q;
   aita =yt-xt*beta;
   f = xt*p*(xt)`+r;
   k=p*(xt)`*inv(f);
   /* update values of p00 and beta00 for next iteration */
   beta00=beta+k*aita;
   p00=p-k*xt*p;
end;


View solution in original post

11 REPLIES 11
IanWakeling
Barite | Level 11

The most inefficient part of your code is the fact that you are accessing the stored SAS data set "origin" 400 times for every call to the log likelihood function, consequently there is going to be an IO bottle-neck.  It should be possible to read the data variables, y, x1, x2, x3, & time just the once, and outside of the LL function, and then make them available within using syntax like:

start LL(param) global(y,x1,x2,x3,time);

Then try to eliminate the macro loop completely, I have not tried to understand what the code is attempting to do, but I think it should be possible to rewrite using pure IML code which references the 5 global matrices.

Rick_SAS
SAS Super FREQ

The first part of your code can also be simplified by using matrices. Here's how to put the parameters into local variables:

f = shape( param[1:9], 3, 3 );

q = shape( param[10:18], 3, 3);

r = param[19];

mu = param[20:22];

beta00 = param[23:25];

p00 = shape( param[26:34], 3, 3);

To build on Ian's suggestions,  you can eliminate the macro, which appears to be computing quantities based on y_i and x_i, and updating the p and beta matrices.  You can use the LOC function to build these matrices and vectors on the fly during an ordinary DO loop in SAS/IML.Inside the loop you only need to keep the previous and version of each matrix. For example,

instead of defining  p_&i_&i and p_&i._%eval(&i-1) for 400 values of i, you can use p for the current iteration and p00 for the previous. At the end of the loop, update the matrices that you'l need for the next loop.

The following (UNTESTED) code should give you an idea of what the loop will look like when you've made the conversion:

use origin;
read all var {time y};
read all var {x1 x2 x3} into x;
close;


do i = 1 to 400;
   idx = loc(time=i);
   yt = y[idx,];
   xt = x[idx,];
   beta = mu+f*beta00;
   p = f*p00*f`+q;
   aita =yt-xt*beta;
   f = xt*p*(xt)`+r;
   k=p*(xt)`*inv(f);
   /* update values of p00 and beta00 for next iteration */
   beta00=beta+k*aita;
   p00=p-k*xt*p;
end;


Ksharp
Super User

OR using Genetic Algorithm .

I still remember in Rick'blog there is also a paper about how to Maximum Likelihood Function to get the best parameter of Normal Distribution, You can refer to it either .

Xia Keshan

Rick_SAS
SAS Super FREQ

In the computation for LL, you have an expression that is equivalent to

LL=LL-(1/2)*log(2*3.14*FF)-(1/2)*aita`*inv(FF)*aita;

where FF(=f_&i._%eval(&i-1)) is some expression that is computed in the loop.

Is FF supposed to be a matrix or a scalar?

If FF is a matrix, then log(2*3.14*FF) is also a matrix and so LL is a matrix, which seems strange for a log-likelihood.

lxn1021
Obsidian | Level 7

Hi Rick,

Sorry about the confusion. Here FF(=f_&i._%eval(&i-1)) is a scalar. Since f_&i._%eval(&i-1)=x_&i*p_&i._%eval(&i-1)*(x_&i)`+r, x={x1 x2 x3}. Sorry I forgot to declare x~~

lxn1021
Obsidian | Level 7

Hi, the codes have been fixed!! Thanks so much!

The modified codes are as follows:

proc iml;

  start LL(param);

  f=shape(param[1:9],3,3);

  q=shape(param[10:18],3,3);

  r=param[19];

  mu=shape(param[20:22],3,1);

  beta_current=shape(param[23:25],3,1);

  p_current=shape(param[26:34],3,3);

  use origin;

  read all var{time} into time;

  read all var{y} into y;

  read all var{x1,x2,x3} into x;

  LL=0;

  do i=1 to 405;

  idx=loc(time=i);

  y_t=y[idx,];

  x_t=x[idx,];

  beta_pred=mu+f*beta_current;

  p_pred=f*p_current*f`+q;

  aita_pred=y_t-x_t*beta_pred;

  f_pred=x_t*p_pred*(x_t)`+r;

  k=p_pred*(x_t)`*inv(f_pred);

  beta_current=beta_pred+k*aita_pred;

  p_current=p_pred-k*x_t*p_pred;

  if f_pred<=0 then

  LL=.;

  else

  LL=LL-(1/2)*log(2*3.14*f_pred)-(1/2)*(aita_pred)`*inv(f_pred)*aita_pred;

  end;

  return(LL);

  finish;

  con_f_1=j(1,9,-0.95);

  con_q_1=j(1,9,0);

  con_r_1=j(1,1,0);

  con_mu_1=j(1,3,.);

  con_beta_1=j(1,3,.);

  con_p_1=j(1,9,0);

  con_f_2=j(1,9,0.95);

  con_q_2=j(1,9,.);

  con_r_2=j(1,1,.);

  con_mu_2=j(1,3,.);

  con_beta_2=j(1,3,.);

  con_p_2=j(1,9,.);

  con_1=con_f_1||con_q_1||con_r_1||con_mu_1||con_beta_1||con_p_1;

  con_2=con_f_2||con_q_2||con_r_2||con_mu_2||con_beta_2||con_p_2;

  con=con_1//con_2;

  initial=j(1,34,0.1);

  opt={1,2};

  call nlpnra(rc,result,"LL",initial,opt,con);

quit;

Here I set the constraints to some parameters.  The whole program only runs 4 minutes! Much much faster~~

In the future I'll try to avoid the mixture of MACRO and IML~~~And use the LOC function as the index, rather than the macro variable. Thanks so much~~

Rick_SAS
SAS Super FREQ

Great news! Glad to hear.

If your input data set is sorted by time, you can probably skip the LOC function completely. For example, if every 3 observations represent the next time period, you could say:

  idx=1:3;    /* initialize rows for time period 1 */

  do i=1 to 405;

...

idx = idx + 3;  /* prepare for time period i+1 */

  end;

lxn1021
Obsidian | Level 7

Hi Rick,

Got it! Thank you~~

And, I came across your article regarding SAS/IML and MACRO http://blogs.sas.com/content/iml/2013/06/19/macros-and-loops.html   Before this moment, I've always been a big fan of MACRO, since I thought it's capable of doing almost everything. But from this example, I guess maybe sometimes 'abusing' MACRO would seriously and negatively affect the efficiency?  So maybe I should try to avoid it, especially when the problem is able to be solved by IML? Thanks a lot~~~

Rick_SAS
SAS Super FREQ

That's my philosophy: mixing macro and IML is confusing and prone to error.  And since SAS/IML already has variables, loops, conditionals, etc, it's usually unnecessary for my work.

lxn1021
Obsidian | Level 7

OK. Thank you~~

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 11 replies
  • 2166 views
  • 6 likes
  • 4 in conversation