Daniel_Paul
Obsidian | Level 7

Hello,

 

I am trying to optimize the ML objective function of the multivariate normal density. For this, I basically follow the example of Rick Wicklin (http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml.html). However, instead of using the univariate normal density as the objective function, I use the multivariate normal density (http://blogs.sas.com/content/iml/2012/07/05/compute-multivariate-normal-denstity.html). To cope with singular covariance matrices, I added a suggestion made by Gu & Yung. In addition, I introduce the gradient of the MVN and place restrictions on the parameter estimates so that the covariance matrix is symmetric.
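For reference, the objective I maximize is the MVN log-likelihood for n observations of p variables (this is exactly what the LogLikMVN module below computes):

$$
\ell(\mu,\Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\lvert\Sigma\rvert - \frac{1}{2}\sum_{i=1}^{n}(x_i-\mu)^{\top}\Sigma^{-1}(x_i-\mu)
$$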

 

When I run the code, the optimizer reports success with different starting vectors. However, I do not obtain the sample estimates for the input data set.

 

Below is an example for two variables with mean mu={1 1.3} and cov={1 -1, -1 1.33}. With the starting vector p={1 2 1 -1 -1 2} I get the estimates mue={1.15 4.56} and cove={1.19 -1.23, -1.23 0.31}. Can I trust these results?

 

Bye, Paul

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
 n = nrow(x);      
 m = nrow(center);
 d2 = j(n,m);    
 U = root(cov, "NoError");
 if all(u=.) then return(1E10); /* my adaptation: if cov is singular, this forces NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
 Ur = round(trace(ginv(U)*U)); 
 if Ur < nrow(U) then return(1E10); /* my adaptation: if U is singular, this forces NLP to try another estimate of cov */
 do k = 1 to m;   
  v = x - center[k,];
  z = trisolv(2, U, v`);
  w = trisolv(1, U, z);
  d2[,k] = (v # w`)[,+];
 end;
 return (sqrt(d2));
finish;
start LogLikMVN(param) global(x,nmu);
 mu = t(param[1:nmu]);
 cov1 = shape(param[(nmu+1):ncol(param)],nmu,nmu);
 cov = (cov1 + t(cov1))/2;
 rowx = nrow(x);
 eins = j(1,rowx,1);
 d = mahalanobis(x, mu, cov);
 if d=1E10 then return(1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
 dd = d#d;
 dsd = eins * dd;
 f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * log(det(cov))) - (1/2 * dsd);
 return( f );
finish;

/*** Write a module that computes the gradient of the log-likelihood function ***/
start GradLogLik(param) global (x,nmu);
 mu = t(param[1:nmu]);
 cov1 = shape(param[(nmu+1):ncol(param)],nmu,nmu);
 cov = (cov1 + t(cov1))/2;
 rowx = nrow(x);
 colx = ncol(x);
 eins = j(1,rowx,1);
 xdif = x - mu;
 xdifs = t(xdif[+,]);
 invcov=inv(cov);
 dfdmu = t(invcov*xdifs);
 B = j(colx,colx,0);
 do i=1 to rowx;
  xred = xdif[i,];
  A = t(xred)*xred;
  B = B + A;
end;
 dfdcov1 = -0.5 * (invcov - invcov*B*invcov);
 dfdcov = shape(dfdcov1,1);
 return ( dfdmu || dfdcov );
finish;

/*** Call an optimization routine ***/
nmu = {2};
x = {0 4,
     2 2,
     1 4};
con = { . . 0 . . 0 . .,
        . . . . . . . .,
        . . . 1 -1 . 0 .};
p = {1 2 1 -1 -1 2};
opt = {1,5};
call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik";
quit;

       

Rick_SAS
SAS Super FREQ

1. You don't need to write your own MAHALANOBIS function. As it says in the blog post, the MAHALANOBIS function has been distributed with SAS/IML for many releases.

2. The covariance matrix is symmetric, so there are only five independent parameters, not six. You can form a symmetric matrix from a vector by using the SQRVECH function (see the sketch after this list).

3. If you are trying to MINIMIZE a function, use opt[1]=0. To MAXIMIZE a function use opt[1]=1.

4. You can use the built-in LOGABSDET function to compute the log of the determinant of a matrix.
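For example, here is a small sketch of SQRVECH and LOGABSDET (the numbers are arbitrary):

proc iml;
v = {8, 1.1, 11.1};      /* the 3 free parameters of a 2x2 covariance: lower triangle, stacked columnwise */
cov = sqrvech(v);        /* rebuild the full symmetric matrix {8 1.1, 1.1 11.1} */
ld = logabsdet(cov);     /* ld[1] = log|det(cov)|, ld[2] = sign of the determinant */
print cov, ld;
quit;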

 

Daniel_Paul
Obsidian | Level 7

Hey Rick,

 

thank you for the fast reply. I am embarrassed about points two and three, sorry for that. With respect to the Mahalanobis distance: I had noticed that there is a built-in function for calculating this distance in IML. However, because I want to avoid obtaining a singular covariance matrix during the optimization (which would produce error messages in the log), I adapted the Mahalanobis module to reject such solutions (see my adaptations in the module).

 

Below is the code with your suggestions included. However, running this code produces an error message in the log, "NEWRAP Optimization cannot be completed.", and a warning, "Optimization routine cannot improve the function value.". The solution mue = {-5.7 4.97} and cove = {0.05 -0.83, -0.83 12.55} is far away from the sample estimates mu = {3 4} and cov = {8 1.1, 1.1 11.1}.

 

I have compared my objective function of the MVN with your PDF function of the MVN (http://blogs.sas.com/content/iml/2012/07/05/compute-multivariate-normal-denstity.html) and I get the same result (when I take the log of your solution). So I assume it is not a problem of the objective function. I also compared my gradient function (GradLogLik) with the results of a finite-difference approximation, and the results are the same. Hence, I assume it is also not a problem of the gradient function. Do you have any further suggestions?

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
   n = nrow(x);      
   m = nrow(center);
   d2 = j(n,m); 
   U = root(cov, "NoError");
   if all(u=.) then return(1E10); /* my adaptation: if cov is singular, this forces NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
   Ur = round(trace(ginv(U)*U)); 
   if Ur < nrow(U) then return(1E10); /* my adaptation: if U is singular, this forces NLP to try another estimate of cov */
   do k = 1 to m;   
      v = x - center[k,];
      z = trisolv(2, U, v`);
      w = trisolv(1, U, z);
      d2[,k] = (v # w`)[,+];
   end;
   return (sqrt(d2));
finish;
start LogLikMVN(param) global(x,nmu);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   eins = j(1,rowx,1);
   d = mahalanobis(x, mu, cov);
   if d=1E10 then return(1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dd = d#d;
   dsd = eins * dd;
   logdet1 = logabsdet(cov);
   logdet = choose(logdet1[2],logdet1[1] * logdet1[2], 0);
   f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet) - (1/2 * dsd);
   return( f );
finish;



/*** Write a module that computes the gradient of the log-likelihood function ***/
start GradLogLik(param) global (x,nmu);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   colx = ncol(x);
   eins = j(1,rowx,1);
   xdif = x - mu;
   xdifs = t(xdif[+,]);
   invcov=inv(cov);
   dfdmu = t(invcov*xdifs);
   B = j(colx,colx,0);
   do i=1 to rowx;
	xred = xdif[i,];
	A = t(xred)*xred;
	B = B + A;
   end;
   dfdcov1 = -0.5 * (invcov -  invcov*B*invcov);
   dfdcov = t(vech(dfdcov1)); 
   return ( dfdmu || dfdcov );
finish;


/*** Call an optimization routine ***/
nmu = {2};
x = {0  4,
     2  2,
     1  4,
     6  3,
     2  1,
     1  0,
     3  8,
     5  1,
     9  7,
     1 10};
con = { .   .   0   .   0,  
        .   .   .   .   .}; 
p = {2 3 7 1 10};
opt = {0,5};
call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik";
quit;

   

   

Rick_SAS
SAS Super FREQ

There are general tips in the article "Ten tips before you run an optimization".

 

Regarding your particular problem, I hope to take a look at it later.  Two quick suggestions:
1. I believe you should return -1E10, not +1E10, to short-circuit the LogLikMVN function.

2. If you use the sample moments for the initial parameter estimates, then the NLP iteration runs off to infinity. I would double-check the correctness of the objective function. 

 

sampleMean = mean(x);
sampleCov = cov(x);
p = rowvec(sampleMean) || rowvec(sampleCov[{1 2 4}]);  /* elements 1,2,4 = the lower triangle of the 2x2 covariance, in vech order */
Rick_SAS
SAS Super FREQ

The finite-difference derivative does not seem to match the value from the GradLogLik function:

 

sampleMean = mean(x);
sampleCov = cov(x);
print sampleMean, samplecov;
p = rowvec(sampleMean) || rowvec(sampleCov[{1 2 4}]);

LL = LogLikMVN(p);
grad = GradLogLik(p);
print LL, grad;

call nlpfdd(func, FDDGrad, hessian, "LogLikMVN", p);  
print func, FDDGrad;
Rick_SAS
SAS Super FREQ (Accepted Solution)

1. I think we had some miscommunication earlier about the value of opt[1]. At the time I wasn't sure whether you were maximizing or minimizing the objective function. I now see that you are maximizing the LogLikMVN function, so use opt[1]=1, not opt[1]=0.

2. The way you use LOGABSDET is wrong. The second term in the log-likelihood is the log of the absolute value of the determinant. You are using ± that value (the log multiplied by the sign of the determinant). Change to

 logdet1 = logabsdet(cov)[1];  /* log( |det| ) */

 

You can simplify some of the other computations. I believe the following gives the correct MLE. I'll leave you to figure out the GradLogLik issue.

 

start LogLikMVN(param) global(x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   d = mahalanobis(x, mu, cov);
   if d=1E10 then return(-1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dsd = sum(d#d);
   logdet1 = logabsdet(cov)[1];  /* log( |det| ) */
   f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet1) - (1/2 * dsd);
   return( f );
finish;

con = { .   .   0   .   0,  
        .   .   .   .   .}; 
p = {2 3 7 1 10};
opt = {1,4};
call nlpnra(rc, result, "LogLikMVN", p, opt, con); /* grd="GradLogLik" */
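To unpack the optimum that NLPNRA returns into the usual estimates (a minimal sketch for this two-variable setup; muHat and covHat are just names I chose):

muHat  = t(result[1:2]);        /* estimated mean, as a row vector */
covHat = sqrvech(result[3:5]);  /* rebuild the symmetric covariance from the 3 free parameters */
print muHat, covHat;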
Daniel_Paul
Obsidian | Level 7

Hey Rick,

 

that's right, I also figured out that we had different objective functions in mind (log likelihood vs. negative log likelihood), so I already switched the parameter to opt[1]=1. With this I get apparently correct estimates of the parameters (mue = {3 4} and cove = {7.2 1, 1 10}) if I don't use the gradient function module. (The MLE covariance uses the divisor n rather than n-1, so cove is the sample covariance scaled by (n-1)/n, which matches here.)

 

So in summary: compared to the original code, one has to reduce the number of parameters, use LOGABSDET, and change 1E10 to -1E10. In addition, I have to adapt the gradient function module. I will post the working code as soon as I have worked out the correct gradient function module.

 

Thank you for helping me.

 

Bye, Paul     

Daniel_Paul
Obsidian | Level 7

For the sake of completeness, here is the working code with the gradient function:

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
   n = nrow(x);      
   m = nrow(center);
   d2 = j(n,m); 
   U = root(cov, "NoError");
   if all(u=.) then return(-1E10); /* my adaptation: if cov is singular, this forces NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
   Ur = round(trace(ginv(U)*U)); 
   if Ur < nrow(U) then return(-1E10); /* my adaptation: if U is singular, this forces NLP to try another estimate of cov */
   do k = 1 to m;   
     v = x - center[k,];
     z = trisolv(2, U, v`);
     w = trisolv(1, U, z);
     d2[,k] = (v # w`)[,+];
   end;
   return (sqrt(d2));
finish;
start LogLikMVN(param) global(x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   d = mahalanobis(x, mu, cov);
   if d=-1E10 then return(-1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dsd = sum(d#d);
   logdet = logabsdet(cov)[1];
   f =  - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet) - (1/2 * dsd);
   return( f );
finish;



/*** Write a module that computes the gradient of the log-likelihood function ***/
/** Diagonal module from Rick Wicklin (2013) **/
/** See http://blogs.sas.com/content/iml/2013/10/21/assign-the-diagonal-elements-of-a-matrix.html **/
start SetDiag(A, v);
   diagIdx = do(1,nrow(A)*ncol(A), ncol(A)+1);
   A[diagIdx] = v;             /* set diagonal elements */
finish;
start GradLogLik(param) global (x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   xdif = x - mu;
   xdifs = t(xdif[+,]);
   invcov=inv(cov);
   dfdmu = t(invcov*xdifs);
   ftwo = j(nmu,nmu,2);  /* weight matrix: 2 off the diagonal, 1 on it */
   run SetDiag(ftwo, 1); /* each off-diagonal element of cov appears twice but is a single vech parameter, so its derivative is doubled */
   B = j(nmu,nmu,0);
   do i=1 to rowx;
     xred = xdif[i,];
     A = t(xred)*xred;
     B = B + A;
   end;
   dfdcov1a = -rowx/2 * invcov;
   dfdcov1b = 1/2 * invcov*B*invcov;
   dfdcov1c = dfdcov1a + dfdcov1b;
   dfdcov1 = ftwo#dfdcov1c;
   dfdcov = t(vech(dfdcov1));
   return ( dfdmu || dfdcov );
finish;


/*** Call an optimization routine ***/
x = {0 4 2,
     2 2 1,
     1 4 5,
     6 3 3,
     2 1 2,
     1 0 1,
     3 8 0,
     5 1 8,
     9 7 1,
     1 10 12};
con = { .   .   .   0   .   .   0   .   0,  
        .   .   .   .   .   .   .   .   .}; 
p = {1 1 1 7 1 1 10 2 10};
opt = {1,5};
call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik";
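/* Sanity check (my addition, a minimal sketch): the analytic gradient
   should match the finite-difference approximation at the sample moments. */
p0 = rowvec(mean(x)) || rowvec(vech(cov(x)));
grad = GradLogLik(p0);
call nlpfdd(func, FDDGrad, hessian, "LogLikMVN", p0);
print grad, FDDGrad;   /* the two rows should agree to several digits */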
quit;
Rick_SAS
SAS Super FREQ

Hi @Daniel_Paul,

I'm glad you got it working. If you replace the loop

   B = j(nmu,nmu,0);
   do i=1 to rowx;
     xred = xdif[i,];
     A = t(xred)*xred;
     B = B + A;
   end;

with the matrix multiplication

   B = xdif`*xdif;

I suspect you will see a significant performance improvement.
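
To convince yourself the two computations agree, here is a minimal sketch (Bloop and Bmat are names I made up):

xdif = x - mean(x);
Bloop = j(ncol(x), ncol(x), 0);           /* loop version: one rank-1 update per observation */
do i = 1 to nrow(x);
   Bloop = Bloop + xdif[i,]`*xdif[i,];
end;
Bmat = xdif`*xdif;                        /* single crossproduct call */
print (max(abs(Bloop - Bmat)))[label="maxDiff"];   /* should be ~0 */

The crossproduct form replaces nrow(x) rank-1 updates with one optimized matrix multiplication.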

Best wishes!
