BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Daniel_Paul
Obsidian | Level 7

Hello,

 

actually I try to optimize the ML objective function of the multivariate normal density. For this, I basically follow the example of Rick Wicklin (http://blogs.sas.com/content/iml/2011/10/12/maximum-likelihood-estimation-in-sasiml.html). However, instead of using the univariate normal density as objective function I used the mutlivariate normal density (http://blogs.sas.com/content/iml/2012/07/05/compute-multivariate-normal-denstity.html). To avoid/cope with singular covariance matrices I added a suggestion made by Gu & Yung. In additon, I introduce the gradient of the MVN and I place restrictions on the parmater estimates for the elements of the covariance matrices to be symetric. 

 

When I run the code the optimizer success with different starting vectors. However I do not get the sample estimate for the input data set.

 

Below is an example for two variables with mean mu={1 1.3} and cov={1 -1, -1 1.33}. With starting vector p={1 2 1 -1 -1 2} I get the estimates mue={1.15 4.56} and cove={1.19 -1.23, -1.23 0.31}. Can I trust this results?

 

Bye, Paul

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
 n = nrow(x);      
 m = nrow(center);
 d2 = j(n,m);    
 U = root(cov, "NoError");
 if all(u=.) then return(1E10); /* adoption of me: if cov is singular, this will force NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
 Ur = round(trace(ginv(U)*U)); 
 if Ur < nrow(U) then return(1E10); /* adoption of me: if U is singular, this will force NLP to try another estimate of cov */ 
 do k = 1 to m;   
  v = x - center[k,];
  z = trisolv(2, U, v`);
  w = trisolv(1, U, z);
  d2[,k] = (v # w`)[,+];
 end;
 return (sqrt(d2));
finish;
start LogLikMVN(param) global(x,nmu);
 mu = t(param[1:nmu]);
 cov1 = shape(param[(nmu+1):ncol(param)],nmu,nmu);
 cov = (cov1 + t(cov1))/2;
 rowx = nrow(x);
 eins = j(1,rowx,1);
 d = mahalanobis(x, mu, cov);
 if d=1E10 then return(1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
 dd = d#d;
 dsd = eins * dd;
 f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * log(det(cov))) - (1/2 * dsd);
 return( f );
finish;

/*** Write a module that computes the gradient of the log-likelihood function ***/
start GradLogLik(param) global (x,nmu);
 mu = t(param[1:nmu]);
 cov1 = shape(param[(nmu+1):ncol(param)],nmu,nmu);
 cov = (cov1 + t(cov1))/2;
 rowx = nrow(x);
 colx = ncol(x);
 eins = j(1,rowx,1);
 xdif = x - mu;
 xdifs = t(xdif[+,]);
 invcov=inv(cov);
 dfdmu = t(invcov*xdifs);
 B = j(colx,colx,0);
 do i=1 to rowx;
  xred = xdif[i,];
  A = t(xred)*xred;
  B = B + A;
end; dfdcov1 = -0.5 * (invcov - invcov*B*invcov); dfdcov = shape(dfdcov1,1); return ( dfdmu || dfdcov ); finish; /*** Call an optimization routine ***/ nmu = {2}; x = {0 4, 2 2, 1 4}; con = { . . 0 . . 0 . ., . . . . . . . ., . . . 1 -1 . 0 .}; p = {1 2 1 -1 -1 2}; opt = {1,5}; call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik"; quit;

       

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

1. I think we had some miscommunication earlier about the value of opt[1]. At the time I wasn't sure whether you were maximizing or minimizing the objective function. I now see that you are maximizing the LogLikMVN function, so use opt[1]=1, not opt[1]=0.

2. The way you use LOGABSDET is wrong. The second term in the loglikelihood is the log of the absolute value of the determinant.  You are using +/- of that value. Change to 

 logdet1 = logabsdet(cov)[1];  /* log( |det| ) */

 

You can simplify some of the other computations. I believe the following gives the correct MLE. I'll leave you to figure out the GradLogLik issue.

 

start LogLikMVN(param) global(x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   d = mahalanobis(x, mu, cov);
   if d=1E10 then return(-1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dsd = sum(d#d);
   logdet1 = logabsdet(cov)[1];  /* log( |det| ) */
   f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet1) - (1/2 * dsd);
   return( f );
finish;

con = { .   .   0   .   0,  
        .   .   .   .   .}; 
p = {2 3 7 1 10};
opt = {1,4};
call nlpnra(rc, result, "LogLikMVN", p, opt, con); /* grd="GradLogLik" */

View solution in original post

8 REPLIES 8
Rick_SAS
SAS Super FREQ

1. You don't need to write your own MAHALANOBIS function. As it says in the blog post, the MAHALANOBIS function has been distributed with SAS/IML for many releases.

2.The covariance matrix is symmetric so there are only five independent parameters, not six. You can form a symmetric matrix from a vector by using the SQRVECH function.

3. If you are trying to MINIMIZE a function, use opt[1]=0. To MAXIMIZE a function use opt[1]=1.

4. You can use the built-in LOGABSDET function to compute the log of the determinant of a matrix.

 

Daniel_Paul
Obsidian | Level 7

Hey Rick,

 

thank you for the fast reply. I'am ashamed about point two and three, sorry for that. With respect to the mahalanobis distance: I had notice that there is an inbuilt function for calculating this distance in IML. However, because I will avoid that a singular covariance matrix is optained during the process of optimization (which will produce error message in the log), I have adopt the mahalanobis distance to reject such solutions (see adoptions of me in the modul). 

 

Underneath is the code with your suggestions included. However, runig this code will produce an error message in the log "NEWRAP Optimization cannot be completed." and a warning message "Optimization routine cannot improve the function value.". The solution mue = {-5.7 4.97} and cove = {0.05 -0.83, -0.83 12.55} is far away from the sample estimates mu = {3 4} and cov = {8 1.1, 1.1 11.1}.

 

I have comparde my objective function of the MVN with your pdf function of the MVN (http://blogs.sas.com/content/iml/2012/07/05/compute-multivariate-normal-denstity.html) and I get the same result (when I log your solution). So I assume it is not a problem of the objective function. I also compare my gradient function (GradLogLik) with the results of finit difference approximation and the results are the same. Hence, I assume it is also not a problem of the gradient function. Do you have any further suggestions?

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
   n = nrow(x);      
   m = nrow(center);
   d2 = j(n,m); 
   U = root(cov, "NoError");
   if all(u=.) then return(1E10); /* adoption of me: if cov is singular, this will force NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
   Ur = round(trace(ginv(U)*U)); 
   if Ur < nrow(U) then return(1E10); /* adoption of me: if U is singular, this will force NLP to try another estimate of cov */ 
   do k = 1 to m;   
      v = x - center[k,];
      z = trisolv(2, U, v`);
      w = trisolv(1, U, z);
      d2[,k] = (v # w`)[,+];
   end;
   return (sqrt(d2));
finish;
start LogLikMVN(param) global(x,nmu);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   eins = j(1,rowx,1);
   d = mahalanobis(x, mu, cov);
   if d=1E10 then return(1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dd = d#d;
   dsd = eins * dd;
   logdet1 = logabsdet(cov);
   logdet = choose(logdet1[2],logdet1[1] * logdet1[2], 0);
   f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet) - (1/2 * dsd);
   return( f );
finish;



/*** Write a module that computes the gradient of the log-likelihood function ***/
start GradLogLik(param) global (x,nmu);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   colx = ncol(x);
   eins = j(1,rowx,1);
   xdif = x - mu;
   xdifs = t(xdif[+,]);
   invcov=inv(cov);
   dfdmu = t(invcov*xdifs);
   B = j(colx,colx,0);
   do i=1 to rowx;
	xred = xdif[i,];
	A = t(xred)*xred;
	B = B + A;
   end;
   dfdcov1 = -0.5 * (invcov -  invcov*B*invcov);
   dfdcov = t(vech(dfdcov1)); 
   return ( dfdmu || dfdcov );
finish;


/*** Call an optimization routine ***/
nmu = {2};
x = {0  4,
     2  2,
     1  4,
     6  3,
     2  1,
     1  0,
     3  8,
     5  1,
     9  7,
     1 10};
con = { .   .   0   .   0,  
        .   .   .   .   .}; 
p = {2 3 7 1 10};
opt = {0,5};
call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik";
quit;

   

   

Rick_SAS
SAS Super FREQ

There are general tips in the article "Ten tips before you run an optimization"

 

Regarding your particular problem, I hope to take a look at it later.  Two quick suggestions:
1. I believe you should return -1e10 to short-circuit the LogLikMVN function?

2. If you use the sample moments for the initial parameter estimates, then the NLP iteration runs off to infinity. I would double-check the correctness of the objective function. 

 

sampleMean = mean(x);
sampleCov = cov(x);
p = rowvec(sampleMean) || rowvec(sampleCov[{1 2 4}]);
Rick_SAS
SAS Super FREQ

The finite-difference derivative does not seem to match the value from the GradLogLik function:

 

sampleMean = mean(x);
sampleCov = cov(x);
print sampleMean, samplecov;
p = rowvec(sampleMean) || rowvec(sampleCov[{1 2 4}]);

LL = LogLikMVN(p);
grad = GradLogLik(p);
print LL, grad;

call nlpfdd(func, FDDGrad, hessian, "LogLikMVN", p);  
print func, FDDGrad;
Rick_SAS
SAS Super FREQ

1. I think we had some miscommunication earlier about the value of opt[1]. At the time I wasn't sure whether you were maximizing or minimizing the objective function. I now see that you are maximizing the LogLikMVN function, so use opt[1]=1, not opt[1]=0.

2. The way you use LOGABSDET is wrong. The second term in the loglikelihood is the log of the absolute value of the determinant.  You are using +/- of that value. Change to 

 logdet1 = logabsdet(cov)[1];  /* log( |det| ) */

 

You can simplify some of the other computations. I believe the following gives the correct MLE. I'll leave you to figure out the GradLogLik issue.

 

start LogLikMVN(param) global(x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   d = mahalanobis(x, mu, cov);
   if d=1E10 then return(-1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dsd = sum(d#d);
   logdet1 = logabsdet(cov)[1];  /* log( |det| ) */
   f = - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet1) - (1/2 * dsd);
   return( f );
finish;

con = { .   .   0   .   0,  
        .   .   .   .   .}; 
p = {2 3 7 1 10};
opt = {1,4};
call nlpnra(rc, result, "LogLikMVN", p, opt, con); /* grd="GradLogLik" */
Daniel_Paul
Obsidian | Level 7

Hey Rick,

 

that's right, I also figure out that we have different objective functions in mind (log likelihood vs. negative log likelihood), so I already switch the parameter opt[1]=1. With this I get apparent right estimates of the parameters (mue = {3 4} and cove ={7.2 1, 1 10}}, if I don't use the gradient function module.

 

So in summary: Compared to the original code one has to reduce the number of parameters, use the LOGABSDET and change 1E10 to -1E10. In addition I have to adope the gradient function module. I will post the working code as soon as I have installed the correct gradient function module.

 

Thank you for helping me.

 

Bye, Paul     

Daniel_Paul
Obsidian | Level 7

For the sake of completeness here the working code with gradient function:

 

proc iml;
/*** Write a module that computes the log-likelihood ***/
/** Mahalanobis module from Rick Wicklin (2012) **/
/** Cut and paste the definition of the Mahalanobis module. **/
/** See http://blogs.sas.com/content/iml/2012/02/22/how-to-compute-mahalanobis-distance-in-sas/ **/
start mahalanobis(x, center, cov);
   n = nrow(x);      
   m = nrow(center);
   d2 = j(n,m); 
   U = root(cov, "NoError");
   if all(u=.) then return(-1E10); /* adoption of me: if cov is singular, this will force NLP to try another estimate of cov (see Gu & Yung, 2013: A SAS/IML program using the Kalman filter for estimating state space models) */
   Ur = round(trace(ginv(U)*U)); 
   if Ur < nrow(U) then return(-1E10); /* adoption of me: if U is singular, this will force NLP to try another estimate of cov */ 
   do k = 1 to m;   
     v = x - center[k,];
     z = trisolv(2, U, v`);
     w = trisolv(1, U, z);
     d2[,k] = (v # w`)[,+];
   end;
   return (sqrt(d2));
finish;
start LogLikMVN(param) global(x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   d = mahalanobis(x, mu, cov);
   if d=-1E10 then return(-1E10); /* if cov is singular, this will force NLP to try another estimate of cov */
   dsd = sum(d#d);
   logdet = logabsdet(cov)[1];
   f =  - (rowx*nmu/2 * log(2*constant("PI"))) - (rowx/2 * logdet) - (1/2 * dsd);
   return( f );
finish;



/*** Write a module that computes the gradient of the log-likelihood function ***/
/** Diagonal module from Rick Wicklin (2013) **/
/** See http://blogs.sas.com/content/iml/2013/10/21/assign-the-diagonal-elements-of-a-matrix.html **/
start SetDiag(A, v);
   diagIdx = do(1,nrow(A)*ncol(A), ncol(A)+1);
   A[diagIdx] = v;             /* set diagonal elements */
finish;
start GradLogLik(param) global (x);
   nmu = ncol(x);
   mu = t(param[1:nmu]);
   cov1 = t(param[(nmu+1):ncol(param)]);
   cov = sqrvech(cov1);
   rowx = nrow(x);
   xdif = x - mu;
   xdifs = t(xdif[+,]);
   invcov=inv(cov);
   dfdmu = t(invcov*xdifs);
   ftwo = j(nmu,nmu,2);
   run SetDiag(ftwo, 1);
   B = j(nmu,nmu,0);
   do i=1 to rowx;
     xred = xdif[i,];
     A = t(xred)*xred;
     B = B + A;
   end;
   dfdcov1a = -rowx/2 * invcov;
   dfdcov1b = 1/2 * invcov*B*invcov;
   dfdcov1c = dfdcov1a + dfdcov1b;
   dfdcov1 = ftwo#dfdcov1c;
   dfdcov = t(vech(dfdcov1));
   return ( dfdmu || dfdcov );
finish;


/*** Call an optimization routine ***/
x = {0 4 2,
     2 2 1,
     1 4 5,
     6 3 3,
     2 1 2,
     1 0 1,
     3 8 0,
     5 1 8,
     9 7 1,
     1 10 12};
con = { .   .   .   0   .   .   0   .   0,  
        .   .   .   .   .   .   .   .   .}; 
p = {1 1 1 7 1 1 10 2 10};
opt = {1,5};
call nlpnra(rc, result, "LogLikMVN", p, opt, con) grd="GradLogLik";
quit;
Rick_SAS
SAS Super FREQ

Hi @Daniel_Paul,

I'm glad you got it working. If you replace the loop

   B = j(nmu,nmu,0);
   do i=1 to rowx;
     xred = xdif[i,];
     A = t(xred)*xred;
     B = B + A;
   end;

with the matrix multiplication

   B = xdif`*xdif;

I suspect you will see a significant performance improvement.

Best wishes!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 8 replies
  • 1606 views
  • 5 likes
  • 2 in conversation