BookmarkSubscribeRSS Feed
renc
Calcite | Level 5

A matrix(V) is supposed to be symmetric within EM-algorithm, but after some iteration it is not symmetric. At each iteration, I used V=(V+V`)/2 to force it to be symmetric. Is it reasonable?

What else can I do?

Any suggestion will be helpful. Thanks you help in advance.

5 REPLIES 5
Rick_SAS
SAS Super FREQ

It seems like the question is WHY the matrix is losing symmetry.

I don't know the EM algorithm well enough to offer advice on what might be going wrong, but here's what I'd do:

1) Run your implementation of the the EM algorithm on a simple problem for which you know the answer. Wikipedia has an example for estimating parameters for a mixture of multivariate normal distributions (use bivaraite).

2) Find out where/how the matrix is losing symmetry. Is there an updating step such as NewV = V + Delta? If so, is Delta symmetric?  I use the following function to test for symmetry. You can modify it to find out which components are not symmetric.

   /* finite-precision test of whether a matrix is symmetric */

   start SymCheck(A);

      B = (A + A`)/2;

      scale = max(abs(A));

      delta = scale * constant("SQRTMACEPS");

      return( all( abs(B-A)< delta ) );

   finish;

renc
Calcite | Level 5

I like your idea about finite-precision test of whether a matrix is symmetric. The formula for the matrix is A=B*B`+C-D*inv(E)*D`, in which C and D are symmetric. After some iteration at which the matrix is slightly asymmetric like A(1,10)=0.1188091, but A(10,1)=0.11880909. But later on it is potentially asymmetric and the algorithm does not converge

One more question, I used EM-algorithm in IML, but the convergence is extremely slow. I submitted it on last Friday, the program is still running. Someone else suggested me to use R/C++ to see if it convergences faster. I think the slow convergence is due to EM-algorithm rather than the software. What is you opinion about that?

Thanks a lot.

Rick_SAS
SAS Super FREQ

In 15 years I've never had the need to recode a well-written IML algorithm in C. (And certainly never in R, which tends to be slower than IML.)  The time that you would spend rewriting the code would be much better spent profiling the code to find out where it is running slowly, and then optimizing those portions of the code.

If you have my book, Statistical Programming with SAS/IML Software, Chapter 15 describes how to profile code and optimize it for efficiency.

If you don't, you might be able to gain a few tips from this blog post: Eight tips to make your simulation run faster - The DO Loop

Regardless, it sounds like your matrix E might be ill-conditioned. Try to fix the symmetric problem by defining Z = D*inv(E)*D`, and then subtracting (Z+Z`)/2 instead of Z when you are forming A.   You might also try Z = D*solve(E,D`), which is usually a more stable way to compute the same quantity.

renc
Calcite | Level 5

Oh, something is wrong in the previous post. E should be symmetric instead of D. E is a variance-covariance matrix which should be updated at each iteration. Yes, you are right. I will try according to what you have said.

I have one more question about fisher information matrix. In addition to estimate the parameters in my model, I also need to calculate their standard deviations at convergence. We know the diagonal elements in the inverse of  the fisher information matrix are the variances of parameters. Is it possible that we may get a negative estimated variance for one parameter?

Thanks for your help and your timely response!!

Rick_SAS
SAS Super FREQ

Even better: If E is symmetric you can solve the linear equations more efficiently!

Yes, there are times when you get negative estimates for the variance (?!?!).  There are many reasons that this can occur. It is usually a signal that your model is misspecified, although Gary King gives other reasons: http://gking.harvard.edu/files/help.pdf I like Figure 1 of the paper, which show WHY the covariance matrix misbehaves.

There is also a short, easy, article about how this can arise in mixed models: When the Hessian Matrix goes Wacky

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 5 replies
  • 1393 views
  • 4 likes
  • 2 in conversation