12-01-2011 11:05 AM
I have the likelihood functions prewritten. I try to optimize the likelihood through "nlpnrr" function. For my estimated parameters, I then want to find the Hessian computed at those values through 'nlpfdd' function. Once I get the Hessian I am interested in taking the inverse of the hessian multiplied by minus, which is basically the Information Matrix. This Information matrix is the Variance-Covariance Matrix and the On-Diagonal elements give us the variance (Square root of it gives the standard error of the estimates). Unfortunatly, the some of the var-cov matrix on-diagonal elements are negative which does not makes sense as the var-cov has to be positive. Any help would be great, or any alternative way to compute standard errors.
12-01-2011 11:45 AM
Often this is a sign of a misspecified model, but it happens often enough that people have written papers or blogs about various ways to overcome this problem. Some people add a multiple of the identity matrix to the Hessian formula and call it "ridged optimization." (See
the Newton-Raphson Ridge Optimization (NLPNRR) method: http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_langref_sect21...)
Since you are using NLPNRR already, I wonder if you are adding the final value of the ridge parameter before you try to invert? The NLPFDD function will compute the unadjusted Hessian. To get the ridged Hessian, you need to add lamda*Identity, where lambda is the final value of the ridge parameter.
For a general explanation of this problem (and what people try to do), see Gill and King (2004), "What to Do When Your Hessian is Not Invertible" http://gking.harvard.edu/files/help.pdf
A similar article is http://gking.harvard.edu/files/numhess.pdf
But I emphasize that this is often caused by a misspecified problem. For example, it could occur if you are trying to fit parameters of a gamma distribution to data that are not actually distrubted as gamma.
12-01-2011 02:26 PM
Unfortunately, I don't. Look at the iteration history. Does it look like the iterations are going to infinity or converging to a local maximum that is far from your initial guess? It's possible that the ML function has multiple local maximums, and that some are "bad" whereas others are "good." I wish I had a small (~20) data set and objective function that exhibits this problem. I've never actually observed it myself.
12-01-2011 03:14 PM
I'm reading the user guide of SAS/IML and the standard error of estimated parameters in the example on page 379 is calculated as the square root of absolute value of on-diagonal elements of inverse of Heissan. My problem could be avioded is I do the same way but can we do so?
stderr = sqrt(abs(vecdiag(hin2))) where hin2 is the inverse of heissan
12-01-2011 03:31 PM
Personally, I wouldn't do that.
I would guess that the ABS is in there to prevent the SQRT from complaining if there is a very small variance (such as 1e-10) that, because of numerical rounding error, comes out as slightly negative (such as -1e-12). Values like that are "numerically zero" and so there is no harm in using the ABS function.
If your diagonal elements are sizeable, however (such as -0.5), then I think you're making a mistake if you take the absolute value.
12-02-2011 10:58 AM
Yes. Look into it and see whether it's possible. I don't use ridge regression myself because I question its validity, but if you choose to use it, that how to get the Hession used by the optimization procedure at the final (optimal) location.
12-02-2011 05:09 PM
I ran my model via proc mixed and the code is as follows:
proc mixed data=temp method=ml;
class treat time id;
model var1=treat time time*treat/s;
repeated time /type=ar(1) subject=id R rcorr;
I also ran my model via proc iml by optimizing the likelihood through "nlpnrr"/"nlpnar" function and finding the Hessian computed at estimated parameters through 'nlpfdd' function.
Ideally, this two methods should give me the same results because PROC MIXED also minimizes -2 times loglikelihood by using a ridge-stabilized Newton-Raphson algorithm. In fact, the parameter estimates are pretty close (the difference could be neglected). However, the variance of the parameter estimates are different. Like what you've suggested, I add a multiple (the heading ridge from the optimization procedure) of the identity matrix to the Hessian formula, but it doesn't help, even make it worse.
This bothers me a while and I couldn't figure out why this happens. Do you have any idea?
12-03-2011 06:45 AM
I think this question is going beyond what I know. I am not well-versed in mixed models.
If I had to guess, I would guess that you are using different models. The MIXED model includes a parameter 'rho' that is associated with the AR(1) correlation structure and models the covariance structure as part of the parameter estimates. Unless you are doing the same thing in your IML code, you are solving a different model (perhaps a fixed-effect model).