Hi all:
I have a question that has been bugging me for a long time.
I defined two objective functions, "betafunction" and "bfunction".
"betafunction" is a function of the fixed effect "beta"; it calculates and returns a number, and we want to obtain an estimate of the fixed effect "beta" by minimizing that number. Inside this function, "b" is a known vector.
Likewise, "bfunction" is a function of the random effect "b"; it also calculates and returns a number, and we want to obtain an estimate of the random effect "b" by minimizing that number. Inside this function, "beta" is a known vector.
The optimal fixed- and random-effect estimates are obtained by minimizing these two functions iteratively.
In detail, in the "mixed" function I initialize the iteration with a starting value for the fixed effect "beta" (obtained from GLM) and a starting value of zero for "b". Each iteration minimizes "bfunction" with "beta" held at its current value, passes the minimized "b" to "betafunction", and minimizes that to update "beta"; the two alternating minimizations continue until the convergence criterion L2_Norm(beta_new - beta_old) + L2_Norm(b_new - b_old) < 0.0000001 is satisfied.
Note: "projection" and "BaseM" are two self-defined functions that will be called in this iteration, I check these two function and they are correct.
I fit my code with two simulated datasets:
At first I used data simulated from a linear mixed model with a random intercept only, with the error terms simulated under a compound symmetry correlation structure. The code works fine and no error messages are shown. The resulting fixed-effect estimates are very close to the values I used to simulate the dataset, and the random intercepts are not too far from the true ones and are also very close to those obtained from PROC MIXED.
The second simulated dataset is also from a linear mixed model, but with both a fixed/random intercept and a fixed/random slope on "time", assuming independence between the random effects and a compound symmetry correlation structure for the error terms.
The dataset is simple: a column of ID (say 50 subjects in total), a column of numeric response (5 repeated measures per subject), and a column of time (ranging from 1 to 5).
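For concreteness, here is a sketch of how such a dataset could be simulated (the fixed-effect and variance values are illustrative, not my actual settings):

data sim;
   call streaminit(123);
   do ID = 1 to 50;
      b0 = rand("Normal", 0, 0.5);           /* random intercept */
      b1 = rand("Normal", 0, 0.3);           /* random slope, independent of b0 */
      u  = rand("Normal", 0, 0.4);           /* shared term gives compound symmetry errors */
      do time = 1 to 5;
         e = u + rand("Normal", 0, 1);       /* within-subject corr = 0.4**2/(0.4**2 + 1) */
         Y = (2 + b0) + (0.5 + b1)*time + e; /* true fixed intercept 2, slope 0.5 */
         output;
      end;
   end;
   keep ID time Y;
run;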
My concern and question: when I tested my code on this model, I got error messages saying:
"ERROR: NEWRAP Optimization cannot be completed".
The program keeps running even though error messages like this pop up occasionally.
But the result is not good: the fixed-effect estimates are essentially identical to the starting values I supplied, meaning the algorithm did not really work.
One thing about the error: NLPNRA seems to work when applied to "bfunction", but the error message occurs when NLPNRA is applied to "betafunction" in each iteration.
I really do not know what exactly causes the error, or why the program fails on the linear mixed model with a random slope but works on the random intercept only model.
Any help or advice will be much appreciated!
THANK YOU IN ADVANCE!
proc iml;
start mixed(Pred_Fixed_Efft, Pred_Ran_Efft, betastart)
   global(CorrStruc, n, SampSize, nfe, nre, Pj, lambda1, lambda2, ite, Y, X, Z, beta, b, i);
   betavalue = j(nfe, ite);            /* matrix to store the fixed-effect iterates */
   bvalue = j(SampSize*nre, ite);      /* matrix to store the random-effect iterates */
   /* initial values: fixed effects from GLM, random effects zero */
   betavalue[,1] = betastart;
   bvalue[,1] = j(SampSize*nre, 1, 0);
   epsilon = 1;
   tc = repeat(., 1, 13);              /* termination criteria for the NLP calls */
   tc[1] = 1000;                       /* maximum number of iterations */
   tc[2] = 1000;                       /* maximum number of function calls */
   do i = 2 to ite until (epsilon < 0.000001);
      beta = betavalue[,i-1];          /* beta is held fixed inside bfunction */
      b0 = bvalue[,i-1];               /* warm-start b at the previous iterate */
      opt = {0 2};                     /* opt[1]=0: minimize; opt[2]=2: print level */
      call NLPNRA(rc, xr, "bfunction", b0, opt,,tc);
      bvalue[,i] = t(xr);              /* store the minimizing b for this iteration */
      b = bvalue[,i];                  /* b is held fixed inside betafunction */
      beta0 = betavalue[,i-1];         /* warm-start beta at the previous iterate */
      call NLPNRA(rc, xr, "betafunction", beta0, opt,,tc);
      betavalue[,i] = t(xr);           /* store the minimizing beta for this iteration */
      epsilon = norm(betavalue[,i]-betavalue[,i-1], "L2")
              + norm(bvalue[,i]-bvalue[,i-1], "L2");
   end;
   i = min(i, ite);   /* guard: if the loop exhausts all iterations, i exits as ite+1 */
   /****************************/
   /* 1. RANDOM EFFECTS: reshape so each subject gets one row of estimates */
   Pred_Ran_Efft = shape(bvalue[,i], SampSize, nre);
   /* 2. FIXED EFFECTS */
   Pred_Fixed_Efft = betavalue[,i];
finish mixed;
The portion of the optimization that is having a problem is the estimation of the random effects. It looks like you are estimating 100 random parameters. Since your sample size is only 250, it is understandable that the objective function might be flat.
Here are a few suggestions:
1. Try quasi-Newton (NLPQN) instead of Newton-Raphson.
2. Make the convergence criterion loose at the beginning. You can tighten it up later in the iteration process. For example, try
opt = {0 0};      /* opt[2]=0: suppress printed output */
tc[3:6] = .;      /* reset the convergence tolerances to their defaults */
call NLPNRA(rcb, xr, "bfunction", b0, opt,,tc);
bvalue[,i] = t(xr);
b = bvalue[,i];
beta0 = betavalue[,i-1];
opt = {0 1};      /* opt[2]=1: minimal printed output */
tc[3:6] = 1e-3;   /* <== you can make the criterion tighter as i increases */
call NLPQN(rcbeta, xr, "betafunction", beta0, opt,,tc);
3. Remember that you can use the return code to detect nonconvergence. In the code above I used rcb and rcbeta for the return codes. You can test whether rcbeta < 0 and, if so, loosen the convergence criterion and rerun to try to obtain convergence, or abort the computation so you can try again.
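For example, here is a minimal sketch of that retry logic (the looser 1e-2 value is just illustrative):

call NLPQN(rcbeta, xr, "betafunction", beta0, opt,,tc);
if rcbeta < 0 then do;                /* a negative return code means nonconvergence */
   tc[3:6] = 1e-2;                    /* loosen the convergence tolerances */
   call NLPQN(rcbeta, xr, "betafunction", beta0, opt,,tc);
end;
if rcbeta < 0 then do;                /* still no convergence: abort and investigate */
   print "betafunction did not converge at outer iteration" i;
   stop;
end;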
Hi Rick:
Thank you so much for your quick response!
Your suggestion works very well: both functions converge, and the return code is "6" for both.
I will test with a tighter criterion and increase i and the sample size to see how the results change.
Just one question:
A note is shown during each function minimization. I found that a user mentioned that this note also appears in PROC GLIMMIX and "may lead to very large standard errors for the associated parameter".
So I wonder: should I worry about this message, and if so, do you have any suggestions?
Note: At least one element of the (projected) gradient is greater than 1e-3.
Thank you very much!
Glad you found the tips helpful.
If you look at the table, you see that the MaxAbsGradElement is 0.007, which is indeed greater than 1e-3. But if you look at the ObjFunc change and StepSize columns, you see very small numbers. These numbers indicate that the function is very flat in one direction, so the optimizer cannot proceed any closer to the minimum. However, in another direction the function is less flat.
I don't have time for an in-depth analysis right now, but look at Slide 6 at https://goo.gl/images/QbJE3x for an example. You could try a different NLP algorithm, but I suspect the problem is your sample size. The sample size determines the curvature (steepness) of the log-likelihood function: small datasets lead to a broad, flat LL, and the large standard errors are a consequence of that flatness.
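As a toy illustration (not your model), the log-likelihood of the mean of iid N(mu,1) data has curvature n at the MLE, so the asymptotic standard error is 1/sqrt(n); a small sample gives a flat LL and a large SE. You can see the flattening by graphing the LL on a grid:

proc iml;
call randseed(1);
n = 10;                                /* try n = 10 versus n = 250 */
y = j(n, 1);
call randgen(y, "Normal", 0, 1);       /* simulate N(0,1) data */
mu = do(-1, 1, 0.05);                  /* grid around the true mean */
LL = j(1, ncol(mu), .);
do j = 1 to ncol(mu);
   LL[j] = -0.5 * ssq(y - mu[j]);      /* LL up to an additive constant */
end;
call series(mu, LL);                   /* flat near the max when n is small */
print (1/sqrt(n))[label="SE"];         /* curvature at the MLE is n */
quit;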
Hi Rick:
The slides you shared are very helpful!
I increased the sample size from 50 to 200 and the number of repeated measures from 5 to 10, using the same termination criterion of 1e-3 on NLPQN.
This time the code runs really slowly, which makes sense since the dataset is 8 times the size of the original; it takes about 45-60 minutes for one round of minimizing both "bfunction" and "betafunction".
Also, when I tighten the termination criterion to 1e-6 on NLPQN and set "i" to be extremely large, the error message
"QUANEW Optimization cannot be completed" shows up again.
I used this extreme setting just to see whether quasi-Newton and Newton-Raphson converge; for now it seems that 1e-3 is a good termination criterion for NLPQN.
So in conclusion:
1. Is there a way to improve the convergence speed for a large sample? I guess this is related to which NLP algorithm we choose and the nature of the function being minimized.
2. Do you think 1e-3 is safe on NLPQN, so that I can apply quasi-Newton to a relatively large sample even though it converges relatively slowly?
Thank you for your generous help!
1. You can vectorize the functions to make them more efficient. Replace the DO loops with matrix algebra, and get rid of the concatenation (||) within the loops. Those functions are called hundreds of times for each step, so make them as fast as possible. My blog has many examples of these best practices; see the sketch below.
2. Only you can answer that. I don't have time to carefully study your problem.
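Here is a hypothetical before/after sketch of what I mean by vectorizing (the loop is made up; adapt it to whatever your objective functions actually compute):

/* slow: builds a residual vector by concatenation inside a loop */
free r;                            /* start with an empty matrix */
do j = 1 to nrow(Y);
   r = r // (Y[j] - X[j,]*beta);   /* reallocates r on every pass */
end;
s = ssq(r);

/* fast: one matrix expression, no loop and no concatenation */
r = Y - X*beta;                    /* all residuals at once */
s = ssq(r);                        /* sum of squared residuals */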
Hi Rick:
I see. Your blog is so helpful for an IML beginner like me.
Really appreciate your help!