My goal is to define PDFs for distributions that are not included in SAS/STAT. These distributions may have a variable number of parameters, so I used VARARGS with an array argument in the function definitions inside PROC FCMP. I want to use these functions for log-likelihood maximization (among other things).
These functions work perfectly fine in the DATA step, but I cannot get them to work in PROC NLMIXED.
A minimal example would be this:
/* Two functions that are basically the same. */
proc fcmp outlib=work.func.distributions;
function pdf_array(x, param[*]) varargs;
mu = param[1];
sigma = param[2];
return (pdf('NORMAL', x, mu, sigma));
endsub;
function pdf_noarray(x, param1, param2);
mu = param1;
sigma = param2;
return (pdf('NORMAL', x, mu, sigma));
endsub;
run;
/* Make the FCMP functions visible to the steps below. */
options cmplib=work.func;
/* The array version works in the DATA step. */
data work.obs;
array param [2] (3, 2);
do i = 1 to 100;
x = rand('NORMAL', param[1], param[2]);
pdf = pdf_array(x, param);
output;
end;
run;
Now I want to use this function in PROC NLMIXED for likelihood maximization (I am aware that it would be better to specify the log PDF directly, but that is not the issue here). Below is the optimization code for each of the two function definitions.
proc nlmixed data=work.obs;
parms mu=2.5 sigma=1.5;
array param [2];
param[1] = mu;
param[2] = sigma;
ll = log(pdf_array(x, param));
model x ~ general(ll);
run;
proc nlmixed data=work.obs;
parms mu=2.5 sigma=1.5;
array param [2];
param[1] = mu;
param[2] = sigma;
ll = log(pdf_noarray(x, param[1], param[2]));
model x ~ general(ll);
run;
The underlying computation is the same, since both functions just call PDF('NORMAL'). The only difference is the array in the function arguments. With the default optimization technique (quasi-Newton), no optimization is done for pdf_array. With pdf_noarray, everything works fine.
Using the TRACE option, I can see that many additional calculations are done with pdf_noarray that are missing when I use the user-defined function with the array. However, the log-likelihood is evaluated correctly for each observation in the data set.
Also, with TECH=NMSIMP (Nelder-Mead simplex, which relies purely on function evaluations and uses no derivatives), the optimization does run for pdf_array and yields a result. It is exactly the same result as with pdf_noarray, although I get a warning that the Hessian is not positive definite, and the 95% confidence limits are not computed.
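One way to confirm that the two functions really return the same values is a quick DATA step comparison. This is a sketch, assuming the functions are stored in work.func and that work.obs was created by the DATA step above (so its variable pdf comes from pdf_array and param1/param2 hold mu and sigma). The data set name work.check is hypothetical.

```sas
/* Sketch: compare pdf_array (stored in PDF) with pdf_noarray. */
options cmplib=work.func;

data work.check;
   set work.obs;                          /* x, param1, param2, pdf */
   pdf_na = pdf_noarray(x, param1, param2);
   diff   = abs(pdf - pdf_na);            /* 0 if both functions agree */
run;

/* The largest absolute difference over all observations. */
proc means data=work.check max;
   var diff;
run;
```

Because both functions pass identical arguments to PDF('NORMAL'), the maximum difference should be zero.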
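For context on that warning, here is the usual relationship between the Hessian and the reported standard errors and confidence limits (a general sketch, not taken from the NLMIXED documentation). Here θ = (μ, σ), ℓ(θ) = Σᵢ log f(xᵢ; μ, σ) is the log-likelihood, θ̂ is the estimate, and q is a critical value for the chosen confidence level:

```latex
H_{jk} = \left.\frac{\partial^2 \left(-\ell(\theta)\right)}{\partial\theta_j\,\partial\theta_k}\right|_{\theta=\hat\theta},
\qquad
\widehat{\mathrm{Cov}}(\hat\theta) = H^{-1},
\qquad
\mathrm{SE}(\hat\theta_j) = \sqrt{\left[H^{-1}\right]_{jj}},
\qquad
\hat\theta_j \pm q\,\mathrm{SE}(\hat\theta_j)
```

If H cannot be computed or is not positive definite, H⁻¹ is not a usable covariance matrix, which is consistent with the missing confidence limits even though the point estimates are fine.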
So my question is: Am I doing something wrong here? Or is this just missing functionality?
This is a very interesting question. I didn't know this, but my experiments indicate that you are correct: derivatives of PROC FCMP functions are not computed when the functions use arrays.
I think I can provide answers to two of your questions:
1. Why does PROC NLMIXED complain about computing derivatives when you use TECH=NMSIMP, which doesn't require derivatives? Although the NMSIMP optimization does not require derivatives, after the optimization converges, NLMIXED tries to compute standard errors, which requires the Hessian matrix. It is this last step that fails.
2. How can I avoid using derivatives and get the answer?
The answer is to put the FD option on the PROC NLMIXED statement. The FD option tells the procedure to use finite-difference derivatives instead of trying to use analytical derivatives.
So I think if you use
PROC NLMIXED TECH=NMSIMP FD ...;
you can solve the problem even if you use the FCMP functions with arrays.
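A complete version of that call for the example in the question might look like the sketch below. It assumes the FCMP functions are stored in work.func and the data are in work.obs; the output data set work.pe is a hypothetical name used only to save the estimates for comparison. I have not checked the exact output of this step.

```sas
/* Sketch of the suggested workaround for the pdf_array example. */
options cmplib=work.func;

/* Save the estimates so they can be compared with the pdf_noarray fit. */
ods output ParameterEstimates=work.pe;

/* TECH=NMSIMP: Nelder-Mead simplex, uses only log-likelihood values.
   FD: finite-difference derivatives instead of analytic ones, so the
   Hessian needed for the standard errors can still be approximated. */
proc nlmixed data=work.obs tech=nmsimp fd;
   parms mu=2.5 sigma=1.5;
   array param [2];
   param[1] = mu;
   param[2] = sigma;
   ll = log(pdf_array(x, param));
   model x ~ general(ll);
run;
```

A forward finite difference approximates the gradient component for parameter j by (ℓ(θ + h·eⱼ) - ℓ(θ)) / h for a small step h, so only function evaluations of pdf_array are needed, which is exactly what works with the array-based FCMP function.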