Solved: Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimati...

EricCai · Posted 05-19-2014 08:30 PM

Dear Community,

I am trying to follow Rick Wicklin's example of using NLPNRA with PROC IML for maximum likelihood estimation. Unfortunately, I am encountering an error with NLPNRA. I am struggling to find the proper documentation for how to use it and what its inputs mean; the documentation page entitled "NLPNRA Call" is not helpful. Thus, I would much appreciate any help that you can provide.

My questions:

1) What are the possible inputs for NLPNRA?

2) The first 2 inputs for NLPNRA are "rc" and "xr". What do they mean?

3) I am getting the error "(execution) Invalid argument to function." in my log. (See below for more details.) What does this error mean? What am I doing incorrectly?

Thanks for your help!

Eric

Here is my script; notice that my log-likelihood is assigned to the variable "f".

dm 'cle log; cle out; cle odsresults;';

dm 'odsresults; clear';

ods html close;

ods html;

ods listing close;

ods listing;

options

noovp

linesize = 105

formdlim = '-'

pageno = min

;

title "Using Newton's Method to Maximize the Log-Likelihood of Parameters in ABO-Blood Type Under Hardy-Weinberg Equilibrium";

proc iml;

start loglik(param) global(x);

p = param[1];

q = param[2];

na = 9123;

nb = 2987;

nab = 1269;

no = 7725;

f = na*log(p**2 + 2*p*(1-p-q)) + nb*log(q**2 + 2*p*(1-p-q)) + nab*log(2*p*q) + no*log((1-p-q)**2);

return(f);

finish;

con = { 0 0,

1 1};

probs = {0.5 0.5};

opt = {1, 4};

call nlpnra(rc, result, "loglik", probs, opt, con);

Here is the error that I get in the log:

283 call nlpnra(rc, result, "loglik", probs, opt, con);

ERROR: (execution) Invalid argument to function.

operation : LOG at line 271 column 90

operands : _TEM1025

_TEM1025 1 row 1 col (numeric)

0

statement : ASSIGN at line 271 column 5

traceback : module LOGLIK at line 271 column 5

ERROR: (execution) Invalid argument to function.

operation : NLPNRA at line 283 column 1

operands : *LIT1019, probs, opt, con

*LIT1019 1 row 1 col (character, size 6)

loglik

probs 1 row 2 cols (numeric)

0.5 0.5

opt 2 rows 1 col (numeric)

1

4

con 2 rows 2 cols (numeric)

0 0

1 1

statement : CALL at line 283 column 1

Rick_SAS · Posted 05-19-2014 09:15 PM

Because all of the NLP functions are similar, they are explained "all together" in the Chapter of the doc titled "Nonlinear Optimization Examples"

1) The possible inputs are explained in the section "Nonlinear Optimization and Related Subroutines"

2) 'rc' stands for 'return code'. It is documented here. 'xr' stands for 'x result.' It is the optimum paramter found by the algorithm.

3) Your error is because the last term (log(1-p-q)) of the liklihood is undefined when p=q=0.5, which is your starting condition. In fact, the term is undefined whenever p+q=1, so you probably want to constrain p+q<1.

View solution in original post

Rick_SAS · Posted 05-19-2014 09:15 PM

Because all of the NLP functions are similar, they are explained "all together" in the Chapter of the doc titled "Nonlinear Optimization Examples"

1) The possible inputs are explained in the section "Nonlinear Optimization and Related Subroutines"

2) 'rc' stands for 'return code'. It is documented here. 'xr' stands for 'x result.' It is the optimum paramter found by the algorithm.

3) Your error is because the last term (log(1-p-q)) of the liklihood is undefined when p=q=0.5, which is your starting condition. In fact, the term is undefined whenever p+q=1, so you probably want to constrain p+q<1.

EricCai · Posted 05-19-2014 10:33 PM

Thanks for your prompt reply, Rick! 1) After reading your comment and thinking really hard, I'm still struggling to translate the inequality p + q < 1 into that constraint matrix. Could you please provide a hint? 2) I re-wrote the objective function into the likelihood (instead of the log-likelihood) so that there are no more logarithms. f = (p**2 + 2*p*(1-p-q))**na + (q**2 + 2*p*(1-p-q))**nb + (2*p*q)**nab + ((1-p-q)**2)**no; I used the original constraint con = { 0 0, 1 1}; However, this method is problematic because of round-off error - each term is zero because a small number is exponentiated by a big number. Is there any way to overcome this while still retaining the likelihood as the objective function? ***I now realize that the log-likelihood is better as the objective function for numerical purposes because of less round-off error.

Rick_SAS · Posted 05-20-2014 11:59 AM

I always advise people to test their objective function before trying to optimize. If you graph your objective function, you discover that the optimal value of your function is p=0.5 and q=epsilon>0. The function is undefined for q=0, so either there is no optimum in the interior or something is wrong with your function.

I see in your comments that you are estimating the probability of alleles by using the Hardy-Weinberg equation. By a happy coincidence, my son's teacher covered this a few weeks ago in his high-school biology class so I was able to look at your liklihood function and discover your error.

The second term is incorrect. The correct equation is

...+ nb*log(q**2 + 2* Q *(1-p-q)) +...

If you change your equation, the NLP function finds an optimum in the interior of the feasible region:

p = 0.288 and q = 0.107.

Rick_SAS · Posted 05-20-2014 12:05 PM

Oops! I forgot to answer your question, which is necessary to obtain the answer:

/* p q op rhs */

con = {0 0 . . , /* min values */

1 1 . . , /* max values */

1 1 -1 1 }; /* 1*p + 1*q <= 1 */

EricCai · Posted 05-20-2014 11:10 PM

Hi Rick!

1) I'm sorry for wasting your time in deciphering what was wrong with my incorrectly coded objective function. I should have caught that and double-checked that before asking for help. Thanks for noticing it for me.

2) Yes, I should have plotted the objective function first. I'm new to PROC IML, and I cannot find an example of a 3-dimensional plot on the web. Could you please point me toward a useful resource to learn how to do it? (I would prefer to try to figure it out on my own first before asking you for the code.)

3) Thanks for sharing the correct constraint matrix with me.

a) What does "op" mean for the name of the 3rd column? (I'm guessing that "rhs" means "right-hand side", denoting the right-hand side of "p + q <= 1".)

b) How did you come up with this constraint matrix? Is there some resource that can teach me how to determine constraint matrices?

c) What would be the constraint matrix for "p + q < 1"? Would there be a difference between the "less than" and "less than or equal to"?

Thanks a lot for your help!

Eric

Rick_SAS · Posted 05-21-2014 09:05 AM

1) No worries. I often ask a colleague to look at a problem that I spent hours on, only to have him point out a mistake that was right before my eyes.

2) Check my blog for a dozen different examples. I prefer the contour plot, so search "contour plot" (include the quotes) at http://blogs.sas.com/content/iml I have examples of ODS plots and plots in SAS/IML Studio.

3) (b) Read the doc chapter "Nonlinear Optimization Examples", which tells you all about how to use the NLP functions. How to define the contraint matrix is in the section "Parameter Constraints". (a) The 'op' column encodes the inequality: -1 means "less than or equal", 0 means "equal to", and +1 means "greater than or equal". (c) In finite-precision arithmetic there is no "strictly less than", but you can define the RHS as (1 - 1e6) or (1 - 1e10).

Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Re: Error in Using NLPNRA and PROC IML for Maximum Likelihood Estimation

Registration is open