BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nstdt
Quartz | Level 8

Hi all,

The following likelihood function L which is an integral needs to be maximized. I think the integral has to be numerically computed.

L = int { k1 * p^d * (1-p)^ (n-d) dy } , from y=0 to y=1, where p is  a function of y, and k1,n,d are constants, k1= n choose d

p(y) = ...

I need to maximize L to obtain MLE estimates for gamma and rho. (Or another option is to fix gamma as a constant too, and estimate rho). Thus gamma and rho are the unknown quantities of interest.

I am having trouble setting up the integral to be computed, and then maximizing it.

Specifically, I am not sure of the syntax  to specify the dependence of the integrand on gamma and rho, in a way that these are identified as the parameters to be optimized.

I read Rick Wicklin's blog Optimizing a function of an integral,

where he uses NLPNRA to maximize an integral. Can NLPNRA be applied  here too? Or does the integral  L require a different treatment?

Here is some IML pseudocode.

proc iml;

/* Define the integrand, which will be called by QUAD subroutine.*/

n = n1 /** n1 is a constant**/

d = d1 /** d1 is a const **/

start Func(gamma,rho)  GLOBAL  n d;

  p = ...;

   f = ...;

  return( f );

finish;

/* Define a function that computes the integral.*/

start Integral(parms);

   /* the parameters are:  gamma = parms[1]; rho = parms[2]; */

   yinterval = 0 || 1;

   call quad(val, "Func", yinterval);    /* evaluate integral on [0, 1] */

   return( val );

finish;

/* Set up linear constraint 0 < rho < 1 */

con = {.  0  ,  /* no constraint on lower bound of gamma, 0 < rho */

       .  1  }  /* no constraint on upper bound of gamma, rho < 1  */

/*Maximimize using nlpnra */

p = {-1 1};        /* initial guess for solution */

opt = {1,          /* find maximum of function   */

       2};         /* print a little bit of output */

call nlpnra(rc, result, "Integral", p, opt) blc=con j;

Thank you for any help and advice.

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

BTW, it seems like your problem (as stated) might have a simple solution. For each value of n and d, there is a value of p that  maximizes the integral of the binomial(d,p,n) distribution on [0,1].  Solve the 1-D problem to find the value p=pOpt that maximizes the integral.Then set rho=0 and gamma = quantile("Normal", pOpt).

View solution in original post

7 REPLIES 7
Rick_SAS
SAS Super FREQ

to Here are some thoughts:

1) FUNC needs to be a function of y.   The gamma and rho parameters go into the GLOBAL clause.

2) NORMCDF(...)-->CDF("Normal",...) and INVNORMCDF(...) --> quantile("Normal,...)

3) For some parameter values, p (in FUNC) will be outside the interval [0,1]  because you are dividing the CDF by sqrt(1-rho).

Then you will get an error when you evaluate f.  You have to modify FUNC to handle the case where p is not in [0,1].

[WRONG..IGNORE THIS

And the most important observation is this:

4) The integrand f is the PDF of the binomial distribution. It is equivalent to PDF("Binomial",d,p,n);   Consequently, you do not need to call QUAD to evaluate this integral. The integral of a PDF is the CDF, so you can use CDF("Binomial",...), to evaluate the integral, as shown in this blog post: Optimizing a function that evaluates an integral - The DO Loop

]

Message was edited by: Rick Wicklin IGNORE the last point, which is wrong. I just realilzed that p is a function of y, not a constant.

nstdt
Quartz | Level 8


Thanks for the reply. Your blog is really helpful.

On (1), I am struggling to express FUNC with y as parameter.  What do I do about n and d? How are those passes into FUNC?

Will they also go into the GLOBAL caluse? I somehow need the NLPNRA to recognize that it is gamma and rho that need to be optimized.

Is this right:

start Func(y)  GLOBAL  gamma,rho;

  p = ...;

           /*** Do Something to handle case when p is outside [0,1].... ***/

   f =...;

  return( f );

finish;

Thanks again for any help and advice.

Rick_SAS
SAS Super FREQ

start Func(y)  GLOBAL(gamma,rho,n, d);

  p = ...;

/***Do Something to handle case when p is outside [0,1].... ***/

  f = pdf("Binomial", d, p, n);

  return( f );

finish;

Rick_SAS
SAS Super FREQ

BTW, it seems like your problem (as stated) might have a simple solution. For each value of n and d, there is a value of p that  maximizes the integral of the binomial(d,p,n) distribution on [0,1].  Solve the 1-D problem to find the value p=pOpt that maximizes the integral.Then set rho=0 and gamma = quantile("Normal", pOpt).

nstdt
Quartz | Level 8

Hi again,

Thanks for your replies.  I still have some questions, more on the implementation though.

First, it turns out gamma is a constant, and I only need to optimize for the parameter rho.

In this case, it seems like I can't avoid calling quad. And my confusion is that when I write the Integral function following your blog, then I need to specify rho as the parameter that changes. Howevr, the function Func needs the variable of integration "y" as an input, where y is obtained from the INVNORMCFD and goes from 0 to 1.

How can I simultaneously retain y as the variable in Func but rho as the parameter in Integral ?

Here is my revised code below. My confusion is in Step 2. Should I make it Integral(parm), that is, make the Integral function take an input argument parm which will map to rho, or should rho be outside with the GLOBAL statement a below? Does Intergal need an input argument here at all?

I am confused about how to differentiate between y and rho.

I think that NLPNRA will call Integral with parameter rho, and Integral executes QUAD. Then QUAD calls FUNC. But how do I tell QUAD that  though y is the input to FUNC, rho is the parameter? In defining Integral, should I use: start Integral GLOBAL(rho);

or should it be: start Integral(y) GLOBAL(rho) ?

proc iml;

/* Step 1. */
/* Define the integrand, which will be called by QUAD subroutine.*/
n = N             /*** N is a constant. ***/
d = D             /*** d is a constant. ***/

gamma = k    /*** gamma is a constant. ***/

start Func(y)  GLOBAL(gamma,rho,n, d);

  p = CDF("Normal",(gamma - {sqrt(rho)*quantile("Normal", y)} ) / sqrt(1-rho) );
  /***Cap and Floor to handle case when p is outside [0,1].... ***/
           If p > 1 then p =1;
           If p < = then p =0.01;

  f = pdf("Binomial", d, p, n);
  return( f );

finish;

/* Step 2. */

/* Define a function that computes the integral.*/


start Integral GLOBAL(rho); /**** should this be Integral(y) GLOBAL(rho) ????????? ***/

   /* the optimization parameter is:   rho */
yinterval = 0 || 1;
call quad(val, "Func", yinterval); /* evaluate integral on [0, 1] */
return( val );
finish;


/* Set up linear constraint 0 < rho < 1 */
con = {0 ,1}

/*Maximimize using nlpnra */
rho_0 = {0.5}; /* initial guess for solution */
opt = {1,2} /* find maximum of function */

call nlpnra(rc, result, "Integral", rho_0, opt) blc=con ;

Thanks again very much.


nstdt
Quartz | Level 8

To add a further layer of complexity, in the full problem, n and d are both arrays of length say, 100. And I really need to maximize the Sum of log of the Integrals of Func(the sum is over each of the 100 n and d pairs, rather than just one n1 and d1 as I have used here).

If you have any further hints, please do let me know, I really appreciate it.

Thanks again very much!

Rick_SAS
SAS Super FREQ

It is a little confusing because there are two functions: one for the integral and one for the NLP function. Try this:

start Func(y)  GLOBAL(gamma,rho,n, d);
  p = ...;

  f = pdf("Binomial", d, p, n);
  return( f );
finish;

/* Define a function that computes the integral.*/
start Integral(parm)  GLOBAL(rho);
   rho = parm[1];
   yinterval = 0 || 1;
   call quad(val, "Func", yinterval);    /* evaluate integral on [0, 1] */
   return( val );
finish;

In 'Func', y is the "dummy variable" that is used for integration. The QUAD function will evaluate 'Func' at many points in order to compute the integral of Func(y) dy on [0,1]. 

In 'Integral', the NLP routine will optimize the parm variable. At each iteration, the global variable rho is updated with the value of parm before the QUAD routine is called. This syntax should help you keep things straight.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 7 replies
  • 1415 views
  • 0 likes
  • 2 in conversation