Jumping over an NLPNRA hurdle...

johan_ferreira · Posted 03-25-2014 01:16 PM

Hi there,

I am trying to fit a dataset to a self-derived distribution - trying to use nlpnra but no luck thusfar. I think it is mainly a programming fault on how I program the likelihood, since the call runs, but does not seem to identify the function it should maximize. The output that I get from the call simply says that the convergence criteria has been satisfied, but no iterations have been performed. The initial values that I give the call is returned.

Here is the density of the model:

Because of the infinite summation, the likelihood function is obviously just the product of this density without any further simplification

My approach is direct - by taking the product of the density across my data points, and by using do - loops for the infinite sums. Suppose for now that just some threshold can be substituted for infinity; in this case, 30. My code is given below:

proc iml;

use sasuser.nebraskdry1;

read all into data1;

data1 = data1[3:nrow(data1),ncol(data1)-2:ncol(data1)];

x1 = data1[,1];

x2 = data1[,2];

*y = x1 || x2;

w2 = x1/x2;

n = nrow(w2);

thres = 30;

start logl(par) global(w2,thres,n);

eksi = par[1];

theta = par[2];

dens = j(n,1,0);

do index = 1 to n;

val = 0;

do i = 0 to thres;

do k = 0 to thres;

negb = (1-eksi**2)**(n/2) * eksi**(2*i) * gamma(n/2 + i) / (gamma(n/2)*fact(i));

poiss = exp(-theta/2) * (theta/2)**k / fact(k) ;

betav = beta(n/2+i+k,n/2+i+k);

betav = 1/betav;

variab = w2[index,1]**(n/2+i+k-1) * (1 + w2[index,1])**(-1*(n+2*i+2*k));

val = val + negb*betav*variab*poiss;

end; *k;

end; *i;

dens[index,1] = val;

end; *index;

v = dens[#,];

return(v);

finish logl;

par={0.2 15};

optn = {1 1};

call nlpnra(rc,xr,"logl",par,optn);

xr = n // xr`;

row = {"n" "eksi hat" "theta hat"};

if rc>0 then print xr[r=row label='ML Estimates for model:']; else print 'Algorithm does not converge';

run;

quit;

Can anyone possibly assist me in overcoming this obstacle? I'd really, really appreciate it.

Kindest regards

Johan

Rick_SAS · Posted 03-26-2014 07:49 PM

Without your data, we can't run your program.

Did you check the SAS Log? Is there an error or warning? It seems like your "infinite product" has a high probability of overflowing during the optimization.

I assume that you can run

z = logl(par);

and it works without error?

Can any of your computations be simplified by using the PDF functions that SAS supplies? I recognize the beta distribution, and it looks like there is some neg.bin and poisson distributions in there as well?

johan_ferreira · Posted 03-31-2014 09:49 AM

Rick, thanks for taking time to reply and look at it. Your comments are greatly appreciated. I took some time to consider your comments.

First off, here is my code again, and I attach the data now in a data step:

data d;

input w2 ;

cards;

0.086206897

11

1

2

4

1

0.5

0.153846154

0.246575342

0.0125

0.5

0.15625

0.033333333

0.4

0.192982456

11

1

2

13

5.166666667

1

0.333333333

1

9

0.153846154

0.037037037

1

0.333333333

0.181818182

1.5

0.133333333

2

1

2

3

37

0.076923077

0.666666667

14

1

1.666666667

0.142857143

1.666666667

20

0.272727273

1.5

0.222222222

0.7

0.25

0.5

1

4

0.454545455

1

0.578947368

3

10

1

5

2

1

1.5

0.2

2

0.242424242

0.25

1.066666667

8

4

2

0.166666667

0.454545455

0.046511628

0.5

0.2

2

3

34

0.25

;

run;

proc iml;

use d; read all into w2;

n = nrow(w2);

thres = 20;

start logl(par) global(w2,thres,n);

eksi = par[1];

theta1 = par[2];

theta2 = par[3];

dens = j(n,1,0);

/*eksisq = eksi**2;

nover2 = n/2;

poissonparameter1 = theta1/2;

poissonparameter2 = theta2/2; */

do index = 1 to n;

val = 0;

w2val = w2[index,1];

do i = 0 to thres;

do k1 = 0 to thres;

do k2 = 0 to thres;

negb = (1-eksi**2)**(n/2) * eksi**(2*i) * gamma(n/2 + i) / (gamma(n/2)*fact(i));

*negb = pdf("negbinomial",i,eksisq,nover2);

poiss = exp(-theta1/2) * (theta1/2)**k1 / fact(k1) * exp(-theta2/2) * (theta2/2)**k2 / fact(k2) ;

*poiss = pdf("poisson",k2,poissonparameter1) * pdf("poisson",k2,poissonparameter2);

betav = 1/beta(n/2+i+k1+k2,n/2+i+k1+k2);

variab = w2val**(n/2+i+k1+k2-1) * (1 + w2val)**(-1*(n+2*i+k1+k2));

val = val + negb*betav*variab*poiss;

end; *k2;

end; *k1;

end; *i;

dens[index,1] = val;

end; *index;

v = dens[#,];

return(v);

finish logl;

*Calculate ML estimates;

par={0.2 5 5};

z = logl(par);

optn = {1 2};

call nlpnms(rc,xr,"logl",par,optn);

z = logl(par);

print z;

xr = n // xr`;

row = {"n" "eksi hat" "theta1 hat" "theta2 hat"};

if rc>0 then print xr[r=row label='ML Estimates for model:']; else print 'Algorithm does not converge';

run;

quit;

I also added another Poisson factor to my density.

I spoke to a colleague regarding my use of NLPNRA and he suggested I also give NLPNMS a try since the latter does not rely so heavily on the second derivatives (i.e. Hessian matrix) being continuous. Due to the nature of the model and the "brute force" attack, I tried it, and (sometimes...) it is successful. Like you mentioned, the overflow is indeed reached quickly (which is mentioned in the log, and then the call does not execute further), and then I adjusted the "threshold" to a value such that the model will still give results, even though it is not very reliable. I am mostly now aiming to just make sure my IML approach is correct and will consider the reliability of the estimates at a later stage.

I can run z=logl(par), although it returns a value of 0 - why, I am not sure.

As per your suggestion, I used built-in functions of the PDFs of SAS in this new attempt. I added it here and is commented out for easy use when required. A possible challenge in this regard is that the call seems to optimize the parameter "eksi" sometimes to be larger than 1, that results in the pdf function of the negative binomial to be undefined. I kept the written out density in the code just in case.

If you have any additional comments or suggestions I would greatly appreciate it.

Johan

Rick_SAS · Posted 04-01-2014 11:18 AM

If "eksi" can't be greater than 1, you should constrain it.
The expression (1-eksi**2)**(n/2) is undefined when eksi > 1 and n is odd.

My comment is that you should be very concerned about convergence. Computing an infinite summation by summing the first 20 terms is only going to work if the series converges very quickly. There are many examples in numerical analysis that show why truncating an infinite series can lead to large errors. Along the same lines, using terms that involve tha GAMMA and FACT functions often lead to numerical errors, which is why I prefer the PDF calls..

Another comment is that you are not taking advantage of the vector nature of SAS/IML. Perhaps you can compute with w2, rather than have a DO loop over the index.

Lastly, you can save some computational time if you list some of the statements out of the innermost loops. For example, the expression for 'negb' depends only on 'i', so you do not need to compute it at every iteration of k1 and k2.

Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Re: Jumping over an NLPNRA hurdle...

Ready to join fellow brilliant minds for the SAS Hackathon?