Solved: PROC GENMOD Error in computing inverse link function

mcs · Posted 03-29-2023 09:48 AM

I'm trying to run three related models. The first and third converge, but the second doesn't. I expected the second and third models to be the same. My variables 'amount' and 'expamt' are 'count' and 'expct', respectively, multiplied by 'dollars' (which is always positive and generally in the thousands).

What's causing the error in the second model?

proc genmod data=mydata;
    class x1-x8;
    model count = x1-x8 / dist=p offset=expct;
run;

NOTE: Algorithm converged.
NOTE: The scale parameter was held fixed.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.21 seconds
      cpu time            0.17 seconds


proc genmod data=mydata;
    class x1-x8;
    model amount = x1-x8 / dist=p offset=expamt;
run;

NOTE: Non-integer response values have been detected for the Poisson distribution.
WARNING: The specified model did not converge.
ERROR:  Error in computing inverse link function.
NOTE: The scale parameter was held fixed.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.08 seconds
      cpu time            0.04 seconds

proc genmod data=mydata;
    class x1-x8;
    freq dollars;
    model count = x1-x8 / dist=p offset=expct;
run;

NOTE: Algorithm converged.
NOTE: The scale parameter was held fixed.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.17 seconds
      cpu time            0.15 seconds

StatDave · Posted 03-29-2023 11:07 AM

The inverse link function for this model is exponentiation and is applied to X*beta to estimate the Poisson mean, where beta is the vector of parameter estimates at the given maximum likelihood iteration. Probably at some iteration and for some observation, X*beta becomes large enough that exp(X*beta) cannot be computed. And as noted in the log for this model, there are non-integer values in the response variable, AMOUNT. While the maximum likelihood estimation can still be done in this case, the Poisson distribution is a discrete distribution and should have only integer values. If AMOUNT is a continuous variable, you should probably consider a more appropriate distribution, possibly the gamma or inverse Gaussian distribution if the response is strictly positive and skewed.
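A quick way to check whether the offset alone could push the linear predictor past what exp() can handle is sketched below. It assumes the OFFSET= variable is added to X*beta unchanged and that EXPAMT is on its original scale (not already logged); the variable names LOGBIG and TOOBIG are made up for this illustration.

```sas
data _null_;
    set mydata end=eof;
    retain logbig;
    /* largest argument for which exp() is still representable */
    if _n_ = 1 then logbig = constant('logbig');
    /* sum statement: count offsets that by themselves exceed that limit */
    toobig + (expamt > logbig);
    if eof then put "Offsets exceeding LOGBIG: " toobig;
run;
```

If the count is nonzero, the overflow may come from the offset itself rather than from the parameter estimates.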

mcs · Posted 03-29-2023 11:21 AM

Thanks for the reply. There are no non-integer values in AMOUNT, though there are in EXPAMT. I want to compare the COUNT and AMOUNT models, so I want to use the same link function for both.

What I don't understand is why the third model runs and the second doesn't. If I understand correctly, the FREQ keyword in the third model is telling SAS to multiply everything by DOLLARS. Wouldn't that end up with the same value of exp(x*beta)?

StatDave · Posted 03-29-2023 12:06 PM

If you check, I think you will find that there are indeed some non-integer values in the AMOUNT variable. The log message indicates that is what is seen in the data. The FREQ statement effectively replicates each observation FLOOR(DOLLARS) times. Internally, the log likelihood contribution of each observation is multiplied by FLOOR(DOLLARS).
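One way to see why the second and third models are not the same is to write out the two log likelihoods. This is only a sketch under stated assumptions: $y_i$ is COUNT, $d_i$ is DOLLARS, $o_i$ is EXPCT, the OFFSET= variable enters the linear predictor unchanged, and $\eta_i$, $\tilde\eta_i$, $\ell_2$, $\ell_3$ are notation introduced here.

```latex
% Model 3: FREQ dollars, response count, offset expct
\eta_i = x_i^\top\beta + o_i, \qquad
\ell_3(\beta) = \sum_i \lfloor d_i \rfloor \left[ y_i \eta_i - e^{\eta_i} - \log(y_i!) \right]

% Model 2: response amount = d_i y_i, offset expamt = d_i o_i
\tilde\eta_i = x_i^\top\beta + d_i o_i, \qquad
\ell_2(\beta) = \sum_i \left[ d_i y_i \tilde\eta_i - e^{\tilde\eta_i} - \log\Gamma(d_i y_i + 1) \right]
```

In the third model DOLLARS multiplies the whole contribution, including $e^{\eta_i}$. In the second it multiplies the response and the offset, so it ends up inside the exponential, and with DOLLARS in the thousands $e^{\tilde\eta_i}$ can easily be too large to compute. Under these assumptions the score equations of the two models would match only if the second model's offset were $o_i + \log d_i$ and DOLLARS were integer-valued.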

mcs · Posted 03-29-2023 02:03 PM

I did check on AMOUNT.

data check;
    set mydata end=eof;
    retain isint;
    isint = amount - int(amount);
    if eof then put isint;
run;

0
NOTE: There were 12141 observations read from the data set WORK.MYDATA.
NOTE: The data set WORK.CHECK has 12141 observations and 25 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

StatDave · Posted 03-29-2023 02:16 PM

That code only shows whether the last observation is non-integer. Try this:

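data nonint;
  set mydata end=eof;
  chk = (amount ne int(amount));   /* 1 if AMOUNT has a fractional part */
  if chk then output;              /* keep only the non-integer rows */
  nonint + chk;                    /* running count of non-integers */
  if eof then put "Number nonintegers = " nonint;
run;
proc print data=nonint;
  var amount;
  format amount 32.16;
run;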

mcs · Posted 03-29-2023 02:48 PM

Thanks again. Below are the first 15 of the "nonintegers". I guess I understand why at least numbers 12 and 15 cause the warning in SAS.

Anyway, I'm not too worried about the warning if I can avoid it by using the third model in place of the second.

Basically, I want a dollar-weighted model. I thought I could do it by hand (multiplying COUNT by DOLLARS to get AMOUNT), but that didn't work. Instead, using the FREQ keyword seems to work. Is that the right way to do it?

Obs                     amount
  1   3607159.0000000000000000
  2   1360585.0000000000000000
  3    292033.0000000000000000
  4    454095.0000000000000000
  5   1415948.0000000000000000
  6   2668725.0000000000000000
  7   1071775.0000000000000000
  8    223131.0000000000000000
  9    666578.0000000000000000
 10    199082.0000000000000000
 11   1700960.0000000000000000
 12   7458260.9999999900000000
 13   1291928.0000000000000000
 14   3654607.0000000000000000
 15   3798911.9999999900000000
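Values such as 7458260.99999999 look like floating-point residue from the multiplication rather than genuinely fractional amounts. If that is the case, rounding to the nearest integer should be enough to silence the non-integer NOTE. A minimal sketch, where MYDATA_R and AMOUNT_R are hypothetical names:

```sas
data mydata_r;
    set mydata;
    /* ROUND with no second argument rounds to the nearest integer */
    amount_r = round(amount);
run;
```

This addresses only the NOTE about non-integer responses; it is not expected to fix the inverse link ERROR by itself.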

StatDave · Posted 03-29-2023 03:25 PM

It's not clear exactly what you mean by a "dollar-weighted" model, but if you want the Poisson model on the COUNT response to use the values of your DOLLARS variable as weights in the maximum likelihood estimation process (that is, multiply each observation's log likelihood contribution by the non-integerized value of DOLLARS), then just change FREQ DOLLARS; to WEIGHT DOLLARS;. You should probably first normalize the DOLLARS weights so that they sum to the sample size, to avoid overstating significance.
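The normalization suggested above could look roughly like the following. This is a sketch, not a tested recipe for this data: DSUM, MYDATA_W, N_DOLLARS, SUM_DOLLARS and WNORM are hypothetical names, and the model statement is copied from the third model.

```sas
/* total and nonmissing count of DOLLARS */
proc means data=mydata noprint;
    var dollars;
    output out=dsum n=n_dollars sum=sum_dollars;
run;

data mydata_w;
    /* read the one-row summary once; its values are retained */
    if _n_ = 1 then set dsum(keep=n_dollars sum_dollars);
    set mydata;
    /* rescaled weights sum to the number of observations */
    wnorm = dollars * n_dollars / sum_dollars;
run;

proc genmod data=mydata_w;
    class x1-x8;
    weight wnorm;
    model count = x1-x8 / dist=p offset=expct;
run;
```

Rescaling the weights by a constant should leave the parameter estimates unchanged; it affects the standard errors and tests.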

mcs · Posted 03-30-2023 09:42 AM

I initially thought to use the WEIGHT statement, but I was confused by the documentation, which talks about dividing by the weight instead of multiplying:

"The WEIGHT statement identifies a variable in the input data set to be used as the exponential family dispersion parameter weight for each observation. The exponential family dispersion parameter is divided by the WEIGHT variable value for each observation."
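The two descriptions can be reconciled by writing the log density in the usual exponential-family form. The notation below ($\theta_i$, $b$, $c$, $\phi$, $w_i$) is the standard textbook one and is an assumption here, not quoted from the SAS documentation.

```latex
% Exponential-family log density with dispersion \phi and weight w_i
\ell_i = \frac{y_i\theta_i - b(\theta_i)}{\phi / w_i} + c\!\left(y_i, \phi / w_i\right)
       = w_i \, \frac{y_i\theta_i - b(\theta_i)}{\phi} + c\!\left(y_i, \phi / w_i\right)
```

Because the dispersion sits in the denominator, dividing it by $w_i$ multiplies the part of the log likelihood that depends on the regression parameters by $w_i$, which is the "multiply each contribution by the weight" description given earlier in the thread.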
