☑ This topic is solved.
mcs
Obsidian | Level 7

I'm trying to run three related models. The first and third converge, but the second doesn't. I expected the second and third models to be the same. My variables 'amount' and 'expamt' are 'count' and 'expct', respectively, multiplied by 'dollars' (which is always positive and generally in the thousands).

 

What's causing the error in the second model?

 

proc genmod data=mydata;
    class x1-x8;
    model count = x1-x8 / dist=p offset=expct;
run;

NOTE: Algorithm converged.
NOTE: The scale parameter was held fixed.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.21 seconds
      cpu time            0.17 seconds


proc genmod data=mydata;
    class x1-x8;
    model amount = x1-x8 / dist=p offset=expamt;
run;

NOTE: Non-integer response values have been detected for the Poisson distribution.
WARNING: The specified model did not converge.
ERROR:  Error in computing inverse link function.
NOTE: The scale parameter was held fixed.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.08 seconds
      cpu time            0.04 seconds

proc genmod data=mydata;
    class x1-x8;
    freq dollars;
    model count = x1-x8 / dist=p offset=expct;
run;

NOTE: Algorithm converged.
NOTE: The scale parameter was held fixed.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.17 seconds
      cpu time            0.15 seconds

 

StatDave
SAS Super FREQ
The inverse link function for this model is exponentiation. It is applied to X*beta to estimate the Poisson mean, where beta is the vector of parameter estimates at the current maximum likelihood iteration. Probably at some iteration and for some observation, X*beta becomes large enough that exp(X*beta) cannot be computed. Also, as noted in the log for this model, there are non-integer values in the response variable, AMOUNT. While the maximum likelihood estimation can still be done in this case, the Poisson distribution is a discrete distribution and should have only integer values. If AMOUNT is a continuous variable, you should probably consider a more appropriate distribution, possibly the gamma or inverse Gaussian distribution if the response is strictly positive and skewed.
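To make the overflow point concrete, here is a small sketch in Python (used only for illustration; it is not SAS code). It assumes a log link, so the Poisson mean is exp(eta), where eta is X*beta plus any OFFSET= value, since a GLM offset is added to the linear predictor. The X*beta and offset values are hypothetical. A double-precision exp overflows once its argument exceeds about 709.78, so any row whose linear predictor goes past that point cannot be evaluated.

```python
import math
import sys

def poisson_mean(eta):
    """Inverse of the log link: mu = exp(eta). Returns None if exp overflows."""
    try:
        return math.exp(eta)
    except OverflowError:
        return None

# Largest linear predictor whose exp is still a finite double (about 709.78).
max_eta = math.log(sys.float_info.max)

# Hypothetical X*beta and offsets; eta = X*beta + offset.
xbeta = 0.5
for offset in (2.0, 700.0, 5000.0):
    eta = xbeta + offset
    print(f"offset={offset:>7}: eta={eta:>7} -> mu={poisson_mean(eta)}")
```

The first two rows produce finite means; the last returns None because exp(5000.5) is not representable, which is the kind of failure the "Error in computing inverse link function" message points to.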
mcs
Obsidian | Level 7

Thanks for the reply.  There are no non-integer values in AMOUNT, though there are in EXPAMT.  I want to compare the COUNT and AMOUNT models, so I want to use the same link function for both.

 

What I don't understand is why the third model runs and the second doesn't.  If I understand correctly, the FREQ keyword in the third model is telling SAS to multiply everything by DOLLARS.  Wouldn't that end up with the same value of exp(x*beta)?

StatDave
SAS Super FREQ
If you check, I think you will find that there are indeed some non-integer values in the AMOUNT variable. The log message indicates that is what is seen in the data. The FREQ statement effectively replicates each observation FLOOR(DOLLARS) times. Internally, the log likelihood contribution of each observation is multiplied by FLOOR(DOLLARS).
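The replication described above can be checked with a short Python sketch (illustrative only, not SAS). It assumes a single hypothetical observation with count 3, Poisson mean 2.5, and DOLLARS = 4.7. Summing the log-likelihood over floor(4.7) = 4 copies gives the same total as scaling one contribution by 4, and the fractional 0.7 is dropped; scaling by the untruncated 4.7 gives a different value.

```python
import math

def poisson_loglik(y, mu):
    """Log-likelihood of one Poisson observation: y*log(mu) - mu - log(y!)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical single observation (not from the thread's data).
y, mu, dollars = 3, 2.5, 4.7

# FREQ-style: the row is treated as floor(dollars) identical copies.
copies = math.floor(dollars)
replicated = sum(poisson_loglik(y, mu) for _ in range(copies))

# Equivalent: scale the single contribution by floor(dollars).
freq_scaled = copies * poisson_loglik(y, mu)

# For contrast: scale by the untruncated value instead.
untruncated_scaled = dollars * poisson_loglik(y, mu)

print(copies, round(replicated, 6), round(freq_scaled, 6), round(untruncated_scaled, 6))
```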
mcs
Obsidian | Level 7

I did check on AMOUNT.

 

data check;
set mydata end=eof;
retain isint;
isint = amount - int(amount);
if eof then put isint;
run;

0
NOTE: There were 12141 observations read from the data set WORK.MYDATA.
NOTE: The data set WORK.CHECK has 12141 observations and 25 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
StatDave
SAS Super FREQ

That code only shows whether the last observation is non-integer. Try this: 

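data nonint; 
  set mydata end=eof; 
  chk=(amount ne int(amount));   /* 1 if AMOUNT has a fractional part */
  if chk then output;            /* keep only the non-integer rows */
  nonint+chk;                    /* running count of non-integer rows */
  if eof then put "Number nonintegers = " nonint; 
  run;
proc print;
  var amount;
  format amount 32.16;
  run;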
mcs
Obsidian | Level 7

Thanks again.  Below are the first 15 of the "nonintegers".  I guess I understand why at least numbers 12 and 15 cause the warning in SAS.

 

Anyway, I'm not too worried about the warning if I can avoid it by using the third model in place of the second.

 

Basically, I want a dollar-weighted model.  I thought I could do it by hand (multiplying COUNT by DOLLARS to get AMOUNT), but that didn't work.  Instead, using the FREQ keyword seems to work.  Is that the right way to do it?

 

Obs                    amount
  1  3607159.0000000000000000
  2  1360585.0000000000000000
  3   292033.0000000000000000
  4   454095.0000000000000000
  5  1415948.0000000000000000
  6  2668725.0000000000000000
  7  1071775.0000000000000000
  8   223131.0000000000000000
  9   666578.0000000000000000
 10   199082.0000000000000000
 11  1700960.0000000000000000
 12  7458260.9999999900000000
 13  1291928.0000000000000000
 14  3654607.0000000000000000
 15  3798911.9999999900000000
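Values such as 7458260.99999999 are typical floating-point residue from computing a product that "should" be an integer. Here is a minimal Python sketch (illustrative only) under the assumption that DOLLARS can carry cents; the (count, dollars) pairs are hypothetical. It compares an exact integer test, like AMOUNT ne INT(AMOUNT), with a tolerance-based one, and shows rounding to the nearest integer.

```python
def count_nonintegers(values, tol=0.0):
    """Count values whose distance to the nearest integer exceeds tol."""
    return sum(1 for v in values if abs(v - round(v)) > tol)

# Hypothetical (count, dollars) pairs; dollars carry cents, as money often does.
pairs = [(100, 4.35), (3, 2000.0), (100, 1.15)]
amounts = [c * d for c, d in pairs]  # 100 * 4.35 is slightly below 435 in binary floating point

strict = count_nonintegers(amounts)               # exact test
tolerant = count_nonintegers(amounts, tol=1e-6)   # ignores floating-point noise
cleaned = [round(a) for a in amounts]             # snap to the nearest integer

print(amounts, strict, tolerant, cleaned)
```

Rounding the response before fitting would remove the non-integer note, though it does not by itself address the convergence error.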
StatDave (accepted solution)
SAS Super FREQ
It's not clear exactly what you mean by a "dollar-weighted" model, but if you want the Poisson model on the COUNT response to use the values of your DOLLARS variable as weights in the maximum likelihood estimation (that is, to multiply each observation's log-likelihood contribution by the untruncated value of DOLLARS), then just change FREQ DOLLARS; to WEIGHT DOLLARS;. You should probably first normalize the DOLLARS weights so that they sum to the sample size, to avoid overstating significance.
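Why normalization matters can be seen in the simplest case, an intercept-only weighted Poisson model with a log link. In that model the MLE of the mean is sum(w*y)/sum(w), and its model-based standard error on the log scale is 1/sqrt(mu_hat * sum(w)). The Python sketch below (illustrative only; the counts and dollar weights are hypothetical) shows that rescaling the weights to sum to n leaves the estimate unchanged but gives a much larger, more honest standard error than raw dollar weights in the thousands.

```python
import math

def normalize_weights(weights):
    """Rescale weights so they sum to the number of observations."""
    n, total = len(weights), sum(weights)
    return [w * n / total for w in weights]

def weighted_poisson_intercept(y, w):
    """Intercept-only weighted Poisson fit with log link.

    Returns (mu_hat, se_log_mu): the weighted MLE of the mean and the
    model-based standard error of log(mu_hat), 1/sqrt(mu_hat * sum(w)).
    """
    sw = sum(w)
    mu_hat = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return mu_hat, 1.0 / math.sqrt(mu_hat * sw)

# Hypothetical counts and dollar weights (not the thread's data).
counts = [2, 0, 5, 3, 1]
dollars = [1200.0, 3500.0, 800.0, 2100.0, 4400.0]

raw_fit = weighted_poisson_intercept(counts, dollars)
norm_fit = weighted_poisson_intercept(counts, normalize_weights(dollars))
print(raw_fit, norm_fit)
```

With these values both fits give mu_hat = 1.425, but the raw-weight standard error is smaller by a factor of sqrt(2400), the square root of the average weight.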
mcs
Obsidian | Level 7

I initially thought to use the WEIGHT statement, but I was confused by the documentation, which talks about dividing by the weight instead of multiplying.

 

The WEIGHT statement identifies a variable in the input data set to be used as the exponential family dispersion parameter weight for each observation. The exponential family dispersion parameter is divided by the WEIGHT variable value for each observation.
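The two descriptions agree once the exponential-family log-likelihood is written out. The following is a standard GLM identity, offered as a sketch rather than a quote from the SAS documentation: for observation i with canonical parameter θ_i, cumulant function b(·), dispersion φ, and weight w_i, dividing φ by w_i multiplies the part of the log-likelihood that depends on β by w_i.

```latex
\ell_i = \frac{y_i\theta_i - b(\theta_i)}{\phi} + c(y_i,\phi)
\quad\xrightarrow{\;\phi \,\to\, \phi/w_i\;}\quad
\ell_i = \frac{w_i\,\bigl[y_i\theta_i - b(\theta_i)\bigr]}{\phi} + c(y_i,\phi/w_i)

% Poisson with log link: \theta_i = \log\mu_i,\quad b(\theta) = e^{\theta},\quad \phi = 1
\ell_i = w_i\,\bigl(y_i\log\mu_i - \mu_i\bigr) + (\text{terms free of } \beta)
```

So "dividing the dispersion parameter by the weight" and "multiplying the log-likelihood contribution by the weight" lead to the same estimating equations.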


 

