Modeling Counts data using PROC genmod + poisson distribution_results interpretation

I am trying to model count data to rank-order the risk of accounts going bad by grade and band. The data look like this:

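| Grade | band | year | bad rate | Total Accs | Bad Accs |
|---|---|---|---|---|---|
| A | A-(0-45%) | 2016 | 0.31% | 3924 | 12 |
| A | A-(0-45%) | 2013 | 0.20% | 51556 | 103 |
| A | A-(0-45%) | 2014 | 0.24% | 49918 | 120 |
| A | A-(0-45%) | 2015 | 0.25% | 59723 | 150 |
| A | B-(>45-55%) | 2016 | 0.80% | 249 | 2 |
| A | B-(>45-55%) | 2015 | 0.22% | 3664 | 8 |
| A | B-(>45-55%) | 2013 | 0.32% | 3149 | 10 |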

When I summarize the data above, I see the following observed bad rates by grade*band:

| Grade | band | bad rate | Total Accs | Bad Accs |
|---|---|---|---|---|
| A | A-(0-45%) | 0.23% | 165121 | 385 |
| B | A-(0-45%) | 0.68% | 250156 | 1708 |
| C | A-(0-45%) | 1.92% | 240478 | 4609 |
| D | A-(0-45%) | 3.05% | 33809 | 1030 |
| E | A-(0-45%) | 3.89% | 2853 | 111 |
| F | A-(0-45%) | 1.52% | 7417 | 113 |
| G | A-(0-45%) | 3.30% | 3026 | 100 |

I used PROC GENMOD with a Poisson distribution to model the data above and check whether it rank-orders the cells the same way as the observed results.

proc genmod data=data;

run;

I see the following results:

Analysis of Maximum Likelihood Parameter Estimates

| Parameter | Grade | band | Estimate |
|---|---|---|---|
| Grade*band | G | A-(0-45%) | 0 |
| Grade*band | D | A-(0-45%) | 0.5404 |
| Grade*band | A | A-(0-45%) | 0.9426 |
| Grade*band | B | A-(0-45%) | 1.0461 |
| Grade*band | C | A-(0-45%) | 1.2279 |
| Grade*band | E | A-(0-45%) | 17.6989 |
| Grade*band | F | A-(0-45%) | 17.7168 |

Do these results suggest that an F grade with A-(0-45%) is 17.71% more likely to go bad than the other grades?

But in the observed results the bad rate is highest for grade E. Shouldn't grade E have the highest parameter estimate in GENMOD?

Or am I modeling the wrong variable? I feel like I should model Bad Accs/Total Accs instead of just Bad Accs, to take severity into account.

When I try to do that as below, I get an error:

proc genmod data=DTI;

run;

Any suggestions on how to model bad accounts while also including severity in the model?

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

If your data consist of a count of events and a count of total trials, then the proper syntax is the following, which fits a logistic model for this binomial response. You can use either LOGISTIC or GENMOD with the same syntax.

proc logistic data=DTI_mod;

run;
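A minimal sketch of the events/trials form, using the observed summary from the question as input. The data set name `DTI_summary` and the variable names `grade`, `band`, `total_accs` and `bad_accs` are assumptions standing in for the poster's data; the same MODEL statement is reused in PROC GENMOD with DIST=BINOMIAL.

```sas
/* Observed summary from the question. DTI_summary and the variable   */
/* names are hypothetical stand-ins for the poster's data set.         */
data DTI_summary;
   input grade $ band :$12. total_accs bad_accs;
   datalines;
A A-(0-45%) 165121 385
B A-(0-45%) 250156 1708
C A-(0-45%) 240478 4609
D A-(0-45%) 33809 1030
E A-(0-45%) 2853 111
F A-(0-45%) 7417 113
G A-(0-45%) 3026 100
;

/* events/trials syntax: bad_accs events out of total_accs trials */
proc logistic data=DTI_summary;
   class grade band / param=glm;
   model bad_accs/total_accs = grade*band;
   output out=pred_logit p=phat;   /* fitted Pr(bad) for each cell */
run;

/* The same binomial model fitted with PROC GENMOD */
proc genmod data=DTI_summary;
   class grade band;
   model bad_accs/total_accs = grade*band / dist=binomial link=logit;
run;
```

Because this grade*band model has one parameter per cell, its fitted probabilities reproduce the observed bad rates, which is a convenient sanity check on the setup.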

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

Use the OFFSET= option.

proc genmod data=DTI;

run;

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

The offset in @Ksharp's solution should be "log(total_accs)", but otherwise I agree. It is also possible to model the rate directly, as you suggest; you just need to weight by "total_accs":

```
proc genmod data=DTI;
weight total_accs;
run;
```

where rate is a variable defined as bad_accs/total_accs. The offset solution and the weight solution are equivalent (same estimates and standard errors).
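A sketch of the weighted rate model, assuming aggregated data with the hypothetical variables `grade`, `band`, `total_accs` and `bad_accs`, and with `rate` computed in the DATA step as described. The input rows are the observed summary from the question; `DTI_rate` is an illustrative data set name.

```sas
/* Hypothetical data set built from the observed summary in the question */
data DTI_rate;
   input grade $ band :$12. total_accs bad_accs;
   rate = bad_accs / total_accs;   /* observed bad rate per cell */
   datalines;
A A-(0-45%) 165121 385
B A-(0-45%) 250156 1708
C A-(0-45%) 240478 4609
D A-(0-45%) 33809 1030
E A-(0-45%) 2853 111
F A-(0-45%) 7417 113
G A-(0-45%) 3026 100
;

proc genmod data=DTI_rate;
   class grade band;
   /* Poisson model for the rate, weighted by the number of accounts */
   model rate = grade*band / dist=poisson link=log;
   weight total_accs;
   output out=pred_rate pred=mu;   /* fitted bad rate for each cell */
run;
```

With only the grade*band effect in the model, the fitted rate for each cell should equal its observed bad rate.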

But a better solution here may be to treat bad_accs as the outcome of a binomial distribution.

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

Posted in reply to JacobSimonsen

@Ksharp @JacobSimonsen Thanks!

When I try to use @Ksharp's solution, I get the error below.

proc genmod data=DTI;

/*weight accs;*/

run;

(or)

proc genmod data=DTI;

/*weight accs;*/

run;

ERROR: The mean parameter is either invalid or at a limit of its range for some observations.

I have also tried your approach, @JacobSimonsen, but I am still not able to relate the results to the observed results.

proc genmod data=DTI_new;

/* rate =  bad_accs/total_accs  */

weight total_accs;

run;

observed:

| Grade | DTI | bad rate | Total Accs | Bad Accs |
|---|---|---|---|---|
| A | A-(0-45%) | 0.23% | 165121 | 385 |
| B | A-(0-45%) | 0.68% | 250156 | 1708 |
| C | A-(0-45%) | 1.92% | 240478 | 4609 |
| D | A-(0-45%) | 3.05% | 33809 | 1030 |
| E | A-(0-45%) | 3.89% | 2853 | 111 |
| F | A-(0-45%) | 1.52% | 7417 | 113 |
| G | A-(0-45%) | 3.30% | 3026 | 100 |

model results:

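| Parameter | Grade | DTI | DF | Estimate |
|---|---|---|---|---|
| Grade*DTI | A | A-(0-45%) | 1 | 0.1193 |
| Grade*DTI | B | A-(0-45%) | 1 | 0.7107 |
| Grade*DTI | C | A-(0-45%) | 1 | 1.202 |
| Grade*DTI | D | A-(0-45%) | 1 | 0.5473 |
| Grade*DTI | E | A-(0-45%) | 1 | 16.0838 |
| Grade*DTI | F | A-(0-45%) | 1 | 15.1462 |
| Grade*DTI | G | A-(0-45%) | 0 | 0 |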

Maybe I am not reading it right (I am trying to relate the model estimates to the observed bad rate %), but the model doesn't seem to rank-order the bad rate by grade*DTI correctly.
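One way to relate the estimates to the observed table, assuming a log-link Poisson model (offset or weighted rate form) whose only effect is grade*DTI, with grade G in A-(0-45%) as the reference cell: each estimate should then be the log of that cell's bad rate divided by the reference cell's bad rate, so exp(estimate) is a rate ratio, not a percentage. This sketch computes those expected values from the observed summary; the data set name `rr` and the variable names are illustrative.

```sas
/* Observed summary from the thread; grade G (100 bad / 3026) is the  */
/* reference cell whose estimate is fixed at 0.                        */
data rr;
   input grade $ total_accs bad_accs;
   rate   = bad_accs / total_accs;
   log_rr = log(rate / (100/3026));  /* expected estimate under a log link */
   rr     = exp(log_rr);             /* ratio of bad rates vs. grade G      */
   datalines;
A 165121 385
B 250156 1708
C 240478 4609
D 33809 1030
E 2853 111
F 7417 113
G 3026 100
;

proc print data=rr noobs;
run;
```

For example, grade E should come out at about log(3.89/3.30), roughly 0.16, and grade A should be negative because its bad rate is far below grade G's. Estimates that do not follow this pattern suggest the fitted model is not the one intended, for example because it contains other terms or some parameters could not be estimated.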

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

You cannot put "log(accs)" in the OFFSET= option. You have to create a variable containing the log values in a DATA step before running the procedure, and then specify that variable in OFFSET=.
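A sketch of the offset approach with the log computed in a DATA step, as described above. The data set `DTI_offset` and the variable `log_total` are hypothetical names; the input rows are the observed summary from the question.

```sas
/* Hypothetical data set; OFFSET= needs a variable, not an expression */
data DTI_offset;
   input grade $ band :$12. total_accs bad_accs;
   log_total = log(total_accs);   /* log of the exposure (number of accounts) */
   datalines;
A A-(0-45%) 165121 385
B A-(0-45%) 250156 1708
C A-(0-45%) 240478 4609
D A-(0-45%) 33809 1030
E A-(0-45%) 2853 111
F A-(0-45%) 7417 113
G A-(0-45%) 3026 100
;

proc genmod data=DTI_offset;
   class grade band;
   model bad_accs = grade*band / dist=poisson link=log offset=log_total;
   output out=pred_count pred=mu;   /* fitted number of bad accounts */
run;
```

PRED= returns the expected number of bad accounts with the offset included, so dividing it by total_accs gives the fitted bad rate.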

The message "ERROR: The mean parameter is either invalid or at a limit of its range for some observations" can occur because there is a level of the interaction term where the observed count is zero. I don't think it's a coding error.
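To look for the zero cells mentioned above, a quick PROC SQL summary like the following may help. It assumes a data set with the hypothetical variables `grade`, `band`, `total_accs` and `bad_accs`; the rows are the year-level data from the question, which contain no empty cells, so the result table is empty here.

```sas
/* Year-level rows from the question (grade A only); names are hypothetical */
data DTI_years;
   input grade $ band :$12. year total_accs bad_accs;
   datalines;
A A-(0-45%) 2016 3924 12
A A-(0-45%) 2013 51556 103
A A-(0-45%) 2014 49918 120
A A-(0-45%) 2015 59723 150
A B-(>45-55%) 2016 249 2
A B-(>45-55%) 2015 3664 8
A B-(>45-55%) 2013 3149 10
;

/* List grade*band cells with no bad accounts, or with no accounts at all */
proc sql;
   create table zero_cells as
   select grade, band,
          sum(bad_accs)   as n_bad,
          sum(total_accs) as n_total
   from DTI_years
   group by grade, band
   having calculated n_bad = 0 or calculated n_total = 0;
quit;
```

A cell listed in ZERO_CELLS has an observed bad rate of exactly 0 (or no accounts), so its grade*band estimate has no finite maximum likelihood value.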


Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

Posted in reply to JacobSimonsen

Again, as I see your data, it looks more like binomial data than Poisson-distributed data. Why do you want to use a Poisson distribution instead of a binomial distribution?

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

Posted in reply to JacobSimonsen

Thanks. I have modified my data so that I have a two-level target variable and tried GENMOD with a binomial distribution. It gives me results similar to the Poisson model.

data:

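| Grade | DTI | year | bad | Accs |
|---|---|---|---|---|
| A1 | A-(0-45%) | 2013 | N | 51453 |
| A1 | A-(0-45%) | 2013 | Y | 103 |
| A1 | A-(0-45%) | 2014 | N | 49798 |
| A1 | A-(0-45%) | 2014 | Y | 120 |
| A1 | A-(0-45%) | 2015 | N | 59573 |
| A1 | A-(0-45%) | 2015 | Y | 150 |
| A1 | A-(0-45%) | 2016 | N | 3912 |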
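The conversion from the aggregated layout to a two-level (Y/N) layout can be done in one DATA step. This sketch uses the year-level rows for grade A, band A-(0-45%) from the question; the data set names `DTI_wide` and `DTI_mod` and the variable names are hypothetical.

```sas
/* Aggregated (events/trials) rows from the question */
data DTI_wide;
   input grade $ band :$12. year total_accs bad_accs;
   datalines;
A A-(0-45%) 2013 51556 103
A A-(0-45%) 2014 49918 120
A A-(0-45%) 2015 59723 150
A A-(0-45%) 2016 3924 12
;

/* Two rows per input row: bad='Y' holds the bad count, bad='N' the rest */
data DTI_mod;
   set DTI_wide;
   length bad $1;
   bad = 'Y'; accs = bad_accs;              output;
   bad = 'N'; accs = total_accs - bad_accs; output;
   keep grade band year bad accs;
run;
```

The N counts it produces (for example 51453 for 2013 and 3912 for 2016) match the table above.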

code:

proc genmod data=DTI_mod descending;

/*weight accs;*/

weight accs;

run;

log:

NOTE: PROC GENMOD is modeling the probability that bad='Y'.

WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.

WARNING: The procedure is continuing but the validity of the model fit is questionable.

WARNING: The specified model did not converge.

NOTE: The Pearson chi-square and deviance are not computed since the AGGREGATE option is not specified.

WARNING: Negative of Hessian not positive definite.

NOTE: The scale parameter was held fixed.

NOTE: PROCEDURE GENMOD used (Total process time):

real time 0.15 seconds

cpu time 0.07 seconds

results:

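| Parameter | Grade | DTI | DF | Estimate |
|---|---|---|---|---|
| Grade*DTI | A1 | A-(0-45%) | 1 | 0.1193 |
| Grade*DTI | A2 | A-(0-45%) | 1 | 0.7107 |
| Grade*DTI | A3 | A-(0-45%) | 1 | 1.202 |
| Grade*DTI | D1 | A-(0-45%) | 1 | 0.5473 |
| Grade*DTI | D2 | A-(0-45%) | 1 | 16.6717 |
| Grade*DTI | DN | A-(0-45%) | 1 | 15.7341 |
| Grade*DTI | DS | A-(0-45%) | 0 | 0 |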

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

If the ACCS variable contains the count of the number of events and nonevents, then you should use

FREQ accs;

instead of using the WEIGHT statement. Frequencies and weights have different meanings in a regression.
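A sketch of the binomial model on the Y/N layout with FREQ instead of WEIGHT. The rows are the observed summary from the question split into bad and non-bad accounts (N = Total Accs - Bad Accs); `bad` and `accs` follow the poster's data, while `band` and the data set name `DTI_long` are assumptions.

```sas
/* Y/N layout of the observed summary; ACCS is the number of accounts per row */
data DTI_long;
   input grade $ band :$12. bad $ accs;
   datalines;
A A-(0-45%) Y 385
A A-(0-45%) N 164736
B A-(0-45%) Y 1708
B A-(0-45%) N 248448
C A-(0-45%) Y 4609
C A-(0-45%) N 235869
D A-(0-45%) Y 1030
D A-(0-45%) N 32779
E A-(0-45%) Y 111
E A-(0-45%) N 2742
F A-(0-45%) Y 113
F A-(0-45%) N 7304
G A-(0-45%) Y 100
G A-(0-45%) N 2926
;

proc genmod data=DTI_long descending;   /* DESCENDING: model Pr(bad='Y') */
   class grade band;
   model bad = grade*band / dist=binomial link=logit;
   freq accs;                           /* each row stands for ACCS accounts */
   output out=pred_freq pred=phat;
run;
```

FREQ treats each row as ACCS identical accounts, so the fitted Pr(bad='Y') for each cell again matches its observed bad rate.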

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

@Rick_SAS Thanks for the link; it's very useful!

I am getting exactly the same results even after using FREQ accs;

proc genmod data=DTI_mod descending;

freq accs;

run;

Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

I think you are done. The error message appears because some cells have a count of 0, so not all parameters can be estimated from the data you have, and that in turn causes the warnings in the log.

It's right that FREQ should be used instead of WEIGHT. The two statements give the same parameter estimates, but not always the same p-values.

