Solved: Re: Modeling Counts data using PROC genmod + poisson distribution_results interpretation

sivakoya · Posted 06-28-2017 11:33 AM

I am trying to model count data to rank-order the risk of accounts going bad by grade and band. The data look like this:

Grade	band	year	bad rate	Total Accs	Bad Accs
A	A-(0-45%)	2016	0.31%	3924	12
A	A-(0-45%)	2013	0.20%	51556	103
A	A-(0-45%)	2014	0.24%	49918	120
A	A-(0-45%)	2015	0.25%	59723	150
A	B-(>45-55%)	2016	0.80%	249	2
A	B-(>45-55%)	2015	0.22%	3664	8
A	B-(>45-55%)	2013	0.32%	3149	10

When I summarize the data above, I see the following observed bad rates by grade*band:

Grade	band	bad rate	Total Accs	Bad Accs
A	A-(0-45%)	0.23%	165121	385
B	A-(0-45%)	0.68%	250156	1708
C	A-(0-45%)	1.92%	240478	4609
D	A-(0-45%)	3.05%	33809	1030
E	A-(0-45%)	3.89%	2853	111
F	A-(0-45%)	1.52%	7417	113
G	A-(0-45%)	3.30%	3026	100
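As a check on the summary table above, each observed bad rate is simply Bad Accs / Total Accs, and sorting those rates gives the rank order a model should reproduce. A minimal Python sketch using only the band A-(0-45%) counts shown (the dictionary layout is illustrative):

```python
# Band A-(0-45%) counts from the summary table: grade -> (total_accs, bad_accs)
counts = {
    "A": (165121, 385),
    "B": (250156, 1708),
    "C": (240478, 4609),
    "D": (33809, 1030),
    "E": (2853, 111),
    "F": (7417, 113),
    "G": (3026, 100),
}

# Observed bad rate per grade, in percent
rates = {g: 100 * bad / total for g, (total, bad) in counts.items()}

# Grades ordered from lowest to highest observed bad rate
rank_order = sorted(rates, key=rates.get)

for g in rank_order:
    print(f"{g}: {rates[g]:.2f}%")
```

Grade E has the highest observed rate and grade A the lowest; F falls between B and C, so the grades are not monotone in bad rate.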

I used PROC GENMOD with a Poisson distribution to model the data above, to check whether it rank-orders the cells the same way as the observed results:

proc genmod data=data;
class grade band;
model bad_Accs = Grade band grade*band / dist=poisson link=log;
run;

I get the following results:

Analysis Of Maximum Likelihood Parameter Estimates
Parameter	Grade	band	Estimate
Grade*band	G	A-(0-45%)	0
Grade*band	D	A-(0-45%)	0.5404
Grade*band	A	A-(0-45%)	0.9426
Grade*band	B	A-(0-45%)	1.0461
Grade*band	C	A-(0-45%)	1.2279
Grade*band	E	A-(0-45%)	17.6989
Grade*band	F	A-(0-45%)	17.7168

From the results, does this mean that grade F with band A-(0-45%) is 17.71% more likely to go bad than the other grades?

But in the observed results the bad rate is highest for grade E. Shouldn't grade E have the highest parameter estimate in GENMOD?

Or am I modeling the wrong variable? I feel I should model Total Accs/Bad Accs instead of just Bad Accs, to take severity into account.

When I try to do that as below, I get an error:

proc genmod data=DTI;
class grade DTI;
model (Accs/bad_Accs)*100 = Grade DTI grade*DTI/ dist=poisson link=log;
run;

19   model (Accs/bad_Accs)*100 = Grade DTI grade*DTI/ dist=poisson link=log;
           _
           22
           76
ERROR 22-322: Syntax error, expecting one of the following: a name, ','.
ERROR 76-322: Syntax error, statement will be ignored.

Any suggestions on how to model bad accounts while also including severity in the model?


Ksharp · Posted 06-29-2017 09:33 AM

Use the OFFSET= option.

proc genmod data=DTI;
class grade DTI;
model bad_Accs= Grade DTI grade*DTI/ dist=poisson link=log offset=total_accs;
run;

JacobSimonsen · Posted 06-29-2017 10:14 AM

The offset in @Ksharp's solution should be "log(total_accs)", but otherwise I agree. It is also possible to model the rate directly, as you suggest; you just have to weight by total_accs:

proc genmod data=DTI;
class grade DTI;
model rate= Grade DTI grade*DTI/ dist=poisson link=log;
weight total_accs;
run;

where rate is a variable defined as bad_Accs/total_accs. The offset solution and the weight solution are equivalent (same estimates and standard errors).
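The equivalence can be checked by hand in the simplest case, an intercept-only model, using the four yearly rows for grade A, band A-(0-45%) from the first table. This is a minimal Python sketch with Newton-Raphson, not GENMOD's own code: one fit uses the count with offset log(total), the other uses the rate weighted by total. Both should land on log(385/165121), the log of the pooled bad rate, with the same standard error. The function names are illustrative.

```python
import math

# Grade A, band A-(0-45%): yearly rows from the first table
bad = [103, 120, 150, 12]
total = [51556, 49918, 59723, 3924]


def fit_offset(y, n, iters=50):
    """Intercept-only Poisson, log link, offset log(n): log(mu_i) = b + log(n_i)."""
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(b + math.log(ni)) for ni in n]  # expected bad counts
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mu)
        b += score / info
    info = sum(math.exp(b + math.log(ni)) for ni in n)  # information at the final b
    return b, 1 / math.sqrt(info)


def fit_weighted_rate(y, n, iters=50):
    """Intercept-only Poisson on the rate r_i = y_i/n_i, log link, weight n_i."""
    r = [yi / ni for yi, ni in zip(y, n)]
    b = 0.0
    for _ in range(iters):
        mu = math.exp(b)  # fitted rate (same for every row in this model)
        score = sum(wi * (ri - mu) for wi, ri in zip(n, r))
        info = sum(wi * mu for wi in n)
        b += score / info
    info = sum(wi * math.exp(b) for wi in n)
    return b, 1 / math.sqrt(info)


b_off, se_off = fit_offset(bad, total)
b_wt, se_wt = fit_weighted_rate(bad, total)
print(b_off, se_off)
print(b_wt, se_wt)
print(math.log(sum(bad) / sum(total)))
```

exp(b) is the pooled rate of about 0.23% from the summary table. With more covariates the same algebra holds row by row (the weighted-rate log-likelihood equals the offset log-likelihood up to a constant), which is why the two SAS specifications agree.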

But maybe a better solution here is to treat bad_accs as the outcome of a binomial distribution, with total_accs as the number of trials.

sivakoya · Posted 06-29-2017 02:21 PM

@Ksharp @JacobSimonsen thanks!

When I try @Ksharp's solution, I get the errors below.

proc genmod data=DTI;
class grade DTI;
/*weight accs;*/
model bad_Accs = Grade DTI grade*DTI/ dist=poisson link=log offset=log(accs);
run;

ERROR: Variable LOG not found.

(or)

proc genmod data=DTI;
class grade DTI;
/*weight accs;*/
model bad_Accs = Grade DTI grade*DTI/ dist=poisson link=log offset=accs;
run;

ERROR: The mean parameter is either invalid or at a limit of its range for some observations.

I have also tried your approach, @JacobSimonsen, but I am still not able to relate the results to the observed results.

proc genmod data=DTI_new;
class grade DTI;
/* rate = bad_accs/total_accs */
model rate = Grade DTI grade*DTI/ dist=poisson link=log;
weight total_accs;
run;

observed:

Grade	DTI	bad rate	Total Accs	Bad Accs
A	A-(0-45%)	0.23%	165121	385
B	A-(0-45%)	0.68%	250156	1708
C	A-(0-45%)	1.92%	240478	4609
D	A-(0-45%)	3.05%	33809	1030
E	A-(0-45%)	3.89%	2853	111
F	A-(0-45%)	1.52%	7417	113
G	A-(0-45%)	3.30%	3026	100

model results:

Parameter	Grade	DTI	DF	Estimate
Grade*DTI	A	A-(0-45%)	1	0.1193
Grade*DTI	B	A-(0-45%)	1	0.7107
Grade*DTI	C	A-(0-45%)	1	1.202
Grade*DTI	D	A-(0-45%)	1	0.5473
Grade*DTI	E	A-(0-45%)	1	16.0838
Grade*DTI	F	A-(0-45%)	1	15.1462
Grade*DTI	G	A-(0-45%)	0	0

Maybe I am not reading it right (I am trying to relate the model estimates to the observed bad rate %), but the model doesn't seem to rank-order the bad rate by grade*DTI correctly.
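Two points help when relating the estimates to observed rates. With a log link each estimate is on the log scale, so exp(estimate) is a multiplicative rate ratio, not a percentage difference. And a cell's fitted log-rate is intercept + Grade + DTI + Grade*DTI, so the interaction rows alone cannot be ranked. In a model with one parameter per Grade*DTI cell and offset log(total_accs), each cell's fitted rate equals its pooled observed rate, so the log rate ratio of each grade against the reference grade G in band A-(0-45%) is just the log of the ratio of observed rates. A minimal Python sketch computing those ratios from the observed table, plus what an estimate of about 16 would imply on the rate scale:

```python
import math

# Observed band A-(0-45%) counts: grade -> (total_accs, bad_accs)
observed = {
    "A": (165121, 385),
    "B": (250156, 1708),
    "C": (240478, 4609),
    "D": (33809, 1030),
    "E": (2853, 111),
    "F": (7417, 113),
    "G": (3026, 100),
}

rate = {g: bad / total for g, (total, bad) in observed.items()}

# Log rate ratio of each grade versus reference grade G in the same band;
# exp() of it says how many times higher (or lower) the bad rate is than G's
log_rr = {g: math.log(rate[g] / rate["G"]) for g in rate}

for g in sorted(log_rr, key=log_rr.get):
    print(f"{g}: log RR = {log_rr[g]:+.4f}, RR = {math.exp(log_rr[g]):.3f}")

# For contrast: an estimate of 16.0838 on the log scale is a rate ratio of
# about 9.7 million, far outside the spread of the observed rates
print(f"exp(16.0838) = {math.exp(16.0838):.3g}")
```

Grade E is the only grade with a higher observed rate than G in this band, so a model with one parameter per cell should reproduce this ordering once all of a cell's terms are added up.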

JacobSimonsen · Posted 06-30-2017 02:46 AM

You cannot put "log(accs)" in OFFSET=. You have to create a variable containing the log values in a DATA step before the procedure, and then use that variable in OFFSET=.

The message "ERROR: The mean parameter is either invalid or at a limit of its range for some observations" can occur because there is a level of the interaction term where the observed count is zero. I don't think it's a coding error.
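Why the offset has to be the log of the account count: with a log link the linear predictor is log(mu) = x*b + offset, so the expected count is mu = total_accs * exp(x*b) only when offset = log(total_accs). A minimal Python sketch, using the grade A, band A-(0-45%) totals and an example log-rate b (both chosen for illustration): it precomputes the log column once, the analog of creating it in a DATA step, and shows what the implied mean becomes if the raw count is used instead.

```python
import math

total_accs = [51556, 49918, 59723, 3924]  # grade A, band A-(0-45%) rows
b = math.log(385 / 165121)  # example log-rate: pooled bad rate of about 0.23%

# Precompute the offset column once, as you would in a DATA step
log_accs = [math.log(n) for n in total_accs]

# Log-scale offset: mu = exp(b + log(n)) = n * exp(b), the expected bad accounts
mu = [math.exp(b + off) for off in log_accs]
for n, m in zip(total_accs, mu):
    print(f"total={n:6d}  expected bad={m:8.2f}")

# Raw count as offset: exp(b + n) exceeds the largest double (about 1.8e308)
try:
    math.exp(b + total_accs[0])
except OverflowError:
    print("exp(b + 51556) overflows: the implied mean is not a usable number")
```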


JacobSimonsen · Posted 06-30-2017 03:38 AM

Again, as I see it, your data look more like binomial data than Poisson data. Why do you want to use a Poisson distribution instead of a binomial distribution?

sivakoya · Posted 06-30-2017 08:57 AM

Thanks. I have restructured my data so that I have a two-level target variable, and tried GENMOD with a binomial distribution. It gives results similar to the Poisson model.

data:

Grade	DTI	year	bad	Accs
A1	A-(0-45%)	2013	N	51453
A1	A-(0-45%)	2013	Y	103
A1	A-(0-45%)	2014	N	49798
A1	A-(0-45%)	2014	Y	120
A1	A-(0-45%)	2015	N	59573
A1	A-(0-45%)	2015	Y	150
A1	A-(0-45%)	2016	N	3912

code:

proc genmod data=DTI_mod descending;
class grade DTI;
/*weight accs;*/
model bad = Grade DTI grade*DTI/ dist=binomial link=log;
weight accs;
run;

log:
NOTE: PROC GENMOD is modeling the probability that bad='Y'.
WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.
WARNING: The procedure is continuing but the validity of the model fit is questionable.
WARNING: The specified model did not converge.
NOTE: The Pearson chi-square and deviance are not computed since the AGGREGATE option is not specified.
WARNING: Negative of Hessian not positive definite.
NOTE: The scale parameter was held fixed.
NOTE: PROCEDURE GENMOD used (Total process time):
      real time           0.15 seconds
      cpu time            0.07 seconds

results:

Parameter	Grade	DTI	DF	Estimate
Grade*DTI	A1	A-(0-45%)	1	0.1193
Grade*DTI	A2	A-(0-45%)	1	0.7107
Grade*DTI	A3	A-(0-45%)	1	1.202
Grade*DTI	D1	A-(0-45%)	1	0.5473
Grade*DTI	D2	A-(0-45%)	1	16.6717
Grade*DTI	DN	A-(0-45%)	1	15.7341
Grade*DTI	DS	A-(0-45%)	0	0

Rick_SAS · Posted 06-30-2017 09:14 AM

If the ACCS variable contains the count of the number of events and nonevents, then you should use

FREQ accs;

instead of using the WEIGHT statement. Frequencies and weights have different meanings in a regression.

sivakoya · Posted 06-30-2017 09:27 AM

@Rick_SAS Thanks for the link, it's very useful!

I get exactly the same results even after using FREQ accs;

proc genmod data=DTI_mod descending;
class grade DTI;
freq accs;
model bad = Grade DTI grade*DTI/ dist=binomial link=log;
run;

JacobSimonsen · Posted 06-30-2017 09:40 AM

I think you are done. The message is because you have some cells with 0 bad accounts, so not all parameters can be estimated from the data you have; that is what causes the warnings in the log.

It's right that FREQ should be used instead of WEIGHT. The two give the same parameter estimates, but not always the same p-values.
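Why a cell with no bad accounts cannot be estimated: for such a cell the likelihood keeps increasing as its log-rate goes to minus infinity, so there is no finite maximum, and the estimate an iterative fit reports depends on when it stops. This is consistent with the very large estimates (around 15 to 17) reported above. A minimal Python sketch of Newton-Raphson on an intercept-only Poisson model with offset log(total), standing in for a single cell; the cell size of 1000 accounts is hypothetical:

```python
import math


def newton_poisson_intercept(y, n, iters):
    """Intercept-only Poisson fit, log link, offset log(n); fixed number of Newton steps."""
    b = 0.0
    for _ in range(iters):
        mu = [math.exp(b + math.log(ni)) for ni in n]  # expected counts
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mu)
        b += score / info
    return b


# Hypothetical cell with 1000 accounts and 0 bad accounts:
# score/info is exactly -1, so b falls by 1 per step (b == -iters)
for iters in (10, 25, 50):
    print(iters, newton_poisson_intercept([0], [1000], iters))

# Same cell with 5 bad accounts: converges to log(5/1000) and stays there
print(newton_poisson_intercept([5], [1000], 50), math.log(5 / 1000))
```

With a zero cell there is nothing for the estimate to converge to, which is why the procedure reports non-convergence rather than a usable value.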

StatDave · Posted 06-30-2017 01:32 PM

If your data consist of a count of events and a count of total trials, then the proper syntax is the following to fit the model which is a logistic model for this binomial response. You can use either LOGISTIC or GENMOD with the same syntax.

proc logistic data=DTI_mod;
class grade DTI;
model bad_accs/total_accs = Grade DTI grade*DTI;
run;
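The events/trials form fits the same binomial likelihood as the two-row Y/N layout with FREQ accs; used earlier in the thread. A minimal Python sketch for an intercept-only logit model on the grade A, band A-(0-45%) rows, using Newton-Raphson rather than PROC LOGISTIC's implementation: both layouts give logit(385/165121). The 2016 Y row (12 bad accounts) is taken from the first table, since the DTI_mod excerpt stops at the 2016 N row; the function names are illustrative.

```python
import math


def fit_events_trials(events, trials, iters=50):
    """Binomial events/trials, logit link: score = sum(y - n*p)."""
    b = 0.0
    for _ in range(iters):
        p = 1 / (1 + math.exp(-b))
        score = sum(y - n * p for y, n in zip(events, trials))
        info = sum(n * p * (1 - p) for n in trials)
        b += score / info
    return b


def fit_bernoulli_freq(rows, iters=50):
    """Bernoulli rows (outcome, freq), logit link: each row counts freq times."""
    b = 0.0
    for _ in range(iters):
        p = 1 / (1 + math.exp(-b))
        score = sum(f * (z - p) for z, f in rows)
        info = sum(f * p * (1 - p) for _, f in rows)
        b += score / info
    return b


# Layout 1: bad_accs/total_accs per year
bad = [103, 120, 150, 12]
total = [51556, 49918, 59723, 3924]

# Layout 2: one row per year and outcome (1 = bad = 'Y'), frequency = accs
yn_freq = [(0, 51453), (1, 103), (0, 49798), (1, 120),
           (0, 59573), (1, 150), (0, 3912), (1, 12)]

b1 = fit_events_trials(bad, total)
b2 = fit_bernoulli_freq(yn_freq)
p_hat = sum(bad) / sum(total)
print(b1, b2, math.log(p_hat / (1 - p_hat)))
```

With Grade, DTI and Grade*DTI added, the same equivalence holds cell by cell; the events/trials syntax just avoids building the Y/N rows.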
