sivakoya
Obsidian | Level 7

I am trying to model count data to rank-order the risk of accounts going bad by grade and band. The data look like this:

Grade  band         year  bad rate  Total Accs  Bad Accs
A      A-(0-45%)    2016  0.31%     3924        12
A      A-(0-45%)    2013  0.20%     51556       103
A      A-(0-45%)    2014  0.24%     49918       120
A      A-(0-45%)    2015  0.25%     59723       150
A      B-(>45-55%)  2016  0.80%     249         2
A      B-(>45-55%)  2015  0.22%     3664        8
A      B-(>45-55%)  2013  0.32%     3149        10

 

When I summarize the above data, I see the following observed bad rate by grade*band:

 

Grade  band       bad rate  Total Accs  Bad Accs
A      A-(0-45%)  0.23%     165121      385
B      A-(0-45%)  0.68%     250156      1708
C      A-(0-45%)  1.92%     240478      4609
D      A-(0-45%)  3.05%     33809       1030
E      A-(0-45%)  3.89%     2853        111
F      A-(0-45%)  1.52%     7417        113
G      A-(0-45%)  3.30%     3026        100

 

I have used PROC GENMOD with a Poisson distribution to model the above data, to check whether the model rank-orders the cells in the same way as the observed results.

 

proc genmod data=data;
class grade band;
model bad_Accs = grade band grade*band / dist=poisson link=log;
run;

 

I see the below results:

 

Analysis of Maximum Likelihood Parameter Estimates
Parameter                 Estimate
Grade*band  G  A-(0-45%)  0
Grade*band  D  A-(0-45%)  0.5404
Grade*band  A  A-(0-45%)  0.9426
Grade*band  B  A-(0-45%)  1.0461
Grade*band  C  A-(0-45%)  1.2279
Grade*band  E  A-(0-45%)  17.6989
Grade*band  F  A-(0-45%)  17.7168

 

Do these results suggest that an F grade with band A-(0-45%) is 17.71% more likely to go bad than the other grades?

But in the observed results the bad rate is highest for grade E. Shouldn't grade E have the highest parameter estimate in GENMOD?

Or am I modeling the wrong variable? I feel like I should model Total Accs/Bad Accs instead of just Bad Accs, to account for severity.

When I try to do that as below, it gives me an error:

 

proc genmod data=DTI;

class grade DTI;

model (Accs/bad_Accs)*100 = Grade DTI grade*DTI/ dist=poisson link=log;

run;

 

19 model (Accs/bad_Accs)*100 = Grade DTI grade*DTI/ dist=poisson link=log;

                        _

                       22

                       76

ERROR 22-322: Syntax error, expecting one of the following: a name, ','.

ERROR 76-322: Syntax error, statement will be ignored.

 

Any suggestions on how to model bad accounts while also accounting for severity?

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

If your data consist of a count of events and a count of total trials, then the proper syntax is the events/trials form shown below, which fits a logistic model to this binomial response.  You can use either LOGISTIC or GENMOD with the same syntax.

 

proc logistic data=DTI_mod;

class grade DTI;

model bad_accs/total_accs = Grade DTI grade*DTI;

run;
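
For comparison, here is a minimal sketch of the GENMOD counterpart. It assumes a data set with one row per cell that holds the counts bad_accs and total_accs, as in the first post; the data set name DTI is taken from the earlier code and may differ from yours. With the events/trials syntax the coefficients are on the log-odds scale and are measured relative to a reference level, so they are not bad rates themselves. The LSMEANS statement with the ILINK option is added here as one way to get an estimated bad rate per grade*DTI cell, which is easier to compare with the observed ranking. Cells with zero bad accounts may still produce extreme or non-estimable values.

```sas
/* Sketch: events/trials (binomial) model in GENMOD.
   Assumes one row per grade*DTI*year cell with counts
   bad_accs (events) and total_accs (trials). */
proc genmod data=DTI;
  class grade DTI;
  model bad_accs/total_accs = grade DTI grade*DTI / dist=binomial link=logit;
  /* ILINK reports each cell's estimate on the probability scale */
  lsmeans grade*DTI / ilink;
run;
```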

 

 

 


10 REPLIES
Ksharp
Super User

Use offset= option.

 

proc genmod data=DTI;

class grade DTI;

model bad_Accs= Grade DTI grade*DTI/ dist=poisson link=log offset=total_accs;

run;

JacobSimonsen
Barite | Level 11

The offset in @Ksharp's solution should be log(total_accs), but otherwise I agree.  It is also possible to model the rate directly, as you suggest; you just need to weight by total_accs:

proc genmod data=DTI;
class grade DTI;
model rate= Grade DTI grade*DTI/ dist=poisson link=log;
weight total_accs;
run;

where the rate is a variable defined as bad_Accs/total_accs. The "offset"-solution and the "weight-solution" are equivalent (same estimate and standard errors).

But maybe a better solution here is to regard bad_accs as the outcome of a binomial distribution.

sivakoya
Obsidian | Level 7

@Ksharp @JacobSimonsen thanks!

When I try to use @Ksharp's solution, I get the errors below.

proc genmod data=DTI;

class grade DTI;

/*weight accs;*/

model bad_Accs = Grade DTI grade*DTI/ dist=poisson link=log offset=log(accs);

run;

 

ERROR: Variable LOG not found.

(or)

 

proc genmod data=DTI;

class grade DTI;

/*weight accs;*/

model bad_Accs = Grade DTI grade*DTI/ dist=poisson link=log offset=accs;

run;

 

ERROR: The mean parameter is either invalid or at a limit of its range for some observations.

 

@JacobSimonsen, I have also tried your approach, but I am still not able to relate the model results to the observed results.

 

proc genmod data=DTI_new;

class grade DTI;

/* rate =  bad_accs/total_accs  */

model rate = Grade DTI grade*DTI/ dist=poisson link=log;

 

weight total_accs;

run;

 

observed:

Grade  DTI        bad rate  Total Accs  Bad Accs
A      A-(0-45%)  0.23%     165121      385
B      A-(0-45%)  0.68%     250156      1708
C      A-(0-45%)  1.92%     240478      4609
D      A-(0-45%)  3.05%     33809       1030
E      A-(0-45%)  3.89%     2853        111
F      A-(0-45%)  1.52%     7417        113
G      A-(0-45%)  3.30%     3026        100

 

model results:

Parameter                DF  Estimate
Grade*DTI  A  A-(0-45%)  1   0.1193
Grade*DTI  B  A-(0-45%)  1   0.7107
Grade*DTI  C  A-(0-45%)  1   1.202
Grade*DTI  D  A-(0-45%)  1   0.5473
Grade*DTI  E  A-(0-45%)  1   16.0838
Grade*DTI  F  A-(0-45%)  1   15.1462
Grade*DTI  G  A-(0-45%)  0   0

 

Maybe I am not reading it right (I am trying to relate the model estimates to the observed bad rate %), but the model doesn't seem to rank-order the bad rate by grade*DTI correctly.

 

JacobSimonsen
Barite | Level 11

You cannot put "log(accs)" into offset=. You have to create a variable holding the log values in a data set before calling the procedure, and then name that variable in offset=.

 

The message "ERROR: The mean parameter is either invalid or at a limit of its range for some observations" can occur because there is a level of the interaction term where the observed count is zero. I don't think it's a coding error.
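
A minimal sketch of that step, assuming the data set DTI holds the count variables bad_Accs and accs used in the posted code; DTI_offset and log_accs are names introduced here for illustration only. Another likely contributor to the failure with the raw offset=accs is that the offset is added to the linear predictor as-is, so an account count in the thousands makes the exponentiated mean overflow.

```sas
/* Sketch: compute the log of the exposure in a DATA step first.
   DTI_offset and log_accs are illustrative names. */
data DTI_offset;
  set DTI;
  /* the log is undefined for a zero count, so leave it missing there */
  if accs > 0 then log_accs = log(accs);
run;

proc genmod data=DTI_offset;
  class grade DTI;
  /* offset= takes a variable name, not an expression */
  model bad_Accs = grade DTI grade*DTI / dist=poisson link=log offset=log_accs;
run;
```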

 


JacobSimonsen
Barite | Level 11

Again, as I see your data, it looks more like binomial data than Poisson-distributed data. Why do you want to use the Poisson distribution instead of the binomial distribution?

sivakoya
Obsidian | Level 7

Thanks. I have modified my data so that I have a two-level target variable and tried GENMOD with a binomial distribution. It gives me results similar to the Poisson model.

 

data:

 

Grade  DTI        year  bad  Accs
A1     A-(0-45%)  2013  N    51453
A1     A-(0-45%)  2013  Y    103
A1     A-(0-45%)  2014  N    49798
A1     A-(0-45%)  2014  Y    120
A1     A-(0-45%)  2015  N    59573
A1     A-(0-45%)  2015  Y    150
A1     A-(0-45%)  2016  N    3912

 

code:

proc genmod data=DTI_mod descending;

class grade DTI;

/*weight accs;*/

model bad = Grade DTI grade*DTI/ dist=binomial link=log;

weight accs;

run;

 

log:

 

NOTE: PROC GENMOD is modeling the probability that bad='Y'.

WARNING: The negative of the Hessian is not positive definite. The convergence is questionable.

WARNING: The procedure is continuing but the validity of the model fit is questionable.

WARNING: The specified model did not converge.

NOTE: The Pearson chi-square and deviance are not computed since the AGGREGATE option is not specified.

WARNING: Negative of Hessian not positive definite.

NOTE: The scale parameter was held fixed.

NOTE: PROCEDURE GENMOD used (Total process time):

real time 0.15 seconds

cpu time 0.07 seconds

 

results:

Parameter                 DF  Estimate
Grade*DTI  A1  A-(0-45%)  1   0.1193
Grade*DTI  A2  A-(0-45%)  1   0.7107
Grade*DTI  A3  A-(0-45%)  1   1.202
Grade*DTI  D1  A-(0-45%)  1   0.5473
Grade*DTI  D2  A-(0-45%)  1   16.6717
Grade*DTI  DN  A-(0-45%)  1   15.7341
Grade*DTI  DS  A-(0-45%)  0   0

 

 

Rick_SAS
SAS Super FREQ

If the ACCS variable contains the count of the number of events and nonevents, then you should use  

FREQ accs;

instead of using the WEIGHT statement. Frequencies and weights have different meanings in a regression. 

sivakoya
Obsidian | Level 7

@Rick_SAS Thanks for the link, it's very useful!

 

I am getting exactly the same results even after using FREQ accs;

 

proc genmod data=DTI_mod descending;

class grade DTI;

freq accs;

model bad = Grade DTI grade*DTI/ dist=binomial link=log;

run;

JacobSimonsen
Barite | Level 11

I think you are done. The warnings appear because you have some cells with 0. The procedure therefore cannot estimate all parameters from the data you have, which in turn causes the warnings in the log.

It's right that "freq" should be used instead of "weight". The two statements give the same parameter estimates, but not always the same p-values.
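
One way to check for such cells is sketched below. It assumes the one-row-per-cell layout with bad_accs and total_accs from the first post, and takes the data set name DTI from the posted code. The query lists every grade*DTI combination whose bad-account total is zero; any combination it returns is a cell whose interaction estimate the data cannot pin down.

```sas
/* Sketch: find grade*DTI cells with no bad accounts */
proc sql;
  select grade, DTI,
         sum(total_accs) as total_accs_sum,
         sum(bad_accs)   as bad_accs_sum
  from DTI
  group by grade, DTI
  having sum(bad_accs) = 0;
quit;
```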
