Solved: Re: Negative Binomial vs Poisson

chuie · Posted 03-11-2019 08:03 PM

Hi There,

I ran two models, one Poisson and one negative binomial, because the data show overdispersion.

However, the variables that are significant in the Poisson model are not significant in the negative binomial model (output attached for reference).

Please advise on the logic behind this and on which approach is best.

FYI: my business case is to model the number of closed cases as a function of the number of employees, the type of employee, and average net productivity per hour.

Also, for the ESTIMATE statements, is there a way to get an estimate for each value without typing every one in manually? I would like to create a continuous graph, e.g. the effect of adding 1, 2, 3, ..., 20 employees.

ods pdf file= "C:\Users\swe00007\Desktop\report.pdf";
proc genmod data=TEST;
CLASS JOBCODE(PARAM = REF REF="CAS");
model CC= JOBCODE NOOF ANP JOBCODE*ANP /dist=poi link=log type3;
estimate "Effect of 10 employees" NOOF 10;
estimate "Effect of 1 employee"  NOOF 1;
ESTIMATE "Effect of 0.5 average touches per hour" ANP 0.5;
run;
proc genmod data=TEST;
CLASS JOBCODE(PARAM = REF REF="CAS");
model CC= JOBCODE NOOF ANP JOBCODE*ANP /dist=nb link=log type3;
estimate "Effect of 10 employees" NOOF 10;
estimate "Effect of 1 employee"  NOOF 1;
ESTIMATE "Effect of 0.5 average touches per hour" ANP 0.5;
run;
ods pdf close;

PGStats · Posted 03-11-2019 11:29 PM

The AICC clearly shows that the NB distribution fits better than the Poisson. The Value/DF ratio of 13.5 is a clear indication of overdispersion. Specifying PSCALE or DSCALE in the model statement for the Poisson fit will account for overdispersion and should make the inference results more similar between the two models.
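As a rough sketch of the PSCALE suggestion, reusing the data set and variable names from the original post: the only change from the first PROC GENMOD step is the extra MODEL option. The parameter estimates should stay the same as in the plain Poisson fit, while the standard errors are inflated by the estimated scale.

```sas
/* Poisson fit with a Pearson chi-square scale adjustment for overdispersion. */
/* PSCALE is shorthand for SCALE=PEARSON; DSCALE would use the deviance.      */
proc genmod data=TEST;
  class JOBCODE(param=ref ref="CAS");
  model CC = JOBCODE NOOF ANP JOBCODE*ANP / dist=poi link=log type3 pscale;
run;
```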

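Note that the estimate for a multiple of a continuous effect (e.g. 10 NOOF) is simply that multiple of the coefficient on the log scale, so the multiplicative effect on the mean count is the exponentiated value (e.g. EXP(10*b), where b is the NOOF coefficient).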
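For the 1-to-20-employees graph asked about in the first post, one option is to skip the individual ESTIMATE statements, save the parameter estimates, and compute the multiplicative effects in a DATA step. This is a minimal sketch, not a definitive recipe. It assumes the ODS table is named ParameterEstimates and has columns Parameter and Estimate (worth confirming with ODS TRACE ON). The names pe, noof_effect, n_employees and rate_ratio are made up for illustration.

```sas
/* Save the negative binomial parameter estimates to a data set */
ods output ParameterEstimates=pe;
proc genmod data=TEST;
  class JOBCODE(param=ref ref="CAS");
  model CC = JOBCODE NOOF ANP JOBCODE*ANP / dist=nb link=log type3;
run;

/* Multiplicative effect on the mean of CC for 1 to 20 additional employees */
data noof_effect;
  set pe;
  where upcase(Parameter) = "NOOF";
  do n_employees = 1 to 20;
    rate_ratio = exp(n_employees * Estimate);
    output;
  end;
  keep n_employees rate_ratio;
run;

/* Plot the effect as a continuous curve */
proc sgplot data=noof_effect;
  series x=n_employees y=rate_ratio;
  xaxis label="Additional employees (NOOF)";
  yaxis label="Multiplicative effect on mean CC";
run;
```

This works directly here because NOOF is not involved in any interaction in the model. If it were, the interaction coefficients would have to enter the calculation as well.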

PG

chuie · Posted 03-12-2019 01:49 PM

Thank you, PG. This helped a lot.

Based on the model results, I am not sure about the reliability of my model.

Even though I have individual-level data, I summed it up by pay period for the current model. I believe this could be the reason. I would like to try fitting the model with repeated subjects instead, since each employee has one record per pay period, 26 per year (biweekly pay), laid out like this:

employeeID  payperiod  #closecases  #averageproductivity  Jobtype
1           1          24           1.2                   RN
1           2          15           0.5                   RN
1           3          14           0.9                   RN
.           .          .            .                     .
1           26         25           1.2                   RN
2           1          56           1.9                   CLINICAL
2           2          54           1.8                   CLINICAL
.           .          .            .                     .

and so on..

What I would like to know is the following:

1. Is the closed-case count associated with job type? If so, what is the magnitude: 20% more, 2 times, etc.?

2. Is average productivity associated with closed cases? If so, how strongly?

3. What is the effect on closed cases of adding 2 RNs?

4. What is the effect on closed cases of adding 5 clinical staff?

Please help me answer these questions with the code below, or advise on the best model for answering them.

proc genmod data=TEST;
CLASS JOBtype(PARAM = REF REF="RN");
model CC= JOBTYPE AVERAGEPRODUCTIVITY JOBTYPE*AVERAGEPRODUCTIVITY /dist=nb link=log scale=p;
run;

StatDave · Posted 03-14-2019 10:42 AM

If your data consists of multiple observations per subject, then the count responses are correlated and you should fit an appropriate model. One approach is to fit a GEE model in PROC GEE with a REPEATED statement. In the SUBJECT= option, specify the subject identifying variable (presumably employeeID). Since the negative binomial distribution and GEE both are ways to deal with overdispersion, you shouldn't need a third way which is SCALE=P. The following fits a GEE version of the model allowing for a single correlation among the repeated measures within a subject. It might even be possible to reasonably use the Poisson rather than the negative binomial distribution.

proc gee data=TEST;
CLASS employeeID JOBtype(PARAM = REF REF="RN");
model CC= JOBTYPE | AVERAGEPRODUCTIVITY / dist=nb;
repeated subject=employeeID / type=exch;
run;

chuie · Posted 03-14-2019 02:04 PM

Hi,

I got the results below, but I am not sure how to tell whether this is a good model. I usually judge that with AIC and BIC.

Also, please help me interpret the estimates in an easier unit, as the output gives them in logit.

Thank you so much

Model Information

Data Set            WORK.GEE
Distribution        Negative Binomial
Link Function       Log
Dependent Variable  CC

GEE Model Information

Correlation Structure         Exchangeable
Subject Effect                EMPLID (35 levels)
Number of Clusters            35
Clusters With Missing Values  20
Correlation Matrix Dimension  26
Maximum Cluster Size          26
Minimum Cluster Size          0

Exchangeable Working Correlation

Correlation  0.3306

GEE Fit Criteria

QIC   -39286.1120
QICu  -39320.8443

Parameter Estimates for Response Model with Empirical Standard Error Estimates

Parameter                   Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept                   1.3157    0.1292          1.0626    1.5688       10.19  <.0001
JOBCODE               CAS   0.7825    0.2623          0.2685    1.2965       2.98   0.0028
AVERAGEPRODUCTIVITY         1.6389    0.1727          1.3004    1.9774       9.49   <.0001
AVERAGEPRODU*JOBCODE  CAS   -1.4072   0.1766          -1.7534   -1.0610      -7.97  <.0001

StatDave · Posted 03-14-2019 02:31 PM

QIC in a GEE model serves a similar use as AIC or BIC does in non-repeated models. That is, you can use it to compare competing models with smaller QIC values indicating better models. As with AIC and BIC, there is no test. See the discussion about QIC in the Details: Generalized Estimating Equations section of the GEE documentation.
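As an illustration of that kind of comparison, here is a sketch that fits a competing mean model without the interaction, using the same hypothetical data set and variable names as the PROC GEE code earlier in the thread. Its "GEE Fit Criteria" table can then be set beside the one from the interaction model. The subject variable is listed in CLASS on the assumption that the REPEATED statement requires it.

```sas
/* Competing model: main effects only, same distribution and working correlation. */
/* Compare its QIC / QICu with the values from the JOBTYPE | AVERAGEPRODUCTIVITY fit. */
proc gee data=TEST;
  class employeeID JOBTYPE(param=ref ref="RN");
  model CC = JOBTYPE AVERAGEPRODUCTIVITY / dist=nb;
  repeated subject=employeeID / type=exch;
run;
```

The fit with the smaller QIC is the preferred one of the two, but the difference is descriptive only and does not come with a p-value.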

The interpretation of the model parameters is similar to a model without the REPEATED statement. Since this is a log-linked model, the parameters are the effects on the log mean of the response. Your results look like you didn't put JOBCODE in the CLASS statement which means you are assuming a linear effect of JOBCODE on the response. That might not make any sense.
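To make "effects on the log mean" concrete, the posted estimates can be worked through by hand. This is only a sketch under two assumptions: that JOBCODE is a two-level CLASS variable with RN as the reference level, and that CAS denotes an indicator equal to 1 for job code CAS. AP is shorthand for AVERAGEPRODUCTIVITY.

```latex
\begin{align*}
\log \mu &= \beta_0 + \beta_1\,\mathrm{CAS} + \beta_2\,\mathrm{AP} + \beta_3\,(\mathrm{CAS}\times\mathrm{AP}) \\
\text{RN slope} &= \beta_2 = 1.6389, & e^{1.6389} &\approx 5.15 \\
\text{CAS slope} &= \beta_2 + \beta_3 = 1.6389 - 1.4072 = 0.2317, & e^{0.2317} &\approx 1.26
\end{align*}
```

Under those assumptions, a one-unit increase in average productivity multiplies the mean number of closes by about 5.15 for RN and about 1.26 for CAS. A 0.1-unit increase corresponds to roughly 18% more closes for RN and roughly 2% more for CAS.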

chuie · Posted 03-14-2019 02:41 PM

Hi,

I did include the job code as a CLASS variable in the model (please see below).

With this model I need to make a business case along the lines of: "average productivity has to increase to X in order to increase the number of closes (CC) by 20%." That way a benchmark can be set for the future, so that each employee knows what average productivity is needed to reach a goal such as 20%.

Is there another way to analyze that statement? Or can you recommend a book on this kind of resource optimization and analysis?

I appreciate you and your time.

proc gee data=GEE;
CLASS EMPLID JOBCODE(PARAM = REF REF="RN");
model CC= JOBCODE | AVERAGEPRODUCTIVITY / dist=nb;
repeated subject=emplID / type=exch;
run;

StatDave · Posted 03-15-2019 10:09 AM

Apparently JOBCODE is binary then. Since you have a significant interaction, the answer to your question clearly depends on job code. The easiest way to do what you want is to reparameterize the model. This form of the model provides separate intercepts and slopes on productivity for each job code (see this note which describes the general approach):

CLASS EMPLID JOBCODE(PARAM = REF REF="RN");
model CC= JOBCODE AVERAGEPRODUCTIVITY*JOBCODE / dist=nb noint;

You can then use the slopes to compute the value of interest for each job code. Keep in mind that your model is on the log mean.
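For the 20% benchmark, the calculation from the slopes might look like the sketch below. It does not read the slopes from a fitted model. Instead it plugs in the estimates posted earlier in the thread, assuming RN is the reference level, so the RN slope is the AVERAGEPRODUCTIVITY estimate and the CAS slope adds the interaction estimate. The names target_increase, target_ratio, slope and delta_productivity are made up for illustration.

```sas
/* Increase in AVERAGEPRODUCTIVITY needed to multiply the mean of CC by 1.20. */
/* On the log-mean scale: exp(slope * delta) = ratio, so delta = log(ratio) / slope. */
data target_increase;
  length jobcode $3;
  target_ratio = 1.20;                       /* 20% more closes */
  do jobcode = "RN", "CAS";
    if jobcode = "RN" then slope = 1.6389;   /* posted AVERAGEPRODUCTIVITY estimate */
    else slope = 1.6389 - 1.4072;            /* plus the posted interaction estimate */
    delta_productivity = log(target_ratio) / slope;
    output;
  end;
run;

proc print data=target_increase noobs;
run;
```

With those numbers the required increase comes out at roughly 0.11 productivity units for RN and roughly 0.79 for CAS, which shows how strongly the answer depends on job code.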