🔒 This topic is solved and locked.
chuie
Quartz | Level 8

Hi there,

I ran two models, one Poisson and one negative binomial, because the data show overdispersion.

However, variables that are significant in the Poisson model are not significant in the negative binomial model (output attached for reference).

Please advise on the logic behind this and which approach is best.

FYI: my business case is to model the number of closed cases as a function of the number of employees, the type of employee, and average net productivity per hour.

 

Also, for the ESTIMATE statements, is there a way to get an estimate for each value without typing each one in manually? I would like to create a continuous graph, for example for adding 1, 2, 3, ... 20 employees.

 

ods pdf file= "C:\Users\swe00007\Desktop\report.pdf";
/* Model 1: Poisson */
proc genmod data=TEST;
CLASS JOBCODE(PARAM = REF REF="CAS");
model CC= JOBCODE NOOF ANP JOBCODE*ANP /dist=poi link=log type3;
estimate "Effect of 10 employees" NOOF 10;
estimate "Effect of 1 employee"  NOOF 1;
ESTIMATE "Effect of 0.5 average touches per hour" ANP 0.5;
run;
/* Model 2: negative binomial with the same predictors, to handle overdispersion */
proc genmod data=TEST;
CLASS JOBCODE(PARAM = REF REF="CAS");
model CC= JOBCODE NOOF ANP JOBCODE*ANP /dist=nb link=log type3;
estimate "Effect of 10 employees" NOOF 10;
estimate "Effect of 1 employee"  NOOF 1;
ESTIMATE "Effect of 0.5 average touches per hour" ANP 0.5;
run;
ods pdf close;

 


Accepted Solutions
StatDave
SAS Super FREQ

Apparently JOBCODE is binary, then. Since you have a significant interaction, the answer to your question clearly depends on job code. To do what you want, the easiest way is to reparameterize the model. This form of the model provides separate intercepts and slopes on productivity for each job code (see this note, which describes the general approach):

 

CLASS EMPLID JOBCODE(PARAM = REF REF="RN");
model CC= JOBCODE  AVERAGEPRODUCTIVITY*JOBCODE / dist=nb noint;

 

You can then use the slopes to compute the value of interest for each job code. Keep in mind that your model is on the log mean.
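Under the log link, raising productivity by some amount d multiplies the mean count by exp(slope × d), so the increase needed for 20% more closes is log(1.2)/slope, whatever the starting level. A minimal Python sketch of that arithmetic, assuming the slopes implied by the interaction-model output posted later in the thread (1.6389 for RN, and 1.6389 - 1.4072 = 0.2317 for CAS); the reparameterized NOINT fit should report these two slopes directly. The function name is hypothetical.

```python
import math

# Slopes on the log-mean scale, taken from the interaction-model output posted
# in this thread: the AVERAGEPRODUCTIVITY estimate is the RN (reference) slope,
# and adding the AVERAGEPRODU*JOBCODE CAS estimate gives the CAS slope.
SLOPES = {
    "RN": 1.6389,
    "CAS": 1.6389 + (-1.4072),
}


def productivity_increase_for(target_ratio: float, slope: float) -> float:
    """Increase in average productivity that multiplies the mean count by target_ratio.

    Under a log link, log(mean) = a + slope * x, so raising x by delta multiplies
    the mean by exp(slope * delta). Solving exp(slope * delta) = target_ratio
    gives delta = log(target_ratio) / slope.
    """
    if slope <= 0:
        raise ValueError("slope must be positive for higher productivity to raise the mean")
    return math.log(target_ratio) / slope


if __name__ == "__main__":
    for job, slope in SLOPES.items():
        delta = productivity_increase_for(1.20, slope)
        print(f"{job}: raise average productivity by {delta:.3f} units for 20% more closes")
```

With these numbers it prints roughly 0.111 for RN and 0.787 for CAS. These are point estimates only; plugging in the confidence limits of each slope gives a range, and the answer is only trustworthy within the productivity values actually observed in the data.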


7 Replies
PGStats
Opal | Level 21

The AICC clearly shows that the NB distribution fits better than the Poisson. The Value/DF ratio of 13.5 is a clear indication of overdispersion. Specifying PSCALE or DSCALE in the model statement for the Poisson fit will account for overdispersion and should make the inference results more similar between the two models.
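To see why the Poisson p-values shrink so much, here is a small Python sketch of the arithmetic, assuming the 13.5 is the Pearson chi-square Value/DF. With PSCALE, GENMOD multiplies the standard errors by the square root of that ratio (about 3.67 here), so every Wald z is divided by about 3.67. The z value of 3.0 below is hypothetical, not taken from the attached output, and the helper names are illustrative.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by its degrees of freedom (the Value/DF column).

    For a Poisson model the variance equals the mean, so each squared Pearson
    residual is (y - mu)**2 / mu; a ratio well above 1 signals overdispersion.
    """
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / (len(observed) - n_params)


def two_sided_p(z):
    """Two-sided normal p-value for a Wald z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def pearson_scaled_z(z, dispersion):
    """Wald z after Pearson scaling: standard errors grow by sqrt(dispersion)."""
    return z / math.sqrt(dispersion)


if __name__ == "__main__":
    dispersion = 13.5   # Value/DF reported for the Poisson fit in this thread
    z_poisson = 3.0     # hypothetical Wald z for one Poisson coefficient
    z_scaled = pearson_scaled_z(z_poisson, dispersion)
    print(f"unscaled: z={z_poisson:.2f}, p={two_sided_p(z_poisson):.4f}")
    print(f"scaled:   z={z_scaled:.2f}, p={two_sided_p(z_scaled):.4f}")
```

A z of 3.0 (p about 0.003) becomes about 0.82 (p about 0.41), which is the kind of change that moves a predictor from significant under the plain Poisson fit to not significant once overdispersion is accounted for.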

 

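Note that the estimate for a multiple of a continuous effect (e.g. NOOF 10) is just that multiple of the NOOF coefficient on the log scale, so the corresponding multiplicative effect on the mean count is simply the exponentiated value (e.g. EXP(10*b), where b is the NOOF parameter estimate).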
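Following the same idea, the whole curve for adding 1 to 20 employees can be computed from a single coefficient instead of typing 20 ESTIMATE statements. A Python sketch; B_NOOF = 0.05 is a hypothetical placeholder because the NOOF estimate is not shown in the thread, and the function names are illustrative.

```python
import math

# Hypothetical NOOF coefficient on the log-mean scale; substitute the estimate
# from your own GENMOD output (it is not shown in this thread).
B_NOOF = 0.05


def rate_ratio(k, slope):
    """Multiplicative change in the mean count when the covariate rises by k units."""
    return math.exp(k * slope)


def rate_ratio_curve(slope, max_k=20):
    """Rate ratios for adding 1, 2, ..., max_k units (e.g. employees)."""
    return {k: rate_ratio(k, slope) for k in range(1, max_k + 1)}


if __name__ == "__main__":
    for k, rr in rate_ratio_curve(B_NOOF).items():
        print(f"+{k:2d} employees: mean closed cases x {rr:.3f} ({(rr - 1) * 100:+.1f}%)")
```

The (k, rate ratio) pairs can be plotted directly. Because the model is log-linear, the curve is exponential in k: each added employee multiplies the mean by the same factor rather than adding a fixed number of cases.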

PG
chuie
Quartz | Level 8

Thank you, PG. This helped a lot.

Based on the model results, I am not sure how reliable my model is.

Even though I have individual-level data, I summed it by pay period in the current model, and I believe this could be the reason. I would like to try fitting the model with repeated measures per subject, since each employee has data for every pay period, 26 times a year (biweekly pay). The data look like this:

employeeID  payperiod  #closecases  #averageproductivity  Jobtype
1           1          24           1.2                   RN
1           2          15           0.5                   RN
1           3          14           0.9                   RN
.           .          .            .                     .
1           26         25           1.2                   RN
2           1          56           1.9                   CLINICAL
2           2          54           1.8                   CLINICAL
.           .          .            .                     .

and so on.

What I would like to know is the following:

1. Is the closed-case count associated with job type? If so, what is the magnitude (20% more, 2 times, etc.)?

2. Is average productivity associated with closed cases? If so, how strong is the association?

3. What is the effect on closed cases of adding 2 RNs?

4. What is the effect on closed cases of adding 5 clinical staff?

 

 

Please help me answer these questions with the code below, or advise on the best model to answer them.

proc genmod data=TEST;
CLASS JOBtype(PARAM = REF REF="RN");
model CC= JOBTYPE AVERAGEPRODUCTIVITY JOBTYPE*AVERAGEPRODUCTIVITY /dist=nb link=log scale=p;
run;

 

 

 

StatDave
SAS Super FREQ

If your data consist of multiple observations per subject, then the count responses are correlated and you should fit a model that accounts for that. One approach is to fit a GEE model in PROC GEE with a REPEATED statement. In the SUBJECT= option, specify the subject-identifying variable (presumably employeeID). Since the negative binomial distribution and GEE are both ways to deal with overdispersion, you shouldn't need a third one, SCALE=P. The following fits a GEE version of the model that allows a single correlation among the repeated measures within a subject. It might even be reasonable to use the Poisson rather than the negative binomial distribution.

 

proc gee data=TEST;
CLASS JOBtype(PARAM = REF REF="RN");
model CC= JOBTYPE | AVERAGEPRODUCTIVITY / dist=nb;
repeated subject=employeeID / type=exch;
run;
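TYPE=EXCH assumes that any two pay periods from the same employee share a single correlation, however far apart they are. A tiny Python sketch of that working correlation matrix, using the 0.3306 estimate reported in the output later in the thread; the helper name is hypothetical.

```python
def exchangeable_corr(size, rho):
    """Working correlation matrix for TYPE=EXCH: 1 on the diagonal, rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(size)] for i in range(size)]


if __name__ == "__main__":
    # 0.3306 is the exchangeable correlation reported later in this thread;
    # only a 4 x 4 corner is printed instead of the full 26 x 26 pay-period matrix.
    for row in exchangeable_corr(4, 0.3306):
        print(" ".join(f"{v:.4f}" for v in row))
```

If closes in neighboring pay periods are more alike than those far apart, a structure such as TYPE=AR(1) might be worth comparing against the exchangeable one via QIC.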
chuie
Quartz | Level 8

Hi,

I got the results below, but I am not sure how to tell whether this is a good model. I usually judge models with AIC and BIC.

 

Also, please help me interpret the estimates in easier units, since they are reported in logit.

Thank you so much

 

Model Information
Data Set                      WORK.GEE
Distribution                  Negative Binomial
Link Function                 Log
Dependent Variable            CC

GEE Model Information
Correlation Structure         Exchangeable
Subject Effect                EMPLID (35 levels)
Number of Clusters            35
Clusters With Missing Values  20
Correlation Matrix Dimension  26
Maximum Cluster Size          26
Minimum Cluster Size          0

Exchangeable Working Correlation
Correlation                   0.3306

GEE Fit Criteria
QIC                           -39286.1120
QICu                          -39320.8443

Parameter Estimates for Response Model with Empirical Standard Error Estimates
Parameter                 Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept                  1.3157   0.1292           1.0626    1.5688      10.19  <.0001
JOBCODE CAS                0.7825   0.2623           0.2685    1.2965       2.98  0.0028
AVERAGEPRODUCTIVITY        1.6389   0.1727           1.3004    1.9774       9.49  <.0001
AVERAGEPRODU*JOBCODE CAS  -1.4072   0.1766          -1.7534   -1.0610      -7.97  <.0001

 

StatDave
SAS Super FREQ

QIC in a GEE model serves a similar purpose to AIC or BIC in non-repeated models: you can use it to compare competing models, with smaller QIC values indicating better models. As with AIC and BIC, there is no formal test. See the discussion of QIC in the Details: Generalized Estimating Equations section of the PROC GEE documentation.

 

The interpretation of the model parameters is similar to a model without the REPEATED statement. Since this is a log-linked model, the parameters are the effects on the log mean of the response. Your results look like you didn't put JOBCODE in the CLASS statement which means you are assuming a linear effect of JOBCODE on the response. That might not make any sense. 
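Because the link is log (not logit), exponentiating an estimate, or a sum of estimates, turns it into a ratio of mean counts. A small Python sketch using the estimates from the posted output; the job-code labels RN and CAS come from that output, while the 0.1-unit step and the productivity values 0.5, 1.0 and 1.5 are arbitrary illustration choices and the function names are hypothetical.

```python
import math

# Estimates from the PROC GEE output posted in this thread (log-mean scale).
B_JOBCODE_CAS = 0.7825       # CAS vs RN at AVERAGEPRODUCTIVITY = 0
B_PROD = 1.6389              # productivity slope for RN (reference level)
B_PROD_X_CAS = -1.4072       # change in the productivity slope for CAS


def productivity_rate_ratio(job, units=1.0):
    """Multiplicative change in mean CC when AVERAGEPRODUCTIVITY rises by `units`."""
    slope = B_PROD + (B_PROD_X_CAS if job == "CAS" else 0.0)
    return math.exp(slope * units)


def cas_vs_rn_ratio(productivity):
    """Mean CC for CAS divided by mean CC for RN at a given AVERAGEPRODUCTIVITY."""
    return math.exp(B_JOBCODE_CAS + B_PROD_X_CAS * productivity)


if __name__ == "__main__":
    for job in ("RN", "CAS"):
        print(f"{job}: +0.1 productivity multiplies mean CC by {productivity_rate_ratio(job, 0.1):.3f}")
    for x in (0.5, 1.0, 1.5):
        print(f"productivity {x}: CAS/RN ratio of mean CC = {cas_vs_rn_ratio(x):.3f}")
```

With these estimates, a 0.1-unit rise in average productivity corresponds to roughly 18% more closes for RN but only about 2% more for CAS, and the CAS/RN ratio crosses 1 near a productivity of 0.7825/1.4072 (about 0.56). So the job-type comparison has no single magnitude: it depends on the productivity level, and values near 0 may be an extrapolation beyond the data.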

chuie
Quartz | Level 8

Hi,

I did put JOBCODE in the CLASS statement (please see the code below).

With this model I need to make the business case that "average productivity has to increase to X in order to increase the number of closes (CC) by 20%," or something like that, so that they can set a benchmark for the future: each employee has to work at this level of average productivity to reach the goal of 20% or similar.

 

Is there any other way to analyze that statement? Or can you recommend a book on optimizing this kind of resource analysis?

I appreciate you and your time.

proc gee data=GEE;
CLASS EMPLID JOBCODE(PARAM = REF REF="RN");
model CC= JOBCODE | AVERAGEPRODUCTIVITY / dist=nb;
repeated subject=emplID / type=exch;
run;



Discussion stats
  • 7 replies
  • 1386 views
  • 5 likes
  • 3 in conversation