Re: How to get Residual Deviance and DFFITS using proc genmod

amanegm · Posted 01-31-2017 07:23 AM

Hi All,

I am trying to use PROC GENMOD to fit a Poisson regression model for count data. I have written the following code:

proc genmod data = COUNT_data;
model count = KM /dist = poisson;
output out = outpt predicted = pred_val resdev = r_dev;
run;

Here I output the predicted values and deviance residuals to the variables pred_val and r_dev, respectively, in the output data set outpt. The output of the procedure is shown below:

The SAS System

The GENMOD Procedure

Model Information

Data Set WORK.COUNT_DATA

Distribution Poisson

Link Function Log

Dependent Variable Count

Number of Observations Read 222

Number of Observations Used 222

Criteria For Assessing Goodness Of Fit

Criterion                  DF       Value  Value/DF
Deviance                  220    170.1860    0.7736
Scaled Deviance           220    170.1860    0.7736
Pearson Chi-Square        220    199.7315    0.9079
Scaled Pearson X2         220    199.7315    0.9079
Log Likelihood                   -91.0389
Full Log Likelihood             -371.0120
AIC (smaller is better)          746.0240
AICC (smaller is better)         746.0788
BIC (smaller is better)          752.8294

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates

Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
Intercept   1    0.5680          0.1108         0.3508       0.7852            26.27      <.0001
KM          1    0.0000          0.0000         0.0000       0.0000             6.08      0.0137
Scale       0    1.0000          0.0000         1.0000       1.0000

I want to save the Deviance (170.1860 as in the above output) in some variable/dataset. How can it be done?

Also, how can I calculate DFFITS? I want to find the observations where DFFITS > 2 * sqrt(2/n). I have seen that DFBETAS is available as an OUTPUT statement option. Is DFFITS available in the same way?

Rick_SAS · Posted 01-31-2017 07:56 AM

> I want to save the Deviance (170.1860 as in the above output) in some variable/dataset. How can it be done?

See the article "ODS OUTPUT: Store any statistic created by any SAS procedure"
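As a minimal sketch of that approach applied to the code in the question: the ODS OUTPUT statement below writes the "Criteria For Assessing Goodness Of Fit" table to a data set, and a DATA step keeps only the deviance row. The ODS table name ModelFit and the column name Criterion are what I would expect for PROC GENMOD, but treat them as assumptions and confirm them with ODS TRACE ON. The data set names fit_stats and deviance_only are made up for illustration.

```sas
/* Optional: list the ODS table names the procedure produces in the log */
ods trace on;

/* Assumed table name: ModelFit (goodness-of-fit criteria) */
ods output ModelFit = fit_stats;

proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
run;

ods trace off;

/* Keep only the deviance row; the column name Criterion is assumed */
data deviance_only;
   set fit_stats;
   where Criterion = "Deviance";
run;

proc print data = deviance_only;
run;
```

If the WHERE clause matches nothing, print fit_stats first to see the exact column names and criterion labels.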

> How to calculate DFFITS?

The DFFITS option is not available in PROC GENMOD because that statistic assumes an identity link function. However, you can use the COOKDS statistic, which is very similar and provides similar information about the influence of each observation on the fit.
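A minimal sketch of requesting Cook's distance, extending the OUTPUT statement from the question. I am assuming the OUTPUT keywords COOKD= and LEVERAGE= here, so check them against the PROC GENMOD documentation for your SAS/STAT release. The names influence, cooks_d, and h are made up for illustration.

```sas
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
   output out = influence
          predicted = pred_val
          resdev    = r_dev
          cookd     = cooks_d   /* Cook's distance (assumed keyword) */
          leverage  = h;        /* leverage (assumed keyword) */
run;

/* Put the most influential observations first */
proc sort data = influence;
   by descending cooks_d;
run;

/* Inspect the ten observations with the largest Cook's distance */
proc print data = influence (obs = 10);
   var count KM pred_val r_dev h cooks_d;
run;
```

This only ranks the observations. It does not apply a cutoff, because no threshold for Cook's distance is given in this thread.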

amanegm · Posted 01-31-2017 08:18 AM

Thank you Rick. That was very helpful.
DFFITS is not available in PROC GENMOD. But is it valid to use DFFITS for a model fit with PROC GENMOD if I calculate it by some other means? I am using the log link function.

Also, in R, if I build a similar model as below:
l = glm(resp[[1]] ~ unlist(regr[[1]]) , family="poisson")

the dffits function is available to compute that statistic.

So I am confused about whether this is the right approach.

Rick_SAS · Posted 01-31-2017 09:09 AM

I do not know the answer to your question.

For OLS, DFFITS are closely related to the Studentized residual. When the errors are normally distributed, you can use that relationship to derive the distribution of the DFFITS statistic. Because you know the sampling distribution, you can use criteria such as DFFITS > 2 * sqrt(2/n) or 2 * sqrt(p/n) to find "extreme" values of the statistic.
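For reference, the OLS relationship mentioned above can be written as follows, where $t_i$ is the externally Studentized residual and $h_{ii}$ is the leverage of observation $i$. The worked cutoff assumes $p = 2$ parameters (intercept and KM) and uses $n = 222$ from the output in the question; it is only an illustration of the arithmetic, not a recommendation to apply the cutoff to a Poisson model.

```latex
\mathrm{DFFITS}_i = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}},
\qquad
|\mathrm{DFFITS}_i| > 2\sqrt{\frac{p}{n}}
= 2\sqrt{\frac{2}{222}} \approx 0.19
```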

Generalized linear models do not have Studentized residuals; they have other kinds of residuals (such as Pearson, deviance, or chi-square). You can compute the change in the deviance or chi-square that is attributed to deleting each observation, and this becomes a measure of influence.

I was unable to find a textbook or journal article that explains how to generalize DFFITS to generalized linear regression models. Consequently, I recommend using one of the case-deletion statistics that PROC GENMOD provides, such as Cook's D. Perhaps an expert such as @SteveDenham or @lvm can provide additional insight.

SteveDenham · Posted 02-01-2017 08:36 AM

I'll back up @Rick_SAS on this one. DFFITS is not appropriate for generalized linear models, as the studentized residual depends on the assumption of normality of errors. Cook's D has been used fairly regularly to check for influential observations.

Steve Denham