01-31-2017 07:23 AM
I am trying to use PROC GENMOD to fit a Poisson count regression model. I have written the code below:
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
   output out = outpt predicted = pred_val resdev = r_dev;
run;
Here I output the predicted values and deviance residuals to the variables pred_val and r_dev, respectively, in the output data set outpt. The output of the procedure is shown below:
The SAS System
The GENMOD Procedure
Data Set WORK.COUNT_DATA
Link Function Log
Dependent Variable Count
Number of Observations Read 222
Number of Observations Used 222
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 220 170.1860 0.7736
Scaled Deviance 220 170.1860 0.7736
Pearson Chi-Square 220 199.7315 0.9079
Scaled Pearson X2 220 199.7315 0.9079
Log Likelihood -91.0389
Full Log Likelihood -371.0120
AIC (smaller is better) 746.0240
AICC (smaller is better) 746.0788
BIC (smaller is better) 752.8294
Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 0.5680 0.1108 0.3508 0.7852 26.27 <.0001
KM 1 0.0000 0.0000 0.0000 0.0000 6.08 0.0137
Scale 0 1.0000 0.0000 1.0000 1.0000
I want to save the Deviance (170.1860 as in the above output) in some variable/dataset. How can it be done?
Also, how can I calculate DFFITS? I want to find the observations where DFFITS > 2 * sqrt(2/n). I have seen that DFBETAS is available as an OUTPUT statement option. Is DFFITS available in the same way?
01-31-2017 07:56 AM - edited 01-31-2017 07:57 AM
> I want to save the Deviance (170.1860 as in the above output) in some variable/dataset. How can it be done?
See the article "ODS OUTPUT: Store any statistic created by any SAS procedure"
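As a minimal sketch of that approach for this model: the ODS table name ModelFit and the column names Criterion and Value are what I would expect for the "Criteria For Assessing Goodness Of Fit" table, but treat them as assumptions and confirm them by running the procedure with ODS TRACE ON. The names fit_stats, deviance_only, and model_deviance are made up for illustration.

```sas
/* Uncomment to list the ODS table names in the log and confirm "ModelFit" */
/* ods trace on; */

/* Capture the goodness-of-fit table in a data set (fit_stats is a hypothetical name) */
ods output ModelFit = fit_stats;
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
run;

/* Keep only the Deviance row */
data deviance_only;
   set fit_stats;
   where Criterion = "Deviance";
run;

/* Optionally copy the value into a macro variable for later use */
data _null_;
   set deviance_only;
   call symputx("model_deviance", Value);
run;
%put Deviance = &model_deviance;
```

The data set keeps the degrees of freedom next to the value, while the macro variable is handy if only the single number is needed in later steps.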
> How to calculate DFFITS?
The DFFITS option is not available in PROC GENMOD because that statistic assumes an identity link function. However, you can request Cook's distance with the COOKD (alias COOKSD) keyword in the OUTPUT statement. It is very similar and provides similar information about the influence of each observation on the fit.
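A sketch of how Cook's distance might be requested for the model in the question, assuming the COOKD= keyword of the OUTPUT statement. The data set name influence and the variable name cooks_d are made up for illustration. No cutoff is applied here; the observations are only ranked so the largest values can be inspected.

```sas
/* Refit the model and add Cook's distance to the output data set */
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
   output out = influence
          predicted = pred_val
          resdev    = r_dev
          cookd     = cooks_d;   /* cooks_d is a hypothetical variable name */
run;

/* Rank observations from most to least influential */
proc sort data = influence;
   by descending cooks_d;
run;

/* Inspect the ten largest values */
proc print data = influence (obs = 10);
   var count KM pred_val r_dev cooks_d;
run;
```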
01-31-2017 08:18 AM
01-31-2017 09:09 AM
I do not know the answer to your question.
For OLS, DFFITS are closely related to the Studentized residual. When the errors are normally distributed, you can use that relationship to derive the distribution of the DFFITS statistic. Because you know the sampling distribution, you can use criteria such as DFFITS > 2 * sqrt(2/n) or 2 * sqrt(p/n) to find "extreme" values of the statistic.
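For reference, the OLS relationship mentioned above can be written as follows. The notation is assumed rather than taken from the thread: $h_{ii}$ is the leverage of observation $i$, $\hat{y}_{i(i)}$ and $s_{(i)}$ are the fitted value and residual standard error with observation $i$ deleted, and $t_i$ is the externally Studentized residual.

```latex
\mathrm{DFFITS}_i
  = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)}\sqrt{h_{ii}}}
  = t_i \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
```

Since the average leverage is $p/n$, a cutoff proportional to $\sqrt{p/n}$ follows naturally when $t_i$ has a known reference distribution.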
Generalized linear models do not have Studentized residuals; they have other kinds of residuals (such as Pearson, deviance, or chi-square). You can compute the change in the deviance or chi-square that is attributed to deleting each observation, and this becomes a measure of influence.
I was unable to find a textbook or journal article that explains how to generalize DFFITS to generalized linear regression models. Consequently, I recommend using one of the case-deletion statistics that PROC GENMOD provides, such as Cook's D. Perhaps an expert such as @SteveDenham or @lvm can provide additional insight.
02-01-2017 08:36 AM
I'll back up @Rick_SAS on this one. DFFITS is not appropriate for generalized linear models, as the studentized residual depends on the assumption of normality of errors. Cook's D has been used fairly regularly to check for influential observations.