Hi All,
I am trying to use PROC GENMOD to fit a Poisson regression model to count data. I have written the code below:
proc genmod data = COUNT_data;
model count = KM /dist = poisson;
output out = outpt predicted = pred_val resdev = r_dev;
run;
Here I write the predicted values and the deviance residuals to the variables pred_val and r_dev, respectively, in the output data set outpt. The output of the procedure is shown below:
The SAS System
The GENMOD Procedure
Model Information
Data Set WORK.COUNT_DATA
Distribution Poisson
Link Function Log
Dependent Variable Count
Number of Observations Read 222
Number of Observations Used 222
Criteria For Assessing Goodness Of Fit
Criterion                   DF        Value    Value/DF
Deviance                   220     170.1860      0.7736
Scaled Deviance            220     170.1860      0.7736
Pearson Chi-Square         220     199.7315      0.9079
Scaled Pearson X2          220     199.7315      0.9079
Log Likelihood                     -91.0389
Full Log Likelihood               -371.0120
AIC (smaller is better)            746.0240
AICC (smaller is better)           746.0788
BIC (smaller is better)            752.8294
Algorithm converged.
Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept    1     0.5680           0.1108       0.3508       0.7852                26.27       <.0001
KM           1     0.0000           0.0000       0.0000       0.0000                 6.08       0.0137
Scale        0     1.0000           0.0000       1.0000       1.0000
I want to save the deviance (170.1860 in the output above) in a variable or data set. How can this be done?
Also, how can I calculate DFFITS? I want to find the observations where DFFITS > 2 * sqrt(2/n). I have seen that DFBETAS is available as an OUTPUT statement option. Is DFFITS available in the same way?
> I want to save the Deviance (170.1860 as in the above output) in some variable/dataset. How can it be done?
See the article "ODS OUTPUT: Store any statistic created by any SAS procedure"
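As a minimal sketch of that approach for the model in the question: the "Criteria For Assessing Goodness Of Fit" table in PROC GENMOD should correspond to the ODS table named ModelFit (you can confirm the name in your release by running the procedure after ODS TRACE ON). The data set names fit_stats and deviance_only and the macro variable name deviance are made up for this illustration.

```sas
/* Capture the goodness-of-fit table in a data set.                  */
/* fit_stats is a hypothetical name; ModelFit is the ODS table name. */
ods output ModelFit = fit_stats;
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
run;

/* Keep only the row that holds the deviance. */
data deviance_only;
   set fit_stats;
   where Criterion = "Deviance";
run;

/* Optionally copy the value into a macro variable for later use. */
data _null_;
   set deviance_only;
   call symputx("deviance", Value);
run;
%put NOTE: Deviance = &deviance;
```

If the WHERE clause returns no rows, print fit_stats to check the exact text stored in the Criterion column.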
> How to calculate DFFITS?
The DFFITS option is not available in PROC GENMOD because that statistic assumes an identity link function. However, you can request Cook's distance with the COOKD keyword (alias COOKSD) in the OUTPUT statement. It is very similar and provides similar information about the influence of each observation on the fit.
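For example, a sketch that adds Cook's distance to the OUTPUT statement from the question and lists the observations with the largest values. The variable name cook_d and the data set name outpt_sorted are placeholders, and printing the top 10 is an arbitrary choice for illustration, not a formal cutoff.

```sas
/* Same model as in the question, with Cook's distance added to the output. */
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
   output out = outpt predicted = pred_val resdev = r_dev cooksd = cook_d;
run;

/* Sort so that the most influential observations come first. */
proc sort data = outpt out = outpt_sorted;
   by descending cook_d;
run;

/* Inspect the 10 largest values (10 is an arbitrary choice). */
proc print data = outpt_sorted(obs = 10);
   var count KM pred_val r_dev cook_d;
run;
```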
I do not know the answer to your question.
For OLS, DFFITS are closely related to the Studentized residual. When the errors are normally distributed, you can use that relationship to derive the distribution of the DFFITS statistic. Because you know the sampling distribution, you can use criteria such as DFFITS > 2 * sqrt(2/n) or 2 * sqrt(p/n) to find "extreme" values of the statistic.
Generalized linear models do not have Studentized residuals; they have other kinds of residuals (such as Pearson, deviance, or chi-square). You can compute the change in the deviance or chi-square that is attributed to deleting each observation, and this becomes a measure of influence.
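One possible way to approximate the deletion effect without refitting the model 222 times is the likelihood residual, which the OUTPUT statement provides through the RESLIK keyword. This sketch rests on an assumption: that the squared likelihood residual is a reasonable one-step approximation to the change in deviance when an observation is deleted. Please check that against the GENMOD documentation before relying on it. The names infl, r_lik, h, and approx_dev_change are invented for this example.

```sas
/* Request likelihood residuals and leverage for each observation. */
proc genmod data = COUNT_data;
   model count = KM / dist = poisson;
   output out = infl reslik = r_lik leverage = h;
run;

/* Assumption: the squared likelihood residual approximates the      */
/* change in deviance from deleting the observation (one-step only). */
data infl;
   set infl;
   approx_dev_change = r_lik**2;
run;

/* List observations with the largest approximate change first. */
proc sort data = infl;
   by descending approx_dev_change;
run;
```

An exact value would require refitting the model with each observation left out and differencing the deviances, which is slower but makes no approximation.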
I was unable to find a textbook or journal article that explains how to generalize DFFITS to generalized linear regression models. Consequently, I recommend using one of the case-deletion statistics that PROC GENMOD provides, such as Cook's D. Perhaps an expert such as @SteveDenham or @lvm can provide additional insight.
I'll back up @Rick_SAS on this one. DFFITS is not appropriate for generalized linear models, as the studentized residual depends on the assumption of normality of errors. Cook's D has been used fairly regularly to check for influential observations.
Steve Denham