Hello everyone,

we are trying to fit a GLM with proc genmod. The dependent variable is health care cost, and the independent variables are treatment group, age, sex, observation time, various comorbidities, and various medications. To model the costs we assumed a gamma distribution and a log link (in the meantime we have also tried other links and distributions).

We would now like to check the goodness of fit of the model. To do this we examined the plot of estimated versus observed costs and the plot of errors (residuals) versus observed costs. In our opinion, both plots argue against a good model fit (see attached file): the estimated costs appear to vary randomly relative to the observed costs, whereas the errors show a strong relationship to the observed costs.

Our question: Are these plots a correct, plausible way to check the model fit of a GLM? If so, is there any way to improve the model fit? We have already tried all the different link and distribution functions as well as transformations of the cost data itself. The cost data are heavily skewed and include zero costs as well as very high costs, and both the low and the high costs are of interest.

Our program:

proc genmod data=input_data PLOTS=(PREDICTED RESCHI);
class group sex
comorbidity1 comorbidity2 ... /* all 1/0 - Variables */
medication1 medication2 ... /* all 1/0 - Variables */
;
model cost = group
age observation_time
sex CCI_Score
comorbidity1 comorbidity2 ... /* all 1/0 - Variables */
medication1 medication2 ... /* all 1/0 - Variables */
/ dist = gamma link = log ;
output out = Residuals
pred = Pred
resraw = Resraw
reschi = Reschi ;
run;
title 'Proc genmod: Plot of estimated and residuals';
proc gplot data=residuals;
plot pred*cost Reschi*cost;
label cost='cost';
run;

Thanks in advance for any answers,
sasstats