proc genmod: check of model fit

sasstats — Mon, 12 Nov 2018 13:34:57 GMT

Hello everyone,

We are trying to fit a GLM with proc genmod.

The dependent variable is health care cost, and the independent variables are treatment group, age, sex, observation time, various comorbidities, and various medications.

To model the costs, we assumed a gamma distribution and a log link (we have since tried other links and distributions as well).

We would now like to check the goodness of fit of the model.

To do this, we examined the plot of estimated versus observed costs and the plot of residuals versus observed costs.

In our opinion, however, both plots contradict a good model fit (see attached file). The estimated costs appear unrelated to the observed costs, whereas the residuals show a strong relationship with the observed costs.

Our question is: Are these plots a correct, plausible way to check the model fit for a GLM?

If yes, is there any way to improve the model fit?

We have already tried the different link and distribution functions as well as transformations of the cost data itself.

The cost data are heavily skewed and include zero costs as well as very high costs. Those low and high costs are of interest too.

Our program:

proc genmod data=input_data plots=(predicted reschi);
  class group sex
        comorbidity1 comorbidity2 ... /* all 1/0 variables */
        medication1 medication2 ...   /* all 1/0 variables */
        ;
  model cost = group
        age observation_time
        sex CCI_Score
        comorbidity1 comorbidity2 ... /* all 1/0 variables */
        medication1 medication2 ...   /* all 1/0 variables */
        / dist=gamma link=log;

  /* save predicted values plus raw and Pearson residuals */
  output out    = Residuals
         pred   = Pred
         resraw = Resraw
         reschi = Reschi
         ;
run;

title 'Proc genmod: Plot of estimated and residuals';
proc gplot data=residuals;
plot pred*cost Reschi*cost;
label cost='cost';
run;

Thanks in advance for any answer.

sasstats

Re: proc genmod: check of model fit

Rick_SAS — Tue, 13 Nov 2018 13:39:52 GMT

I agree that these two plots do not indicate a good fit. However, when you have multiple variables, you need to be a little careful when you create plots like this. You are projecting the predicted responses onto one dimension (cost), whereas a better approach is to slice the predicted response surface. You can use the EFFECTPLOT statement (with the FIT or SLICEFIT options) to create a more effective visualization of the response surface. Personally, I don't think it will matter in terms of assessing fit, but the EFFECTPLOT statement is a powerful diagnostic tool that is worth learning about. It should be helpful as you refine your model.
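
As a minimal sketch of the EFFECTPLOT idea, the step below refits a reduced version of the model from the question and slices the fitted surface by treatment group. It keeps only the covariates named explicitly in the original post (the elided comorbidity and medication lists are left out), and it assumes that the variable is spelled observation_time and that ODS Graphics is available. Treat it as an illustration, not as the final model.

```sas
ods graphics on;

/* Reduced model for illustration only: the comorbidity and   */
/* medication indicators from the original program are omitted */
proc genmod data=input_data;
  class group sex;
  model cost = group age observation_time sex CCI_Score
        / dist=gamma link=log;

  /* fitted cost versus age, one curve per treatment group, */
  /* with confidence limits for the mean                    */
  effectplot slicefit(x=age sliceby=group) / clm;
run;
```

By default, covariates that do not appear in the plot are held at fixed values (means for continuous variables, reference levels for CLASS variables), so each curve is a slice through the fitted surface and not a projection onto observed cost.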

> is there any way to improve the model fit?

We don't really have enough information to answer that question. Two possible approaches are:

1. You can adopt a model-building approach in which you incrementally build up the model based on domain-specific knowledge and looking at the fit statistics. You might be missing interaction terms or nonlinear terms in the model.

2. You can adopt a "shotgun" approach and use PROC GLMSELECT or PROC HPGENSELECT to select the model effects that best fit the data. If you choose to use variable selection, you should consider using cross-validation to avoid overfitting the data. If you aren't familiar with the model selection procedures, here are two references:

"Statistical Model Building for Large, Complex Data: Five New Directions in SAS/STAT® Software"

"Introducing the HPGENSELECT Procedure: Model Selection for Generalized Linear Models and More"

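To make the two approaches a little more concrete, here is a rough sketch that offers PROC HPGENSELECT a few candidate interaction and quadratic terms and lets stepwise selection decide which to keep. The candidate effects (group*age, group*sex, age*age) and the CHOOSE=AICC criterion are illustrative assumptions, not recommendations from the thread, and only the covariates named explicitly in the original program are used. The sketch does not include a validation step; that would have to be added separately.

```sas
/* Illustrative candidate effects only: which interactions and  */
/* nonlinear terms make sense depends on domain knowledge       */
proc hpgenselect data=input_data;
  class group sex;
  model cost = group age observation_time sex CCI_Score
               group*age group*sex age*age
        / distribution=gamma link=log;

  /* stepwise selection; the final model is the step with the */
  /* smallest corrected AIC (an assumed choice of criterion)  */
  selection method=stepwise(choose=aicc);
run;
```

Any effects that survive selection can then be put back into the PROC GENMOD model to examine residuals and fit statistics there.
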
Re: proc genmod: check of model fit

sasstats — Tue, 13 Nov 2018 13:23:29 GMT

Hello Rick_SAS,

Thanks for your answer and the helpful hints.

There may well be interactions between our independent variables. We will check this.

We do have one request:

Could you please give the correct link for your first reference, if possible?

At the moment it leads to the same paper as your second reference.

Thank you very much

sasstats

Re: proc genmod: check of model fit

Rick_SAS — Tue, 13 Nov 2018 13:40:18 GMT

DONE