Calcite | Level 5

I ran a GLM with a log link and gamma distribution to model the impact of appropriate cancer care on costs. Covariates include age, race/ethnicity, location, tumor stage, and tumor grade, to name a few.

With 'no appropriate care' and 'stage 1' as reference categories for the appropriate care and tumor stage variables, respectively, I get beta estimates of 9.0663, 0.6953, and 0.6669 for the intercept, 'non-appropriate care', and 'stage 2 tumor', respectively.

When I change the reference category for tumor stage to 'stage 2', I get beta estimates of 8.7319, 0.6953, and -0.3288 for intercept, non-appropriate care, and 'stage 1 tumor' respectively.  
Even though the beta estimates for the key independent variable and other covariates remain the same, the beta estimate for the intercept changes every time I change the reference values of certain variables.  Why does this happen?  Would this not change the finding for the key independent variable every time I change the reference group for any covariate?

I would appreciate it if you could help me with this and also point me to an appropriate reference.

Thank you in advance.


Jade | Level 19

It is likely due to the non-full rank parameterization being used.  Changing the reference category sets that category estimate to zero, so that the intercept is the estimate for the reference category.  This only applies to categorical variables in the CLASS statement.
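To make "non-full rank" a little more concrete, here is a minimal sketch in plain Python (not SAS, and not the poster's data). The three stage levels and the four observations are made up for illustration. It shows that an intercept plus one indicator column per level carries redundant information, and that dropping the reference level's column is the same as fixing its estimate at zero.

```python
# Hypothetical three-level factor; one indicator column per level plus an intercept.
levels = ["stage 1", "stage 2", "stage 3"]
observations = ["stage 1", "stage 2", "stage 3", "stage 2"]

def design_row(level):
    """Intercept followed by one 0/1 indicator for every level."""
    return [1] + [1 if level == lvl else 0 for lvl in levels]

X = [design_row(obs) for obs in observations]

# In every row the indicators add up to the intercept column, so the
# columns are linearly dependent (the matrix is not of full rank).
redundant = all(sum(row[1:]) == row[0] for row in X)
print(redundant)  # True

def drop_reference(row, ref):
    """Remove the reference level's column, i.e. fix its coefficient at zero."""
    keep = [i for i, lvl in enumerate(levels, start=1) if lvl != ref]
    return [row[0]] + [row[i] for i in keep]

X_ref1 = [drop_reference(row, "stage 1") for row in X]
print(X_ref1[0])  # [1, 0, 0]: a 'stage 1' row is described by the intercept alone
```

Because a 'stage 1' observation has zeros in every remaining stage column, the intercept is what describes that group, which is why it moves when a different level is chosen as the reference.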



Calcite | Level 5

Thank you for your response, SteveDenham!

I am not a statistics person.  Is there a simple explanation for this?  All the variables in the model are categorical variables and I have used 'class' statement for all of these.



The predicted values are the same, but the estimate for the intercept depends on the reference category. So when you change the reference level, you will see the intercept estimate change in a predictable way.
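As a rough illustration of "the predicted values are the same", the sketch below re-expresses one set of reference-coded coefficients with a different reference level. It is plain Python arithmetic, not a fitted SAS model, and every number in it is hypothetical rather than taken from the output quoted above. It assumes reference coding and that only the stage reference changes.

```python
import math

# Hypothetical log-scale coefficients with 'stage 1' as the reference level.
coef_ref_stage1 = {
    "intercept": 9.0,   # linear predictor for the all-reference group
    "care": 0.7,        # effect of the key independent variable
    "stage": {"stage 1": 0.0, "stage 2": 0.5, "stage 3": 0.9},
}

def rereference(coef, new_ref):
    """Re-express the same model with a different reference level for stage."""
    shift = coef["stage"][new_ref]
    return {
        "intercept": coef["intercept"] + shift,  # intercept absorbs the shift
        "care": coef["care"],                    # other effects are untouched
        "stage": {lvl: b - shift for lvl, b in coef["stage"].items()},
    }

def predict(coef, care, stage):
    """Predicted mean cost under a log link: exp(linear predictor)."""
    eta = coef["intercept"] + coef["care"] * care + coef["stage"][stage]
    return math.exp(eta)

coef_ref_stage2 = rereference(coef_ref_stage1, "stage 2")

# Compare predictions under the two codings for every combination.
for care in (0, 1):
    for stage in ("stage 1", "stage 2", "stage 3"):
        p1 = predict(coef_ref_stage1, care, stage)
        p2 = predict(coef_ref_stage2, care, stage)
        print(care, stage, round(p1, 2), round(p2, 2))
```

In this sketch the intercept moves from 9.0 to 9.5 and the 'stage 1' coefficient becomes -0.5, while the coefficient for the key variable stays at 0.7 and each pair of predictions agrees. Under these assumptions, changing a covariate's reference level does not change the finding for the key independent variable.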


It might not be "simple," but you can read more about different CLASS parameterizations in this blog post: "Coding and simulating categorical variables in regression models"
