Ruth
Fluorite | Level 6

Hi Sir,

I finally finished running a linear regression model with PROC GENMOD, but it took 25 hours. The dataset has 1 million cases and 40 categorical variables.

What I don't quite understand is this:

The estimated intercept, which I expected to be the overall mean (all the predictors in the model are categorical and parameterized with effect coding), is 1400. But the observed mean of the dependent variable is only 250. I don't understand why there is such a big difference. Is it because of poor model fit?

Thanks for any ideas.

Rick_SAS
SAS Super FREQ

What is important is the number of levels (= unique values) in your classification variables. If each classification variable has 10 levels, then the regression involves approximately 40 × 10 = 400 dummy variables as regressors.
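A quick count of design-matrix columns, assuming (hypothetically) exactly 40 factors with 10 levels each. GLM's less-than-full-rank coding uses one indicator column per level, while effect coding drops one column per factor. The helper `count_design_columns` is made up for illustration and is not a SAS function.

```python
def count_design_columns(levels_per_factor, coding):
    """Count columns in a main-effects design matrix, including the intercept."""
    if coding == "glm":
        # Less-than-full-rank (GLM) coding: one indicator column per level
        return 1 + sum(levels_per_factor)
    if coding == "effect":
        # Effect (deviation) coding: one column per level except the last
        return 1 + sum(k - 1 for k in levels_per_factor)
    raise ValueError(f"unknown coding: {coding}")

levels = [10] * 40  # assumption: 40 factors, 10 levels each
print(count_design_columns(levels, "glm"))     # 401
print(count_design_columns(levels, "effect"))  # 361
```

Either way, the cross-products matrix the fit works with is a few hundred columns wide, and every one of the 1 million rows contributes to it.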

If I recall correctly, you are using GENMOD only because you want a parameterization that is different from the GLM encoding. How long does your problem take to run in GLM? GENMOD solves a maximum likelihood problem, which involves an iterative optimization, so it will be slower than GLM on the same problem.

For effect coding, the main effects estimate the difference between the effect of each nonreference level and the average effect over all levels. That average effect gets lumped in with the intercept. That's why your Intercept estimate is different from the observed mean.
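A small sketch in Python/NumPy (not SAS) using made-up, deliberately unbalanced data. For a one-factor model with effect coding, the intercept is the equally weighted average of the level means. The observed mean weights each level by its number of observations instead. When a rare level has an extreme response, the two can be far apart. The data and the `effect_code` helper are hypothetical.

```python
import numpy as np

def effect_code(labels, levels):
    """Effect (deviation) coding: level j -> unit column j; last level -> -1 in every column."""
    X = np.zeros((len(labels), len(levels) - 1))
    for i, lab in enumerate(labels):
        j = levels.index(lab)
        if j == len(levels) - 1:
            X[i, :] = -1.0  # last level is coded -1 in all columns
        else:
            X[i, j] = 1.0
    return X

# Hypothetical unbalanced data: level "c" is rare but has a very large response
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
y = np.array([100, 110, 90, 105, 95, 100, 200, 210, 190, 4000], dtype=float)
levels = ["a", "b", "c"]

# Intercept column plus effect-coded columns, fit by ordinary least squares
X = np.column_stack([np.ones(len(y)), effect_code(labels, levels)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

lab = np.array(labels)
group_means = np.array([y[lab == lev].mean() for lev in levels])  # 100, 200, 4000

print(beta[0])             # intercept, about 1433.33
print(group_means.mean())  # unweighted mean of the level means, about 1433.33
print(y.mean())            # observed mean: 520.0
```

With several factors in a main-effects model, the same idea applies: the effect-coded intercept is the equally weighted average of the predictions over all level combinations, which need not resemble the observed mean when the data are unbalanced.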

I assume you know that the predicted values you get from GENMOD are the same as you get from GLM. The only difference is how to INTERPRET the parameters. For an example with continuous variables, see http://blogs.sas.com/content/iml/2010/11/10/regression-coefficients-for-different-polynomial-bases/
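To illustrate that only the interpretation changes, here is a sketch (Python/NumPy, toy data, hypothetical `design` helper) that fits the same one-factor model twice. The first fit uses reference coding with the last level as the reference. As I understand it, this gives the same estimates as GLM's less-than-full-rank coding, which sets the last level's estimate to zero. The second fit uses effect coding. The coefficients differ, but the fitted values are identical.

```python
import numpy as np

# Hypothetical unbalanced toy data (one categorical predictor)
labels = np.array(["a"] * 6 + ["b"] * 3 + ["c"] * 1)
y = np.array([100, 110, 90, 105, 95, 100, 200, 210, 190, 4000], dtype=float)
levels = ["a", "b", "c"]

def design(labels, levels, coding):
    """Main-effects design for one factor; the last level is the omitted level."""
    last = (labels == levels[-1]).astype(float)
    cols = [np.ones(len(labels))]  # intercept
    for lev in levels[:-1]:
        col = (labels == lev).astype(float)
        if coding == "effect":
            col = col - last  # effect coding: last level gets -1
        cols.append(col)
    return np.column_stack(cols)

fits = {}
for coding in ("reference", "effect"):
    X = design(labels, levels, coding)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fits[coding] = (beta, X @ beta)  # (coefficients, fitted values)

print(fits["reference"][0])  # approx [4000, -3900, -3800]
print(fits["effect"][0])     # approx [1433.33, -1333.33, -1233.33]
print(np.allclose(fits["reference"][1], fits["effect"][1]))  # True
```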

Ruth
Fluorite | Level 6

It took only 5 minutes to run the model in PROC GLM. What a difference! ;)

Thanks again.
