Overall mean of general linear model

11-16-2011 10:17 AM

This is a very tricky but interesting problem. I hope some statistical expert can really help. Below is the description of the data and analysis.

1) Data

1 million cases and 40 categorical variables, each with between 2 and 50 levels. The dependent variable is continuous and approximately normally distributed.

2) Analysis: linear regression with effect coding for all categorical variables.

Step 1): The intercept (i.e., overall mean) was estimated to be 570, with all the final significant variables in the model.

Step 2): After removing 1307 exceptional cases (with leverage > 2p/n and standardized residuals > 2), the intercept became 270.
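
The screening rule in Step 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the analysis in this post: the function name `flag_exceptional` and the demo data are hypothetical, the design matrix `X` is assumed to already contain the intercept column (and, in the real problem, the effect-coded columns), and the cutoff is applied to the absolute standardized residual, which the post does not specify.

```python
import numpy as np

def flag_exceptional(X, y, resid_cut=2.0):
    """Return a boolean mask of rows with leverage > 2p/n AND
    |standardized residual| > resid_cut (absolute value is an assumption)."""
    n, p = X.shape
    # Thin QR: the hat-matrix diagonal is the row-wise sum of squares of Q,
    # so the n x n hat matrix is never formed.
    Q, R = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    resid = y - Q @ (Q.T @ y)               # ordinary least-squares residuals
    s2 = resid @ resid / (n - p)            # residual variance estimate
    std_resid = resid / np.sqrt(s2 * (1.0 - leverage))
    return (leverage > 2.0 * p / n) & (np.abs(std_resid) > resid_cut)

# Hypothetical demo data: a clean linear trend plus one planted extreme case.
rng = np.random.default_rng(0)
x = np.append(rng.normal(size=200), 10.0)
y = 1.0 + 2.0 * x + rng.normal(size=201)
y[-1] += 30.0                               # the planted case sits far off the line
X = np.column_stack([np.ones_like(x), x])   # intercept column + one predictor

mask = flag_exceptional(X, y)
print(mask.sum(), "case(s) flagged; planted case flagged:", bool(mask[-1]))
```

Because both conditions must hold, a case is flagged only when it is unusual in the predictors and also poorly fitted.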

The intercept dropped sharply, from 570 to 270, even though only 1307 cases were deleted out of a sample of 1 million. Because the intercept stands for the overall mean, this creates a problem for interpreting the model: how can the overall mean change so much when so few cases are deleted?

I don't know how the intercept (overall mean) is estimated in a general linear model. Can anyone recommend a reference book?
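
On how the intercept is estimated: under effect (sum-to-zero) coding, the intercept of a one-factor model is the unweighted mean of the group means, not the sample mean of all cases, so a sparse level with extreme values counts as much as a large one. The sketch below illustrates this on made-up data. The helper `effect_code`, the three groups, and all values are hypothetical, and the 40-factor model described above is more complex than this one-factor case.

```python
import numpy as np

# Hypothetical unbalanced data: three groups of very different sizes.
y = np.array([10.0, 12.0, 11.0, 9.0, 13.0, 11.0,   # group 0, n = 6
              20.0, 22.0,                           # group 1, n = 2
              60.0])                                # group 2, n = 1
g = np.array([0] * 6 + [1] * 2 + [2])

def effect_code(levels, k):
    """Effect coding for k levels: k - 1 columns, last level coded -1 throughout."""
    X = np.zeros((len(levels), k - 1))
    for j in range(k - 1):
        X[levels == j, j] = 1.0
    X[levels == k - 1, :] = -1.0
    return X

# Design matrix: intercept column followed by the effect-coded columns.
X = np.column_stack([np.ones(len(y)), effect_code(g, 3)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = np.array([y[g == j].mean() for j in range(3)])
print("intercept:                 ", beta[0])
print("mean of the group means:   ", group_means.mean())
print("sample mean of all cases:  ", y.mean())
```

In this toy data the single case in group 2 pulls the intercept well above the sample mean, which is one way a small number of cases can move the intercept a long way.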

Please help. Thanks.

11-17-2011 04:54 AM

Those deleted observations may be valuable.

Did you check Cook's distance to see the contribution of these observations to your model?
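
As a minimal sketch of what such a check involves, Cook's distance can be computed from the ordinary residuals and the leverages using the standard closed form D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2. The function name `cooks_distance` and the six-point data set are hypothetical, and `X` is assumed to include the intercept column.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each row of an ordinary least-squares fit."""
    n, p = X.shape
    Q, _ = np.linalg.qr(X)                  # thin QR of the design matrix
    h = np.sum(Q**2, axis=1)                # leverages (hat-matrix diagonal)
    resid = y - Q @ (Q.T @ y)               # ordinary residuals
    s2 = resid @ resid / (n - p)            # residual variance estimate
    return resid**2 / (p * s2) * h / (1.0 - h)**2

# Hypothetical data: five points near the line y = x and one far from it.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.9, 20.0])
X = np.column_stack([np.ones_like(x), x])

D = cooks_distance(X, y)
print("most influential row:", int(np.argmax(D)))
```

A large value means that deleting that single case would shift the fitted coefficients noticeably, which is the information needed before deciding whether to drop it.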

Ksharp

11-17-2011 05:26 AM

Thanks, Ksharp. Cook's distance helped the model.

11-17-2011 10:15 AM

Quite possibly a mixture problem. It looks like 0.13% of the data significantly elevate the intercept. That can happen. Consider the mean net worth of a very rural county with 200 residents, say $50,000 per person. All of a sudden Bill Gates moves in, worth say $20B. The mean net worth is now about 1991 times as large, after a change of only 0.5% of the data.
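
The arithmetic in this example is easy to verify. The short check below uses only the figures quoted in the thread; the variable names are illustrative.

```python
# Figures quoted in the thread
residents = 200
mean_before = 50_000                 # dollars per person
newcomer = 20_000_000_000            # $20B

# Mean net worth after one extremely wealthy resident is added
mean_after = (residents * mean_before + newcomer) / (residents + 1)
ratio = mean_after / mean_before     # how many times larger the mean becomes
share_new = 1 / (residents + 1)      # fraction of the data that changed

# Fraction of cases removed in the original analysis
share_removed = 1307 / 1_000_000

print(f"mean is {ratio:.0f} times as large")
print(f"newcomer is {share_new:.1%} of the data")
print(f"removed cases are {share_removed:.2%} of the data")
```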

I would wager that those 1307 exceptional cases, when examined separately, tell you something quite interesting.

Steve Denham