GLM, Fixed Effects and the error "Number of levels...

09-24-2012 07:47 PM

I would like to estimate a fixed effects model with a very large number of classes (57,601). The problem is that when I run PROC GLM, I get an error saying, "Number of levels for some effects > 32,767." In other words, PROC GLM can only make 32,767 dummy variables, and since 57,601 > 32,767, it fails. How can I work around this problem? It seems that manually creating dummy variables and using something like PROC REG wouldn't work, because the resulting data set would be too large.

Thank you in advance.

Accepted Solutions

Solution

Posted in reply to Rick_SAS

09-25-2012 11:17 AM

All Replies


Posted in reply to SColby

09-25-2012 07:43 AM

The first thing I think of is "What in the world has 57,601 classes?" Then I would think of ways to reduce this number: which levels can be consolidated? Which variables are possibly continuous, so that they could be removed from the CLASS statement? Another possibility might, and I stress MIGHT, be to look into PROC HPMIXED, whose sparse matrix techniques may help here.

Steve Denham
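For readers who want to try the PROC HPMIXED suggestion, a minimal sketch might look like the following. The data set name MyData, the response y, the covariates x1 and x2, and the class variable id are all hypothetical placeholders, not names from the original question, and whether this runs within available memory on a given data set is untested here.

```sas
/* Hypothetical names throughout: MyData, y, x1, x2, id */
proc hpmixed data=MyData;
   class id;               /* the effect with the very large number of levels */
   model y = x1 x2 id;     /* id enters the model as a fixed effect */
run;
```

The model is specified the same way as in PROC GLM; only the procedure changes.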


Posted in reply to SteveDenham

09-25-2012 08:29 AM

Well, there are about 43,000 ZIP codes in the US... However, statistical analysis based on ZIP codes has problems, which is one reason that the US government defined Metropolitan Statistical Areas (MSAs) and Core-Based Statistical Areas (CBSAs):

List of United States core based statistical areas - Wikipedia, the free encyclopedia

Are you including a large number of interaction terms? I've never seen this error, but I wonder if it could occur by not truncating the interactions. For example:

proc glm data=GLMData;
   class c1-c10;
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10;
run;
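To illustrate what truncating the interactions means: the bar operator expands ten factors into all 2^10 - 1 = 1,023 main effects and interactions, and the ten-way interaction alone could have up to 3^10 = 59,049 level combinations even if each factor had only three levels. A sketch of the truncated version, reusing the same hypothetical data set and variables as the example above, appends @2 so that only the 10 main effects and 45 two-way interactions are kept:

```sas
/* Same hypothetical data set and variables as in the example above */
proc glm data=GLMData;
   class c1-c10;
   /* @2 limits the expansion to main effects and two-way interactions */
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10 @2;
run;
```

Whether this avoids the error depends on how many levels each pair of factors produces, so treat it as a starting point rather than a guaranteed fix.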



09-25-2012 11:25 AM

Good to know that my hunch pays off when somebody who knows actually checks it out.

Steve Denham


Posted in reply to SteveDenham

09-25-2012 03:38 PM

**HPMIXED worked**!

Now I am going to push the limit further by trying to fit a four-equation three-stage least squares (3SLS) model to these data.

*Thanks!*