SColby
Calcite | Level 5

I would like to estimate a fixed effects model with a lot of different classes (57,601).  The problem is when I run PROC GLM, I get an error saying, "Number of levels for some effects > 32,767."  In other words, PROC GLM can only make 32,767 dummy variables and since 57,601 > 32,767, it fails.  How am I to work around this problem?  Seems like manually making dummy variables and using something like PROC REG wouldn't work because the created data set would be too large.

Thank you in advance.

SteveDenham
Jade | Level 19

The first thing I think of is "What in the world has 57,601 classes?"  Then I would think of ways to reduce this number--which can be consolidated?  Which are possibly continuous, so that I could remove them from the CLASS statement. Another possibility might, and I stress MIGHT, be to look into PROC HPMIXED. Use of sparse matrix techniques may help here.
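Before deciding which variables to consolidate or drop from the CLASS statement, it can help to see how many distinct levels each candidate actually has. A minimal sketch, assuming a hypothetical data set `mydata` with hypothetical class variables `id` and `region` (neither name comes from the original post):

```sas
/* Sketch: report the number of distinct levels per variable.        */
/* mydata, id, and region are hypothetical names for illustration.   */
proc freq data=mydata nlevels;
   tables id region / noprint;   /* suppress the full frequency tables */
run;
```

Any variable whose level count exceeds the 32,767 limit quoted in the error message is a candidate for consolidation or for treatment as continuous.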

Steve Denham


Rick_SAS
SAS Super FREQ

Well, there are about 43,000 ZIP codes in the US... However, statistical analyses based on ZIP codes have problems, which is one reason that the US government defined Metropolitan Statistical Areas (MSAs) and Core-Based Statistical Areas (CBSAs):

List of United States core based statistical areas - Wikipedia, the free encyclopedia

Are you including a large number of interaction terms? I've never seen this error, but I wonder if it could occur by not truncating the interactions. For example:

proc glm data=GLMData;
   class c1-c10;
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10;
run;
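For comparison, here is a sketch of the same model with the interactions truncated. The `@2` modifier limits the bar-operator expansion to main effects and two-way interactions. It reuses the placeholder names from the example above; whether untruncated interactions are really what triggers the error in the original poster's case is only a guess.

```sas
/* Sketch: truncate the bar-operator expansion at two-way interactions. */
/* GLMData, y, and c1-c10 are the placeholder names used above.         */
proc glm data=GLMData;
   class c1-c10;
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10 @2;
run;
quit;
```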

lvm
Rhodochrosite | Level 12

It is true that GLM will not allow more than 32,767 levels. I just tried it on my 64-bit version of SAS 9.3 (simulated 60,000 levels). You also get an error when trying to use MIXED. However, HPMIXED runs just fine (same syntax as MIXED, and similar to GLM), although it takes a while. In fact, HPMIXED was designed for problems like this. Check the documentation.
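A test along these lines might look like the sketch below. It is not the code that was actually run for this post: the data set `sim`, the variables `id`, `x`, and `y`, the three observations per level, and the seed are all assumptions made for illustration.

```sas
/* Sketch: simulate 60,000 levels of a class variable, then fit a    */
/* fixed-effects model with PROC HPMIXED. All names are hypothetical. */
data sim;
   call streaminit(12345);             /* arbitrary seed for repeatability */
   do id = 1 to 60000;
      alpha = rand('normal');          /* level-specific intercept shift */
      do rep = 1 to 3;                 /* three observations per level */
         x = rand('normal');
         y = alpha + 2*x + rand('normal');
         output;
      end;
   end;
   drop alpha rep;
run;

proc hpmixed data=sim;
   class id;
   model y = id x / solution;          /* SOLUTION prints every fixed-effect estimate */
run;
```

With this many levels the SOLUTION table is very long, so drop that option if only the fit statistics and tests are of interest.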

SteveDenham
Jade | Level 19

Good to know that my hunch pays off when somebody who knows actually checks it out.

Steve Denham

SColby
Calcite | Level 5

HPMIXED worked!

Now I am going to push the limit further by trying to perform a 4 equation 3 stage least squares estimate (3SLS) on these data.

Thanks!
