I would like to estimate a fixed effects model with a very large number of class levels (57,601). The problem is that when I run PROC GLM, I get an error saying, "Number of levels for some effects > 32,767." In other words, PROC GLM can only make 32,767 dummy variables, and since 57,601 > 32,767, it fails. How can I work around this problem? Manually making dummy variables and using something like PROC REG doesn't seem workable, because the resulting data set would be too large.
Thank you in advance.
The first thing I think of is "What in the world has 57,601 classes?" Then I would think of ways to reduce this number: which levels can be consolidated? Which variables are possibly continuous, so that they could be removed from the CLASS statement? Another possibility might, and I stress MIGHT, be to look into PROC HPMIXED. Its use of sparse matrix techniques may help here.
Steve Denham
Well, there are about 43,000 ZIP codes in the US... However, statistical analyses based on ZIP codes have problems, which is one reason that the US government defined Metropolitan Statistical Areas (MSAs) and Core-Based Statistical Areas (CBSAs):
List of United States core based statistical areas - Wikipedia, the free encyclopedia
Are you including a large number of interaction terms? I've never seen this error, but I wonder if it could occur by not truncating the interactions. For example:
proc glm data=GLMData;
   class c1-c10;
   /* the bar operator expands to every interaction, up to the 10-way term */
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10;
run;
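If the bar expansion is the culprit, it can be capped with the @ operator. A minimal sketch, reusing the GLMData and c1-c10 names from the example above (they are placeholders, not the poster's data); @2 keeps only the main effects and the two-way interactions:

```sas
proc glm data=GLMData;
   class c1-c10;
   /* @2 truncates the bar expansion at two-way interactions */
   model y = c1|c2|c3|c4|c5|c6|c7|c8|c9|c10 @2;
run;
quit;
```

This only reduces the number of effects. A single CLASS variable with more than 32,767 levels would presumably still trigger the error.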
It is true that GLM will not allow more than 32,767 levels. I just tried it on my 64-bit version of SAS 9.3 (simulated 60,000 levels). You also get an error when trying to use MIXED. However, HPMIXED runs just fine (much the same syntax as MIXED, and similar to GLM), although it takes a while. In fact, HPMIXED was designed for problems like this. Check the documentation.
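For anyone who wants to try this, here is a rough sketch of that kind of test. The data set name, variable names, seed, and the three-observations-per-level layout are all assumptions made for illustration; this is not the exact code that was run.

```sas
/* Simulate 60,000 groups with 3 observations each (assumed layout) */
data work.sim;
   call streaminit(12345);
   do id = 1 to 60000;
      alpha = rand('normal');            /* group-specific intercept */
      do rep = 1 to 3;
         x = rand('normal');
         y = alpha + 2*x + rand('normal');
         output;
      end;
   end;
   drop alpha;
run;

/* id goes on CLASS and in MODEL, just as it would in GLM */
proc hpmixed data=work.sim;
   class id;
   model y = x id;
run;
```

Adding the SOLUTION option to the MODEL statement would print one estimate per level of id, so it is probably better to route that table to a data set with ODS OUTPUT than to the listing.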
Good to know that my hunch pays off when somebody who knows actually checks it out.
Steve Denham
HPMIXED worked!
Now I am going to push the limit further by trying to fit a four-equation three-stage least squares (3SLS) model to these data.
Thanks!