Can I create over 7000 dummies in PROC REG or PROC SURVEYREG?

Occasional Contributor
Posts: 11

Hi, I am working on a project with over 7000 employer groups over the period 2007-2013, and I need to run a regression model with expenditure as the dependent variable and employer group, calendar year, and the employer group × year interaction as independent variables, among others. I need to treat employer group as a fixed effect. Since each employer group has values for more than one year, these are repeated measures, so I also need to account for the within-group correlation by clustering on employer group.
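One way to write the model described above, assuming several records (for example, members) j per employer group i and year t. The notation is illustrative and not from the original post:

```latex
y_{ijt} = \mu + \alpha_i + \tau_t + (\alpha\tau)_{it} + \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + \varepsilon_{ijt},
\qquad
\operatorname{Cov}(\varepsilon_{ijt}, \varepsilon_{i'j't'}) = 0 \ \text{for } i \neq i'
```

Here alpha_i are the employer-group fixed effects, tau_t the year effects, (alpha tau)_it the group × year interaction, and errors within the same group i may be correlated (the clustering). If there is only one record per group and year, the group, year, and interaction terms alone would fit every observation exactly.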

I started with PROC MIXED, but it seems SAS is not able to run PROC MIXED with this many dummy variables.

Then I tested SAS's capacity using PROC GLM, and SAS is able to handle this many dummies in PROC GLM! However, PROC GLM does not correct for the correlation among repeated measures (especially since the data are in univariate format and cannot be transformed to multivariate format, because doing so would lose other independent variables, such as year).
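For reference, a minimal sketch of the kind of PROC GLM run described above, where the CLASS statement builds the dummy columns internally. The data set `have`, the covariate `x1`, and the simulated values are hypothetical stand-ins (50 groups instead of 7000 to keep it small):

```sas
/* Hypothetical demo data: 50 employer groups x 7 years x 5 records each */
data have;
   call streaminit(2015);
   do employer_group = 1 to 50;
      do year = 2007 to 2013;
         do member = 1 to 5;
            x1 = rand('normal');
            expenditure = 1000 + 10*employer_group + 50*(year - 2007)
                          + 200*x1 + rand('normal', 0, 300);
            output;
         end;
      end;
   end;
run;

/* CLASS creates one indicator column per level; no manual dummies needed */
proc glm data=have;
   class employer_group year;
   model expenditure = employer_group year employer_group*year x1;
run;
quit;
```

The standard errors here are ordinary least-squares ones, which is why this run alone does not address the within-group correlation.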

Thus, I am back to PROC SURVEYREG, which allows a CLUSTER statement to control for the correlation of repeated measures. However, since PROC SURVEYREG does not include a CLASS statement, I am facing creating over 7000 dummy variables (already done: employer_group1-employer_group7000) and including them in PROC SURVEYREG. This sounds crazy. I don't know how to include 7000 dummy variables in PROC SURVEYREG without actually writing out all 7000 variable names. Any ideas? Thanks!
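A hedged note: the SAS/STAT documentation for PROC SURVEYREG does list a CLASS statement, so it may be worth checking whether the installed release supports it before building dummies by hand. A minimal sketch with hypothetical demo data (`have`, `x1`, and the simulated values are assumptions), clustering on employer group:

```sas
/* Hypothetical demo data: 50 employer groups x 7 years x 5 records each */
data have;
   call streaminit(2015);
   do employer_group = 1 to 50;
      do year = 2007 to 2013;
         do member = 1 to 5;
            x1 = rand('normal');
            expenditure = 1000 + 10*employer_group + 50*(year - 2007)
                          + 200*x1 + rand('normal', 0, 300);
            output;
         end;
      end;
   end;
run;

proc surveyreg data=have;
   class employer_group year;        /* dummies built by the procedure */
   cluster employer_group;           /* employer group as the clustering unit */
   model expenditure = employer_group year x1 / solution;
run;
```

The employer_group*year interaction can be added to the MODEL statement in the same way.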


Accepted Solutions
04-16-2015 11:40 AM
Trusted Advisor
Posts: 1,503

PROC GLMMOD will create the dummy variables for a main effect of this categorical variable, and/or create dummy variables for interactions with this categorical variable, if you wish.
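A minimal sketch of PROC GLMMOD writing the dummy (design) columns to a data set, using hypothetical demo data (`have`, `x1`, and the values are made up). OUTDESIGN= receives the dependent variable plus columns named Col1, Col2, and so on; OUTPARM= records which effect and level each column belongs to:

```sas
/* Hypothetical demo data: 50 employer groups x 7 years x 5 records each */
data have;
   call streaminit(2015);
   do employer_group = 1 to 50;
      do year = 2007 to 2013;
         do member = 1 to 5;
            x1 = rand('normal');
            expenditure = 1000 + 10*employer_group + 50*(year - 2007)
                          + 200*x1 + rand('normal', 0, 300);
            output;
         end;
      end;
   end;
run;

proc glmmod data=have outdesign=design outparm=parm noprint;
   class employer_group year;
   model expenditure = employer_group year x1;
run;

/* Map of column number -> effect and class level */
proc print data=parm(obs=10);
run;
```

With 50 groups and 7 years this gives 59 columns: Col1 is the intercept, Col2-Col51 the groups, Col52-Col58 the years, and Col59 is x1.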

However, I am very skeptical of the idea of having a regression with 7000 dummy variables in it, because it seems to me, without seeing the data, that it is doomed to failure. That many dummy variables are going to be fitting random noise as much as they are fitting a real signal. This is called "overfitting" the model.
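To put the overfitting concern in numbers, assume one dummy per level (the way GLM and GLMMOD code CLASS effects) and 7 years (2007-2013). The employer group, year, and group × year terms alone then give:

```latex
\begin{aligned}
p_{\text{columns}}   &= 1 + 7000 + 7 + 7000 \times 7 = 56\,008,\\
p_{\text{estimable}} &= 1 + 6999 + 6 + 6999 \times 6 = 49\,000 = 7000 \times 7 .
\end{aligned}
```

So these terms have as many free parameters as there are employer-year cells: they can reproduce every cell mean exactly, before any other covariate is added.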


All Replies
Occasional Contributor
Posts: 11

I think I figured it out: just writing "employer_group1-employer_group7000" should work. :)

Any other comments are welcome. Thanks!
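A minimal sketch of both parts of this approach, assuming the groups are coded 1 to N in a numeric variable `employer_group` (hypothetical demo data with 50 groups; the array name `eg` and covariate `x1` are illustrative). A SAS numbered range list is written without spaces, for example employer_group1-employer_group50:

```sas
/* Hypothetical demo data: 50 employer groups x 7 years x 5 records each */
data have;
   call streaminit(2015);
   do employer_group = 1 to 50;
      do year = 2007 to 2013;
         do member = 1 to 5;
            x1 = rand('normal');
            expenditure = 1000 + 10*employer_group + 50*(year - 2007)
                          + 200*x1 + rand('normal', 0, 300);
            output;
         end;
      end;
   end;
run;

/* One 0/1 indicator per group, created with an array instead of 50 statements */
data have_dummies;
   set have;
   array eg[50] employer_group1-employer_group50;
   do i = 1 to dim(eg);
      eg[i] = (employer_group = i);
   end;
   drop i;
run;

/* Numbered range list; group 1 is left out as the reference level */
proc surveyreg data=have_dummies;
   cluster employer_group;
   model expenditure = employer_group2-employer_group50 x1;
run;
```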

Occasional Contributor
Posts: 11

Thank you so much! I just looked up PROC GLMMOD, and it sounds very helpful. Can you provide some sample code for my case, especially for how to "create dummy variables for interactions with this categorical variable"?

Trusted Advisor
Posts: 1,503

There is an example in the PROC GLMMOD documentation that demonstrates how it works for interactions. I don't think repeated measures fit in this framework unless you re-parameterize the model (and I'm not sure that's possible; I can't explain how to do it, but maybe someone else can).
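A hedged sketch of the interaction case, separate from the documentation example mentioned above and using hypothetical demo data: listing employer_group*year in the MODEL statement makes GLMMOD write one column per group-year combination. The second half shows one way those columns could be passed to PROC SURVEYREG with a CLUSTER statement. It assumes no missing values, so the design rows line up one-to-one with the input rows, and it shows the mechanics only, not whether the model is appropriate:

```sas
/* Hypothetical demo data: 50 employer groups x 7 years x 5 records each */
data have;
   call streaminit(2015);
   do employer_group = 1 to 50;
      do year = 2007 to 2013;
         do member = 1 to 5;
            x1 = rand('normal');
            expenditure = 1000 + 10*employer_group + 50*(year - 2007)
                          + 200*x1 + rand('normal', 0, 300);
            output;
         end;
      end;
   end;
run;

proc glmmod data=have outdesign=design outparm=parm noprint;
   class employer_group year;
   model expenditure = employer_group year employer_group*year x1;
run;

/* Highest column number, so the MODEL list below is not hard-coded */
proc sql noprint;
   select max(_colnum_) into :lastcol trimmed from parm;
quit;

/* One-to-one merge (no BY): assumes DESIGN has the same rows, in the same order */
data design2;
   merge have(keep=employer_group) design;
run;

/* Col1 is the intercept column, so the list starts at Col2 */
proc surveyreg data=design2;
   cluster employer_group;
   model expenditure = col2-col&lastcol;
run;
```

Here lastcol resolves to 409 (1 intercept + 50 groups + 7 years + 350 group-year cells + x1).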

So my comment about "doomed to failure" is going to be ignored here? Of course, that's your choice, but it was meant as a "red flag".

Occasional Contributor
Posts: 11

Hi, thanks a lot!

No, I certainly read your comment about the overfitting problem. I should have shared my thoughts on it. Basically, I am leaving that to the PI, who leads the model design.

Occasional Contributor
Posts: 11

Also, can PROC GLMMOD take care of repeated measures? Or how can repeated measures be handled later in the modeling steps, using PROC SURVEYREG or other procedures?

Respected Advisor
Posts: 2,655

PROC SURVEYREG is not well designed for repeated measures, as it assumes that the residuals for the regression are NID (normally and independently distributed), so any autocorrelation is viewed as a pretty substantial violation of assumptions. To accommodate survey weighting, see Example 44.18, "Weighted Multilevel Model for Survey Data," in the PROC GLIMMIX documentation (SAS/STAT 13.2). The example can be expanded to include a G-side repeated measures structure. All you need to make it "work" is a lot of RAM that is addressable through SAS.
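A rough illustration of why memory becomes the constraint, assuming the group, year, and group × year terms are coded with one column per level, giving p = 1 + 7000 + 7 + 49,000 = 56,008 columns. A dense double-precision (8-byte) p × p crossproducts matrix would need:

```latex
p^2 \times 8\ \text{bytes} = 56\,008^2 \times 8 = 3\,136\,896\,064 \times 8 \approx 2.5 \times 10^{10}\ \text{bytes} \approx 25\ \text{GB}
```

That is about half as much if only one triangle of the symmetric matrix is stored. How much a given procedure actually allocates depends on its internals, so treat this as an order of magnitude only.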

Steve Denham

Occasional Contributor
Posts: 11

Hi Steve,

Thanks for the comments and thoughts.

By the way, what does RAM stand for?

SAS Super FREQ
Posts: 3,416

Random-access memory (RAM) is the memory a computer uses for computations while programs are running. This is different from storage on disk. Many modern computers have 8GB or 16GB of RAM.
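What matters for the procedures discussed here is how much of that RAM SAS itself may address, which is governed by the MEMSIZE system option. A minimal sketch for checking it in the current session:

```sas
/* MEMSIZE caps the memory a SAS session may use; it can only be set at
   start-up (command line or configuration file), not in an OPTIONS statement */
%put NOTE: MEMSIZE is %sysfunc(getoption(memsize));

/* The same setting, with a description of the option, in the log */
proc options option=memsize;
run;
```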

Occasional Contributor
Posts: 11

I think I got it. Thank you! :)

Occasional Contributor
Posts: 11

Thanks! :D

☑ This topic is SOLVED.
