Yegen
Pyrite | Level 9

I have a large dataset with over 15,000 fixed effects. When I run the code below, SAS fails to produce any output and reports that there is not enough memory (I am omitting the other ODS OUTPUT parts to shorten the code).

proc surveyreg data=have;
    cluster id;                /* cluster standard errors at the id level */
    class year id;             /* year and id enter as fixed effects */
    model dependent_var = independent_var year id;
    ods output ParameterEstimates = OutputStats_1
        (where=(Parameter in ('Intercept','independent_var')));
run;

If I were not required to cluster the standard errors at the id level, I could simply have used PROC GLM and absorbed the fixed-effects variables. I was able to get results with the PROC GLM approach, but not with PROC SURVEYREG. Is PROC SURVEYREG simply a very slow approach? Is there any way to absorb the fixed effects as in PROC GLM? I am trying to run a panel data regression with year and id fixed effects and standard errors clustered at the id level.
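For reference, the PROC GLM approach mentioned above might look like the following. This is only a sketch that reuses the data set and variable names from the code above; it absorbs id and keeps year as a CLASS effect, and it does not cluster the standard errors. The ABSORB statement expects the data to be sorted by the absorbed variable.

```sas
/* ABSORB requires the input to be sorted by the absorbed variable */
proc sort data=have;
    by id;
run;

proc glm data=have;
    absorb id;                 /* id fixed effects are swept out, not estimated */
    class year;                /* year fixed effects are estimated explicitly */
    model dependent_var = independent_var year / solution;
run;
quit;
```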

 

Accepted Solutions
Rick_SAS
SAS Super FREQ

This is not really my area, but I will share a few thoughts. First, the CLUSTER variable is usually thought of as a random effect, but you are also listing it on the MODEL statement, which is for fixed effects. Although PROC SURVEYREG allows this syntax, other SAS procedures (e.g., PHREG) do not.

 

If you want to model ID as a random effect, you might try PROC MIXED or HPMIXED. You can request sandwich estimators, which should be close to the estimates that you would get from SURVEYREG if it supported an ABSORB statement (which it doesn't). The big question, of course, is how many distinct levels of ID you have, and whether PROC MIXED will be able to handle your data if ID has many levels.
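A minimal sketch of the PROC MIXED suggestion, assuming the data set and variable names from the original post (have, id, year, dependent_var, independent_var). The EMPIRICAL option requests sandwich standard errors, with ID as the SUBJECT= effect of a random intercept. This treats ID as a random effect rather than a fixed one, so it is not the same model as in the question, and whether it runs in memory with this many ID levels is untested.

```sas
proc mixed data=have empirical;          /* EMPIRICAL = sandwich (robust) standard errors */
    class id year;
    model dependent_var = independent_var year / solution;
    random intercept / subject=id;       /* id modeled as a random intercept */
run;
```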

 

I hope someone more knowledgeable will have more to say. Maybe someone like @sld or @SAS_Rob might have thoughts on this issue.

SAS_Rob
SAS Employee

I would echo what @Rick_SAS is saying as well.

 

You normally would not want to put the ID variable on the MODEL statement, especially if you have a random sample of subjects from some population. If you do not have a complex survey design, then there are better ways to get robust/sandwich estimators, like the ones he mentioned.

Yegen
Pyrite | Level 9

Thanks for your helpful comments, @Rick_SAS and @SAS_Rob. The lower bound for the number of distinct levels of ID is 15,000 (and the upper bound is around 170,000). I will give your suggestion a try.

I also had a conversation with my co-author, and we came up with the following. Since fixed effects just demean the LHS and RHS variables, one can compute the means of those variables for each distinct ID. Since I have two different FEs (i.e., ID and year), I also computed the means of the same variables for each year. I then subtracted both means (i.e., the corresponding ID and year means) from each variable (e.g., Y_t,i - Y_mean_i - Y_mean_t) to obtain the demeaned variables. Finally, I ran PROC SURVEYREG on the demeaned variables with clustering at the ID level, and voilà, I got the results pretty quickly.

PROC SURVEYREG does not seem to like a large number of fixed effects but handles clustering well, whereas PROC GLM handles fixed effects well but does not have a clustering option.
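Written out, the demeaning described above looks roughly like this. The notation (bars for group means, tildes for demeaned variables) is introduced here only for illustration. The textbook two-way within transformation for a balanced panel also adds back the grand mean; leaving it out only shifts the intercept. In an unbalanced panel, a single pass of demeaning is generally not exactly equivalent to including both sets of dummies.

```latex
\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t},
\qquad
\tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t},
\qquad
\tilde{y}_{it} = \alpha + \beta\,\tilde{x}_{it} + \varepsilon_{it}

% where \bar{y}_{i\cdot} is the mean of y over all years for ID i,
% and \bar{y}_{\cdot t} is the mean of y over all IDs in year t.
```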

Thanks again, @Rick_SAS and @SAS_Rob.


JLcra
Fluorite | Level 6

Hi @Yegen,

 

I'm currently having the same problem with a model of mine (2 million unit fixed effects and 12 time fixed effects). I'm not that knowledgeable about the behind-the-scenes matrix math for variance and standard error calculations, though. Would your demeaning approach still produce the proper clustered standard errors/covariance matrix?

 

If it matters, I'm attempting to get 2-way clustered errors on both sets of fixed effects using a macro I've found on several academic sites. It runs PROC SURVEYREG twice, once with each cluster, then computes the 2-way clustered errors from the SURVEYREG covariance matrices. I'm wondering if demeaning will ruin that somehow.

 

Jon

Yegen
Pyrite | Level 9

Hi @JLcra,

 

Yes, that would still be fine. My co-author is a financial econometrician and he also confirmed that it would work well. Here is what I was doing. 

 

  1. Find the mean of your LHS and RHS variables by grouping at the unit level.
  2. Subtract the mean from Step 1 from each variable (e.g., subtract mean of LHS from LHS variable).
  3. Now, use the unit level demeaned variables and find the mean of these demeaned variables by grouping at the time level.
  4. As in step 2, subtract the mean from Step 3 from the values you obtained after step 2.
  5. Then, use these demeaned values in your regression that allows 2-way clustering (see link of reliable code below).
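The steps above can be sketched in SAS roughly as follows. The data set and variable names (have, id, year, dependent_var, independent_var) follow the code in the original question; the intermediate names (demean_id, demean_both, y_dm, x_dm, y_dm2, x_dm2) are made up for illustration. The last step uses PROC SURVEYREG clustered on id, as in the original question, rather than the 2-way clustering macro, whose calling syntax is not shown in this thread.

```sas
/* Steps 1-2: subtract unit-level (id) means.
   PROC SQL remerges the group means back onto each row.
   Rows with missing values are dropped so both means use the same sample. */
proc sql;
    create table demean_id as
    select id, year,
           dependent_var   - mean(dependent_var)   as y_dm,
           independent_var - mean(independent_var) as x_dm
    from have
    where not missing(dependent_var) and not missing(independent_var)
    group by id;
quit;

/* Steps 3-4: subtract time-level (year) means of the id-demeaned variables */
proc sql;
    create table demean_both as
    select id, year,
           y_dm - mean(y_dm) as y_dm2,
           x_dm - mean(x_dm) as x_dm2
    from demean_id
    group by year;
quit;

/* Step 5: regression on the demeaned variables, clustered on id */
proc surveyreg data=demean_both;
    cluster id;
    model y_dm2 = x_dm2;
run;
```

The log will show a note that the query requires remerging summary statistics with the original data; that is expected here.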

Link to reliable SAS code: http://www.people.hbs.edu/igow/GOT/Code/clus2D.sas

 

 
I hope this helps. 

Godechot
Fluorite | Level 6

You could try my Felm macro, which does it all:

http://olivier.godechot.free.fr/hoparticle.php?id_art=721

 

Best

Olivier

 

somebody
Lapis Lazuli | Level 10

Hi, thanks for the post. I am running into the same issue. May I ask two related questions?

1. I notice that the t-stats for the coefficients from (i) the demeaning method and (ii) PROC SURVEYREG with the CLASS statement are slightly different. Why is that, and how do we control for it?

2. The link to the reliable code is missing. Do you have the new link?

Thanks
