Programming the statistical procedures from SAS

Calculating clustered standard errors with a large number of clusters

Established User
Posts: 1

Calculating clustered standard errors with a large number of clusters


Hi,

 

I am running a pretty large data set: 500,000 individuals at the daily level over 13 years, approximately 500 million observations.

I have a problem constructing clustered standard errors at the individual level.

It seems to be a server memory problem. I have access to 2 TB, which only allows me to construct clustered standard errors for up to 60,000 individuals.

It also seems like PROC SURVEYREG can run with fewer clusters than PROC GENMOD can.

 

Does anyone have an idea of how to overcome this, or what else to try?

 

Best,

Frederik

 

I get this error message with PROC SURVEYREG:

ERROR: The SAS System stopped processing this step because of insufficient
memory.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 4:34:52.34
cpu time 4:34:23.42

 

This is my code when I run PROC SURVEYREG: attached.

 

Super User
Posts: 13,508

Re: Calculating clustered standard errors with a large number of clusters

Posted in reply to FrederikPL

It never hurts to show the code you are using. That way we can avoid making suggestions that amount to what you are already doing.

 

What exactly is the problem? No output, incorrect (or at least unexpected) output, missing errors for some records?

 

From the SURVEYREG documentation:

Let

  • H be the total number of strata

  • $n_c$ be the total number of clusters in your sample across all H strata, if you specify a CLUSTER statement

  • p be the total number of parameters in the model

The memory needed (in bytes) is

\[ 48H+8pH+4p(p+1)H \]

For a cluster sample, the additional memory needed (in bytes) is

\[ 48H + 8pH + 4p(p+1)H + 4p(p+1)n_c + 16n_c \]

The SURVEYREG procedure also uses other small amounts of additional memory. However, when you have a large number of clusters or strata, or a large number of parameters in your model, the memory described previously dominates the total memory required by the procedure.

So using the above information, does the memory requirement come close to fitting within your available memory?
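As a quick sanity check, the cluster-sample formula above can be evaluated directly. Here is a minimal Python sketch; the values of H, p, and n_c below are illustrative assumptions (one stratum, 10 model parameters, one cluster per individual), not figures from the post:

```python
def surveyreg_cluster_memory_bytes(H, p, n_c):
    """Evaluate the cluster-sample memory formula from the
    SURVEYREG documentation:
        48H + 8pH + 4p(p+1)H + 4p(p+1)n_c + 16n_c
    """
    return 48*H + 8*p*H + 4*p*(p+1)*H + 4*p*(p+1)*n_c + 16*n_c

H = 1           # assumed: no STRATA statement, so one stratum
p = 10          # assumed number of model parameters
n_c = 500_000   # one cluster per individual, as in the post

mb = surveyreg_cluster_memory_bytes(H, p, n_c) / 1024**2
print(f"{mb:.1f} MB")
```

Note that the dominant term is 4p(p+1)n_c, so the requirement grows linearly in the number of clusters but quadratically in the number of parameters; with many parameters (e.g. individual fixed effects via CLASS variables), p itself can be enormous and the estimate blows up accordingly.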

 

Also, there is the output to consider. ODS SELECT or EXCLUDE might reduce some of the output table overhead.

 
