Re: Insufficient memory using PROC SURVEYREG

braam · Posted 06-15-2020 09:50 AM

Dear All,

I get an insufficient memory error when I execute PROC SURVEYREG on a fairly large sample (N >= 200,000).

I have two sets of fixed effects, but neither has many indicator variables: one has just three levels and the other about 30.

I also include two clustering variables, each with about 2,500 clusters. When the two cluster variables are considered together, there are about 100,000 clusters.

The message that I've got is: "ERROR: The SAS System stopped processing this step because of insufficient memory."

This has been discussed in a previous forum thread, but I never saw a solution. Is there a way to overcome this issue? To show how I coded it, here is my SAS code using the sashelp.cars dataset:


proc surveyreg data= sashelp.cars;
	class Make DriveTrain; 
	cluster type origin;
	model Weight= Wheelbase Length Make DriveTrain/ solution ;
	run;
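SURVEYREG treats each distinct combination of the CLUSTER variables as one cluster, which is why the two cluster variables together give about 100,000 clusters. Because the number of clusters enters the procedure's memory requirement, it can be worth counting those combinations before fitting. A minimal sketch on sashelp.cars, with Type and Origin standing in for the real cluster variables:

```sas
/* Count the clusters that CLUSTER Type Origin defines:
   every distinct Type x Origin combination is one cluster */
proc sql;
   select count(*) as n_clusters
   from (select distinct Type, Origin from sashelp.cars);
quit;
```

For the real data, swap in your own data set and cluster variables in the inner SELECT DISTINCT.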

SASKiwi · Posted 06-15-2020 04:08 PM

If you remove the clusters does your program work without error?

Run your program with the FULLSTIMER SAS option to report how much memory your program is using. Try this without the CLUSTER statement to see if you can get your program to at least complete in simplified form, then post your SAS log.
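A minimal version of that suggestion, using the sashelp.cars example from the question: turn on FULLSTIMER, then rerun the model with the CLUSTER statement removed. The "memory" and "OS Memory" lines in the resulting log NOTE are the figures to compare against the full model.

```sas
/* FULLSTIMER adds memory and OS memory figures to each step's log NOTE */
options fullstimer;

/* Simplified model: the original example with the CLUSTER statement removed */
proc surveyreg data=sashelp.cars;
   class Make DriveTrain;
   model Weight = Wheelbase Length Make DriveTrain / solution;
run;
```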

braam · Posted 06-15-2020 04:23 PM

Here is the memory information when I drop the cluster variable:

NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 1.50 seconds
user cpu time 1.26 seconds
system cpu time 0.21 seconds
memory 1528.04k
OS Memory 37120.00k
Timestamp 06/15/2020 10:20:04 PM
Step Count 15141 Switch Count 0

Also, here is the same information when I drop one fixed effect (CLASS variable) but keep the cluster variable:

NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 2.88 seconds
user cpu time 2.01 seconds
system cpu time 0.84 seconds
memory 1986171.01k
OS Memory 2026220.00k
Timestamp 06/15/2020 10:20:16 PM
Step Count 15142 Switch Count 0

Again, my code fails when I include all the fixed effects and cluster variables I want.

For reference, I have already amended my SAS config file to set MEMSIZE to 4GB (it still fails with MEMSIZE set to 8GB).

Group=MEMORY
SORTSIZE=1073741824
    Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=9223372036854775807
    Specifies a limit on the amount of memory that is available for data summarization procedures when class variables are active.
MAXMEMQUERY=2147483647
    Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216
    Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
    Specifies the maximum amount of memory to allocate for using memory-based libraries.
LOADMEMSIZE=0
    Specifies a suggested amount of memory that is needed for executable programs loaded by SAS.
MEMSIZE=4294967296
    Specifies the limit on the amount of virtual memory that can be used during a SAS session.
REALMEMSIZE=0
    Specifies the amount of real memory SAS can expect to allocate.
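MEMSIZE can only be set when SAS starts (in the config file or with the -MEMSIZE invocation option), so after editing a config file it is worth confirming the value the running session actually picked up. A minimal sketch; the listing above presumably came from something like the last step here:

```sas
/* MEMSIZE in effect for this session, plus where the value was set */
proc options option=memsize value;
run;

/* The same value as a one-line log message */
%put NOTE: MEMSIZE is %sysfunc(getoption(memsize));

/* All memory-related options, as in the listing above */
proc options group=memory;
run;
```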

SASKiwi · Posted 06-15-2020 05:59 PM

Are you using SAS on a PC or on a remote SAS server? If it runs on your PC, is SAS 64-bit or 32-bit, and how much physical memory do you have? If you are pushing the limits of your physical memory, you really have no choice but to simplify your model or reduce your sample size.

ballardw · Posted 06-15-2020 04:38 PM

You should probably provide the actual code; better still, copy the submitted code together with all messages from the log.

The online help shows how to estimate the memory needed:

The memory needed by the SURVEYREG procedure to handle the survey design is described as follows.

Let

H be the total number of strata

nc be the total number of clusters in your sample across all H strata, if you specify a CLUSTER statement

p be the total number of parameters in the model

The memory needed (in bytes) is

48H + 8pH + 4p(p+1)H

For a cluster sample, the additional memory needed (in bytes) is

48H + 8pH + 4p(p+1)H + 4p(p+1)nc + 16nc

The SURVEYREG procedure also uses other small amounts of additional memory. However, when you have a large number of clusters or strata, or a large number of parameters in your model, the memory described previously dominates the total memory required by the procedure.

So your 100,000 clusters are likely eating a whole lot of memory.
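To get a sense of scale, here is a sketch that plugs rough numbers from this thread into the documented cluster-sample formula. The inputs are hypothetical: H = 1 (no STRATA statement), p = 45 (intercept, `same`, about 10 controls, and 3 + 30 CLASS levels, assuming one parameter per CLASS level), and nc = 100,000. By this formula the total is 829,608,688 bytes, roughly 0.77 GiB, which is well under the 4 GB MEMSIZE mentioned above. However, the FULLSTIMER note earlier in the thread reports about 2 GB (memory 1986171.01k) for a reduced version of the model. Treat the formula as a rough guide to the design-related part only, not as the procedure's total.

```sas
/* Documented SURVEYREG design-memory formula, hypothetical inputs */
data mem_est;
   H  = 1;        /* strata: no STRATA statement, so a single stratum */
   p  = 45;       /* hypothetical parameter count, see text */
   nc = 100000;   /* clusters */
   base_bytes    = 48*H + 8*p*H + 4*p*(p+1)*H;
   cluster_bytes = base_bytes + 4*p*(p+1)*nc + 16*nc;
   cluster_gib   = cluster_bytes / 1024**3;
   put base_bytes= comma15. cluster_bytes= comma15. cluster_gib= 6.2;
run;
```

Changing p and nc shows how quickly the 4p(p+1)nc term dominates as either grows.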

braam · Posted 06-15-2020 04:58 PM

Thanks for your reply. Here is the code. &controls. holds the set of control variables I use.


proc surveyreg data= Inp;
	cluster group;
	class fe1 fe2;
	model diff= same &controls. fe1 fe2/ solution;
	run;

NOTE: Writing HTML Body file: sashtml26.htm
NOTE: The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate
analysis of the subset. It does not provide a statistically valid subpopulation or domain
analysis, where the total number of units in the subpopulation is not known with certainty. If
you want a domain analysis, you should include the DOMAIN variables in a DOMAIN statement.
NOTE: In data set INP, total 197985 observations read, 14163 observations with missing values are
omitted.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 2.01 seconds
cpu time 1.85 seconds

ballardw · Posted 06-15-2020 06:53 PM

It is pretty obvious from the log that you submitted different code. This is tattling on you:

NOTE: The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate analysis of the subset.

The code you posted does not include a WHERE=, OBS=, or FIRSTOBS= data set option or a WHERE statement. And just how many variables are you hiding in that macro variable?

Also, if you actually have 100,000 clusters, you don't have enough data to support any sort of analysis. You have fewer than 200,000 observations, which means some (or many) of the 100,000 clusters would consist of a single observation. There is not going to be much "regression" of any sort going on in a cluster with one observation. And with 2 class variables, many of the "class combinations" may have only one record.

But at this point I am starting to think you are not quite clear on the difference between the roles of CLUSTER and CLASS variables.

Without a WEIGHT variable, I really wonder why you are using SURVEYREG at all. I am not seeing much of a description of the sample design, and I would be extremely surprised if the weight for every single cluster should be 1.

braam · Posted 06-16-2020 08:51 AM

Sorry, my bad. I simplified the code a bit for readability and left out the WHERE statement because I thought it was not necessary. The number of clusters I have is around 70,000, with about 200,000 observations. Some clusters have just one observation, but most have 2-4 observations. As for the controls, there are only about 10 control variables. The CLASS variables are included to absorb fixed effects.
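One way to check a "mostly 2-4 observations per cluster" description, and ballardw's concern about single-observation clusters, is to count observations per cluster and then tabulate those counts. A minimal sketch that runs on sashelp.cars with Type and Origin as the cluster variables; for the data in this thread the data set and cluster variable would be Inp and group instead.

```sas
/* Observations per cluster (one row per Type x Origin combination) */
proc sql;
   create table clus_size as
   select Type, Origin, count(*) as n_obs
   from sashelp.cars
   group by Type, Origin;
quit;

/* How many clusters have 1, 2, 3, ... observations */
proc freq data=clus_size;
   tables n_obs / nocum;
run;
```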

SteveDenham · Posted 06-16-2020 03:19 PM

This is a bit mysterious now. Does SURVEYREG ridge the matrix? That could help.

SteveDenham

braam · Posted 06-19-2020 05:24 AM

OP here, sharing the answer I got from SAS:

"Unfortunately, there is no way to increase the maximum amount of memory that PROC SURVEYREG is allowed to use. The only way to get this working is to reduce the size of your data or use less variables in the PROC SURVEYREG."

So there appears to be no direct solution. One workaround I found is to demean the variables within fixed-effect groups (using PROC STDIZE) instead of including the fixed effects through the CLASS statement. The memory saved by dropping the CLASS statement then leaves room for the clustering.
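Here is a minimal sketch of that demeaning workaround. It uses sashelp.cars so it runs as-is, with Make as the absorbed fixed effect and Type/Origin as clusters; these are illustrative choices, not the OP's actual setup. PROC STDIZE with METHOD=MEAN subtracts the BY-group mean. Demeaning the response and every regressor within Make and then fitting without Make should therefore reproduce the Wheelbase and Length slopes of a model that includes Make as CLASS dummies (the Frisch-Waugh-Lovell result).

Some caveats apply:
- This reproduces a one-way fixed-effects fit. With two non-nested fixed-effect sets, demeaning by one set and keeping the other as CLASS dummies is not, in general, equivalent to the full two-way model.
- Demean on the estimation sample (rows with no missing model variables). Otherwise the group means come from different rows than the regression uses.
- Standard errors can differ slightly, because the demeaned model does not count the absorbed parameters in its degrees-of-freedom adjustment.

```sas
/* BY-group processing in PROC STDIZE needs data sorted by the FE variable */
proc sort data=sashelp.cars out=cars_s;
   by Make;
run;

/* METHOD=MEAN: location = group mean, scale = 1, so each VAR variable
   is demeaned within Make. Demean the response AND every regressor. */
proc stdize data=cars_s out=cars_dm method=mean;
   by Make;
   var Weight Wheelbase Length;
run;

/* Fixed effect absorbed by demeaning: no CLASS statement needed */
proc surveyreg data=cars_dm;
   cluster Type Origin;
   model Weight = Wheelbase Length / solution;
run;

/* Reference fit with Make as CLASS dummies: the slopes should agree */
proc surveyreg data=sashelp.cars;
   class Make;
   cluster Type Origin;
   model Weight = Wheelbase Length Make / solution;
run;
```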

TonyAn · Posted 07-07-2020 04:24 PM

Note that PROC SURVEYREG is designed specifically for survey data analysis: it takes the survey design information into account in order to estimate the variance correctly. If you just want the sandwich variance estimator, consider using PROC SANDWICH:

https://go.documentation.sas.com/?docsetId=casstat&docsetVersion=8.5&docsetTarget=casstat_sandwich_o...

Here is the code for running proc sandwich in CAS:

*** Set up the connection to CAS;
*** Replace Your_CAS_Host and your_port_number with the host name and port of your CAS server;
options cashost="Your_CAS_Host" casport=your_port_number;
*** Start a CAS session;
cas mycassession;

*** Define a libref that points to the CAS session, using the CAS engine; 
libname mycaslib cas sessref=mycassession;

*** Load the data set into the CAS library; 
data mycaslib.cars;
set sashelp.cars;
run;

*** Check the data;
proc contents data=mycaslib.cars;
run;

*** Run the analysis;
proc sandwich data= mycaslib.cars;
    class Make DriveTrain type origin;
    cluster type origin;
    model Weight= Wheelbase Length Make DriveTrain ;
run;

*** End the CAS session when finished;
cas mycassession terminate;

And the results:

                                The SAS System                               1

                            The CONTENTS Procedure

   Data Set Name        MYCASLIB.CARS             Observations          428
   Member Type          DATA                      Variables             15 
   Engine               SASIOCA                   Indexes               0  
   Created              DDMMMYY:00:00:00          Observation Length    160
   Last Modified        DDMMMYY:00:00:00          Deleted Observations  0  
   Protection                                     Compressed            NO 
   Data Set Type                                  Sorted                NO 
   Label                                                                   
   Data Representation  Native                                             
   Encoding             utf-8  Unicode (UTF-8)                             


                 Alphabetic List of Variables and Attributes
 
        #    Variable       Type    Len    Format      Label

        9    Cylinders      Num       8                               
        5    DriveTrain     Char      5                               
        8    EngineSize     Num       8                Engine Size (L)
       10    Horsepower     Num       8                               
        7    Invoice        Num       8    DOLLAR8.                   
       15    Length         Num       8                Length (IN)    
       11    MPG_City       Num       8                MPG (City)     
       12    MPG_Highway    Num       8                MPG (Highway)  
        6    MSRP           Num       8    DOLLAR8.                   
        1    Make           Char     13                               
        2    Model          Char     40                               
        4    Origin         Char      6                               
        3    Type           Char      8                               
       13    Weight         Num       8                Weight (LBS)   
       14    Wheelbase      Num       8                Wheelbase (IN) 

                            The SANDWICH Procedure

                              Model Information

                        Data Source             CARS  
                        Response Variable       Weight
                        Design Matrix Method    Dense 


                 Number of Observations Read             428
                 Number of Observations Used             428


                           Class Level Information
 
Class       Levels  Values

Make            38  Acura Audi BMW Buick Cadillac Chevrolet Chrysler Dodge    
                    Ford GMC Honda Hummer Hyundai Infiniti Isuzu Jaguar Jeep  
                    Kia Land Rover Lexus Lincoln MINI Mazda Mercedes-Benz     
                    Mercury Mitsubishi Nissan Oldsmobile Pontiac Porsche Saab 
                    Saturn Scion Subaru Suzuki Toyota Volkswagen Volvo        
DriveTrain       3  All Front Rear                                            
Type             6  Hybrid SUV Sedan Sports Truck Wagon                       
Origin           3  Asia Europe USA                                           


                                  Dimensions

                       Number of Effects              5
                       Number of Parameters          44
                       Number of Clusters            15


                               Fit statistics

                           Root MSE      392.77231
                           R-Square        0.75791
                           Adj R-Sq        0.73220


                             Analysis of Variance
 
                                   Sum of           Mean
 Source                 DF        Squares         Square    F Value    Pr > F

 Model                  41      186427453        4547011      29.47    <.0001
 Error                 386       59548254         154270                     
 Corrected Total       427      245975707                                    


                             Parameter Estimates
 
                                               Standard
  Parameter            DF       Estimate          Error   t Value   Pr > |t|

  Intercept             1   -3583.655610     740.718394     -4.84     0.0003
  Wheelbase             1      41.498469      13.290771      3.12     0.0075
  Length                1      14.058792       5.402989      2.60     0.0209
  Make Acura            1     105.160317     111.068272      0.95     0.3598
  Make Audi             1     101.069236      73.718460      1.37     0.1919
  Make BMW              1      22.187616     129.854720      0.17     0.8668
  Make Buick            1     -27.109440     136.679951     -0.20     0.8456
  Make Cadillac         1     358.988128     246.507040      1.46     0.1674
  Make Chevrolet        1    -102.195631     120.094358     -0.85     0.4091
  Make Chrysler         1       8.955864     104.604424      0.09     0.9330
  Make Dodge            1    -203.298334     121.679502     -1.67     0.1170
  Make Ford             1     -16.599543     172.650946     -0.10     0.9248
  Make GMC              1      59.717873     399.228132      0.15     0.8832
  Make Honda            1    -170.596334      80.175969     -2.13     0.0516
  Make Hummer           1    1729.888145     219.739957      7.87     <.0001
  Make Hyundai          1    -135.490576     119.216053     -1.14     0.2748
  Make Infiniti         1     -83.240401     150.865548     -0.55     0.5898
  Make Isuzu            1     210.694202     116.788893      1.80     0.0928
  Make Jaguar           1      61.797671     313.378887      0.20     0.8465
  Make Jeep             1     473.288866      75.295723      6.29     <.0001
  Make Kia              1      62.443050     123.399072      0.51     0.6207
  Make Land Rover       1     671.654044      93.161553      7.21     <.0001
  Make Lexus            1     276.619911     216.529010      1.28     0.2222
  Make Lincoln          1     158.507690     396.029409      0.40     0.6950
  Make MINI             1     250.063880     127.902939      1.96     0.0708
  Make Mazda            1    -344.072796     179.621752     -1.92     0.0761
  Make Mercedes-Benz    1     267.497565     246.050132      1.09     0.2953
  Make Mercury          1     -31.290297     167.103856     -0.19     0.8542
  Make Mitsubishi       1      37.966270     132.398771      0.29     0.7785
  Make Nissan           1     -76.744072     220.619276     -0.35     0.7331
  Make Oldsmobile       1    -296.579330      65.066526     -4.56     0.0004
  Make Pontiac          1    -134.546208     102.736482     -1.31     0.2114
  Make Porsche          1     313.121973     200.170777      1.56     0.1401
  Make Saab             1     147.237775      84.162625      1.75     0.1021
  Make Saturn           1    -421.343815      67.560360     -6.24     <.0001
  Make Scion            1     -60.835128     122.365802     -0.50     0.6268
  Make Subaru           1    -351.737430      98.914124     -3.56     0.0032
  Make Suzuki           1    -151.033955      79.036316     -1.91     0.0767
  Make Toyota           1    -183.520589     153.059537     -1.20     0.2504
  Make Volkswagen       1     447.532970      48.800068      9.17     <.0001
  Make Volvo            0              0              .       .        .    
  DriveTrain All        1     478.285247     158.278093      3.02     0.0091
  DriveTrain Front      1    -108.196455     121.174550     -0.89     0.3870
  DriveTrain Rear       0              0              .       .        .    

                The degrees of freedom for the t tests is 14.


                                 Task Timing
 
              Task                         Seconds      Percent

              Setup and Parsing               0.92      21.66% 
              Levelization                    0.22       5.25% 
              Model Initialization            0.02       0.47% 
              SSCP Computation                0.05       1.10% 
              Model Fitting                   0.01       0.25% 
              Post Fitting Processing         0.00       0.00% 
              Cleanup                         3.04      71.26% 
              Total                           4.27     100.00%
