Dear All,
I get an insufficient memory error when I execute PROC SURVEYREG on a fairly large sample (N of roughly 200,000).
I have two sets of fixed effects, but the number of indicator variables they add is not large: one set has just three levels and the other has about 30.
I also include two sets of clustering variables, each of which has about 2,500 clusters. When the two cluster variables are considered together, there are about 100,000 clusters.
The message that I've got is: "ERROR: The SAS System stopped processing this step because of insufficient memory."
This has been discussed in a previous forum thread, but I never saw a solution. I'm wondering if there is a way to overcome this issue. To show how I coded it, here is my SAS code using the sashelp.cars data set.
proc surveyreg data= sashelp.cars;
class Make DriveTrain;
cluster type origin;
model Weight= Wheelbase Length Make DriveTrain/ solution ;
run;
If you remove the clusters does your program work without error?
Run your program with the FULLSTIMER SAS option to report how much memory your program is using. Try this without the CLUSTER statement to see if you can get your program to at least complete in simplified form, then post your SAS log.
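As a minimal sketch of that suggestion, the step below turns on FULLSTIMER and reruns the sashelp.cars example from the first post without the CLUSTER statement. The data set and variable names are the ones from that post; resetting the option at the end is only a tidy-up choice.

```sas
/* Report detailed timing and memory statistics in the log. */
options fullstimer;

/* Same model as in the original post, but without the CLUSTER statement. */
proc surveyreg data=sashelp.cars;
   class Make DriveTrain;
   model Weight = Wheelbase Length Make DriveTrain / solution;
run;

/* Optional: return to the shorter log statistics afterwards. */
options nofullstimer;
```

The "memory" and "OS Memory" lines in the resulting NOTE are the figures to compare between the runs with and without clustering.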
Below is the memory information when I drop the CLUSTER statement.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 1.50 seconds
user cpu time 1.26 seconds
system cpu time 0.21 seconds
memory 1528.04k
OS Memory 37120.00k
Timestamp 06/15/2020 10:20:04 PM
Step Count 15141 Switch Count 0
Also, please see below the result when I drop one fixed effect (class variable) but keep the CLUSTER statement.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 2.88 seconds
user cpu time 2.01 seconds
system cpu time 0.84 seconds
memory 1986171.01k
OS Memory 2026220.00k
Timestamp 06/15/2020 10:20:16 PM
Step Count 15142 Switch Count 0
Again, my code fails when I include all the fixed effects and cluster variables I want.
Just so you know, I already amended my SAS config file so that MEMSIZE is 4GB (it doesn't work even if MEMSIZE is 8GB).
Group=MEMORY
SORTSIZE=1073741824
  Specifies the amount of memory that is available to the SORT procedure.
SUMSIZE=9223372036854775807
  Specifies a limit on the amount of memory that is available for data summarization procedures when class variables are active.
MAXMEMQUERY=2147483647
  Specifies the maximum amount of memory that is allocated for procedures.
MEMBLKSZ=16777216
  Specifies the memory block size for Windows memory-based libraries.
MEMMAXSZ=2147483648
  Specifies the maximum amount of memory to allocate for using memory-based libraries.
LOADMEMSIZE=0
  Specifies a suggested amount of memory that is needed for executable programs loaded by SAS.
MEMSIZE=4294967296
  Specifies the limit on the amount of virtual memory that can be used during a SAS session.
REALMEMSIZE=0
  Specifies the amount of real memory SAS can expect to allocate.
Are you using SAS on a PC or on a remote SAS server? If you are running on your PC, is SAS 64-bit or 32-bit, and how much physical memory do you have? If you are pushing the boundary of your physical memory, you really have no choice but to simplify your model or reduce your sample size.
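One way to answer those questions from inside the session is sketched below. It lists the memory-related system options and writes the address size of the running SAS session to the log; it does not report the physical memory of the machine, which still has to be checked in the operating system.

```sas
/* List the current values of the memory-related system options. */
proc options group=memory;
run;

/* SYSADDRBITS is 32 or 64; SYSSCPL names the host operating system. */
%put NOTE: Address bits = &sysaddrbits, operating system = &sysscpl;
```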
You should probably post the actual code; better still, copy the submitted code together with all messages from the log.
The online help shows how to estimate the memory needed:
The memory needed by the SURVEYREG procedure to handle the survey design is described as follows.
Let
H be the total number of strata
nc be the total number of clusters in your sample across all H strata, if you specify a CLUSTER statement
p be the total number of parameters in the model
The memory needed (in bytes) is
48H + 8pH + 4p(p+1)H
For a cluster sample, the additional memory needed (in bytes) is
48H + 8pH + 4p(p+1)H + 4p(p+1)nc + 16nc
The SURVEYREG procedure also uses other small amounts of additional memory. However, when you have a large number of clusters or strata, or a large number of parameters in your model, the memory described previously dominates the total memory required by the procedure.
So your 100,000 clusters are likely eating a whole lot of memory.
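To put rough numbers on the documented formulas, a DATA step like the one below can evaluate them. The inputs are assumptions for illustration only: H = 1 (no STRATA statement, treated here as a single stratum), nc = 100,000 clusters as described in the first post, and p = 45 as a guess at the parameter count (about 10 controls plus the class levels and the intercept). Substitute your own values.

```sas
/* Evaluate the documented SURVEYREG memory formulas for assumed inputs. */
data _null_;
   H  = 1;        /* assumption: one stratum (no STRATA statement)   */
   nc = 100000;   /* clusters, as described in the first post        */
   p  = 45;       /* assumption: total number of model parameters    */

   /* Memory needed for the survey design, in bytes. */
   design_bytes  = 48*H + 8*p*H + 4*p*(p+1)*H;

   /* Additional memory for a cluster sample, in bytes. */
   cluster_bytes = 48*H + 8*p*H + 4*p*(p+1)*H + 4*p*(p+1)*nc + 16*nc;

   cluster_gb = cluster_bytes / 1024**3;
   put design_bytes= comma20. cluster_bytes= comma20. cluster_gb= 8.2;
run;
```

Because the 4p(p+1)nc term grows with the square of p, adding class levels raises the estimate much faster than adding clusters does.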
Thanks for your reply. Here is the code. &Controls. includes a set of control variables I included.
proc surveyreg data= Inp;
cluster group;
class fe1 fe2;
model diff= same &controls. fe1 fe2/ solution;
run;
NOTE: Writing HTML Body file: sashtml26.htm
NOTE: The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate
analysis of the subset. It does not provide a statistically valid subpopulation or domain
analysis, where the total number of units in the subpopulation is not known with certainty. If
you want a domain analysis, you should include the DOMAIN variables in a DOMAIN statement.
NOTE: In data set INP, total 197985 observations read, 14163 observations with missing values are
omitted.
ERROR: The SAS System stopped processing this step because of insufficient memory.
NOTE: PROCEDURE SURVEYREG used (Total process time):
real time 2.01 seconds
cpu time 1.85 seconds
It is pretty obvious from the log that you submitted different code. This note gives it away:
NOTE: The input data set is subset by WHERE, OBS, or FIRSTOBS. This provides a completely separate analysis of the subset.
Your posted code did not include a WHERE=, OBS=, or FIRSTOBS= data set option or a WHERE statement. And just how many variables are you hiding in that macro variable?
Also, if you actually have 100,000 clusters, you don't have enough data to support any sort of analysis. You have fewer than 200,000 observations, which means some (or many) of the 100,000 clusters would consist of a single observation. There is not going to be much "regression" of any sort going on in a cluster with one observation. And with two class variables, many of the class combinations may have only one record.
But at this point I am starting to think you are not quite clear on the difference between the roles of cluster and class variables.
Without a WEIGHT variable, I really wonder why you are using SURVEYREG at all. I don't see much description of the sample design, and I would be extremely surprised if the weight for every single cluster should be 1.
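The point about single-observation clusters is easy to check. The sketch below counts observations per cluster and then tabulates how many clusters have each size. It uses sashelp.cars with Type and Origin as the cluster variables, as in the first post; CLUSTER_SIZES and n_obs are hypothetical names introduced here. For the real data, swap in the actual data set and cluster variables.

```sas
/* Count the observations in each cluster (here: each Type*Origin cell). */
proc sql;
   create table cluster_sizes as
   select Type, Origin, count(*) as n_obs
   from sashelp.cars
   group by Type, Origin;
quit;

/* Distribution of cluster sizes: how many clusters have 1, 2, 3, ... rows. */
proc freq data=cluster_sizes;
   tables n_obs;
run;
```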
Sorry, my mistake. I changed the code a bit for readability, and I left out the WHERE statement because I thought it was not necessary. The number of clusters I have is around 70,000, with about 200,000 observations. Some clusters have just one observation, but most have 2-4 observations. As for the controls, there are only about 10 control variables. The class variables are included to absorb fixed effects.
This is a bit mysterious now. Does SURVEYREG ridge the matrix? That could help.
SteveDenham
OP here, just to share what I heard back from SAS. The answer from SAS is:
"Unfortunately, there is no way to increase the maximum amount of memory that PROC SURVEYREG is allowed to use. The only way to get this working is to reduce the size of your data or use less variables in the PROC SURVEYREG."
So apparently there is no direct solution for this. One workaround I figured out myself is to demean the variables (using PROC STDIZE) and run the regression on the demeaned data, instead of including the fixed effects through the CLASS statement. This way, the memory saved by dropping the CLASS statement is available for clustering.
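A minimal sketch of that workaround on sashelp.cars follows. It is not the code I actually ran: it absorbs only the Make fixed effect, and CARS_SORTED and CARS_DM are hypothetical work data sets. The idea is to subtract the within-Make mean from the response and the regressors, then fit the model without Make in the CLASS statement.

```sas
/* BY-group processing in PROC STDIZE needs the data sorted by the group. */
proc sort data=sashelp.cars out=cars_sorted;
   by Make;
run;

/* METHOD=MEAN subtracts the mean within each BY group (scale stays 1). */
proc stdize data=cars_sorted out=cars_dm method=mean;
   by Make;
   var Weight Wheelbase Length;
run;

/* Fit on the demeaned data; Make is no longer needed as a CLASS effect. */
proc surveyreg data=cars_dm;
   cluster Type Origin;
   model Weight = Wheelbase Length / solution;
run;
```

Two caveats. To absorb a second fixed effect the same way, its indicator columns would have to be demeaned as well, so simply leaving it in CLASS after demeaning is not exactly equivalent. Also, the group means should be computed on the same rows the regression uses, so observations with missing values should be dropped before demeaning.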
Note that PROC SURVEYREG is specifically designed for survey data analysis: it takes survey design information into account in order to estimate the variance correctly. If you just want the sandwich variance estimator, consider using PROC SANDWICH:
Here is the code for running PROC SANDWICH in CAS:
*** Setup for running cas;
*** You need to replace Your_CAS_Host and your_port_number with the host name and port number of your CAS server;
options cashost="Your_CAS_Host" casport=your_port_number;
*** Start the CAS server session;
cas mycassession;
*** Name a libname to refer to the CAS session to be used by cas engine;
libname mycaslib cas sessref=mycassession;
**** load the data set to the CAS lib;
data mycaslib.cars;
set sashelp.cars;
run;
*** Check on the data ***;
proc contents data=mycaslib.cars;
run;
*** Run the analysis;
proc sandwich data= mycaslib.cars;
class Make DriveTrain type origin;
cluster type origin;
model Weight= Wheelbase Length Make DriveTrain ;
run;
And the results:
The SAS System 1
The CONTENTS Procedure
Data Set Name MYCASLIB.CARS Observations 428
Member Type DATA Variables 15
Engine SASIOCA Indexes 0
Created DDMMMYY:00:00:00 Observation Length 160
Last Modified DDMMMYY:00:00:00 Deleted Observations 0
Protection Compressed NO
Data Set Type Sorted NO
Label
Data Representation Native
Encoding utf-8 Unicode (UTF-8)
Alphabetic List of Variables and Attributes
# Variable Type Len Format Label
9 Cylinders Num 8
5 DriveTrain Char 5
8 EngineSize Num 8 Engine Size (L)
10 Horsepower Num 8
7 Invoice Num 8 DOLLAR8.
15 Length Num 8 Length (IN)
11 MPG_City Num 8 MPG (City)
12 MPG_Highway Num 8 MPG (Highway)
6 MSRP Num 8 DOLLAR8.
1 Make Char 13
2 Model Char 40
4 Origin Char 6
3 Type Char 8
13 Weight Num 8 Weight (LBS)
14 Wheelbase Num 8 Wheelbase (IN)
The SANDWICH Procedure
Model Information
Data Source CARS
Response Variable Weight
Design Matrix Method Dense
Number of Observations Read 428
Number of Observations Used 428
Class Level Information
Class Levels Values
Make 38 Acura Audi BMW Buick Cadillac Chevrolet Chrysler Dodge
Ford GMC Honda Hummer Hyundai Infiniti Isuzu Jaguar Jeep
Kia Land Rover Lexus Lincoln MINI Mazda Mercedes-Benz
Mercury Mitsubishi Nissan Oldsmobile Pontiac Porsche Saab
Saturn Scion Subaru Suzuki Toyota Volkswagen Volvo
DriveTrain 3 All Front Rear
Type 6 Hybrid SUV Sedan Sports Truck Wagon
Origin 3 Asia Europe USA
Dimensions
Number of Effects 5
Number of Parameters 44
Number of Clusters 15
Fit statistics
Root MSE 392.77231
R-Square 0.75791
Adj R-Sq 0.73220
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 41 186427453 4547011 29.47 <.0001
Error 386 59548254 154270
Corrected Total 427 245975707
Parameter Estimates
Standard
Parameter DF Estimate Error t Value Pr > |t|
Intercept 1 -3583.655610 740.718394 -4.84 0.0003
Wheelbase 1 41.498469 13.290771 3.12 0.0075
Length 1 14.058792 5.402989 2.60 0.0209
Make Acura 1 105.160317 111.068272 0.95 0.3598
Make Audi 1 101.069236 73.718460 1.37 0.1919
Make BMW 1 22.187616 129.854720 0.17 0.8668
Make Buick 1 -27.109440 136.679951 -0.20 0.8456
Make Cadillac 1 358.988128 246.507040 1.46 0.1674
Make Chevrolet 1 -102.195631 120.094358 -0.85 0.4091
Make Chrysler 1 8.955864 104.604424 0.09 0.9330
Make Dodge 1 -203.298334 121.679502 -1.67 0.1170
Make Ford 1 -16.599543 172.650946 -0.10 0.9248
Make GMC 1 59.717873 399.228132 0.15 0.8832
Make Honda 1 -170.596334 80.175969 -2.13 0.0516
Make Hummer 1 1729.888145 219.739957 7.87 <.0001
Make Hyundai 1 -135.490576 119.216053 -1.14 0.2748
Make Infiniti 1 -83.240401 150.865548 -0.55 0.5898
Make Isuzu 1 210.694202 116.788893 1.80 0.0928
Make Jaguar 1 61.797671 313.378887 0.20 0.8465
Make Jeep 1 473.288866 75.295723 6.29 <.0001
Make Kia 1 62.443050 123.399072 0.51 0.6207
Make Land Rover 1 671.654044 93.161553 7.21 <.0001
Make Lexus 1 276.619911 216.529010 1.28 0.2222
Make Lincoln 1 158.507690 396.029409 0.40 0.6950
Make MINI 1 250.063880 127.902939 1.96 0.0708
Make Mazda 1 -344.072796 179.621752 -1.92 0.0761
Make Mercedes-Benz 1 267.497565 246.050132 1.09 0.2953
Make Mercury 1 -31.290297 167.103856 -0.19 0.8542
Make Mitsubishi 1 37.966270 132.398771 0.29 0.7785
Make Nissan 1 -76.744072 220.619276 -0.35 0.7331
Make Oldsmobile 1 -296.579330 65.066526 -4.56 0.0004
Make Pontiac 1 -134.546208 102.736482 -1.31 0.2114
Make Porsche 1 313.121973 200.170777 1.56 0.1401
Make Saab 1 147.237775 84.162625 1.75 0.1021
Make Saturn 1 -421.343815 67.560360 -6.24 <.0001
Make Scion 1 -60.835128 122.365802 -0.50 0.6268
Make Subaru 1 -351.737430 98.914124 -3.56 0.0032
Make Suzuki 1 -151.033955 79.036316 -1.91 0.0767
Make Toyota 1 -183.520589 153.059537 -1.20 0.2504
Make Volkswagen 1 447.532970 48.800068 9.17 <.0001
Make Volvo 0 0 . . .
DriveTrain All 1 478.285247 158.278093 3.02 0.0091
DriveTrain Front 1 -108.196455 121.174550 -0.89 0.3870
DriveTrain Rear 0 0 . . .
The degrees of freedom for the t tests is 14.
Task Timing
Task Seconds Percent
Setup and Parsing 0.92 21.66%
Levelization 0.22 5.25%
Model Initialization 0.02 0.47%
SSCP Computation 0.05 1.10%
Model Fitting 0.01 0.25%
Post Fitting Processing 0.00 0.00%
Cleanup 3.04 71.26%
Total 4.27 100.00%