11-09-2015 03:53 PM
I have run into memory issues using Proc Mix. I cannot include as many random effects as I would like without getting a message about insufficient memory. I am using SAS 9.4 and have been using the MEMSIZE=MAX option, which has given me some more memory, but not enough. I would like to buy a new, more powerful computer but am unsure what to buy. My current computer is a 64-bit Dell running Windows 7 Professional, with an Intel(R) Core(TM) i7 CPU and 12.0 GB of RAM. Will it help to buy a more powerful computer? I would like to know what others are using and whether they run into similar memory issues.
11-09-2015 04:41 PM
I presume you mean PROC MIXED. You should use PROC HPMIXED instead; it should take care of your memory problems. Most of the syntax is the same, although there are far fewer options in HPMIXED.
The mixed model equations can involve some very large matrices, and inverting them takes a great deal of time and memory when there are many random effects. I highly recommend that you figure out how to use HPMIXED.
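For what it's worth, the core statements carry over almost unchanged from PROC MIXED. A minimal sketch (the data set and variable names below are placeholders, not from your model):

/* CLASS/MODEL/RANDOM work as in PROC MIXED; HPMIXED uses
   sparse-matrix techniques to keep memory usage down */
proc hpmixed data=mydata;
   class block trt;
   model y = trt;
   random block block*trt;
run;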
11-09-2015 05:16 PM
Yes, I also ran into an unexpected memory issue recently, with this innocuous PROC MEANS step:
proc means data=tmt mean min median;
   var dttm;
run;
Log message: "A shortage of memory has caused the quantile computations to terminate prematurely for QMETHOD=OS. ..."
Dataset TMT had about 39.5 million observations.
This happened on a Windows 7 Pro 64-bit workstation with an Intel(R) Xeon(TM) E5-1630 v3 CPU (3.7 GHz, 10 MB cache) and 64 GB of DDR4-2133 RAM. However, only about 14 GB of RAM were available to SAS at the time, because I use RAM disk software that combines 50 GB of RAM with 100 GB of the first 256-GB SSD to form a 150-GB hybrid RAM disk. I am curious whether the issue would still occur with the full 64 GB of RAM, but I haven't tried yet.
My first idea would have been to upgrade the RAM of your computer (if this is possible), but lvm's suggestion about PROC HPMIXED sounds very promising.
11-09-2015 05:40 PM
There are several "HP" (high performance) procedures now; they run on a single machine or in distributed mode. There is no HPMEANS, but there is HPSUMMARY, which might work for your purpose of getting quantiles from very large data sets. With 40 million observations, quantiles are difficult to compute without the tricks of large-scale computing. At some point you would need the distributed computing products.
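A sketch of what that might look like in single-machine mode (data set and variable names are placeholders; the PERFORMANCE statement is how the HP procedures control threading and, with the right products, distributed execution):

/* single-machine run; NTHREADS= caps the number of threads used */
proc hpsummary data=big;
   performance nthreads=4;
   var x;
   output out=stats mean=mean median=median;
run;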
For some procedures, SAS 9.4 won't allow the (non-HP) procedure to run if it would take too much time, which is frustrating. I have 9.3 and 9.4 on my desktop; I can fit a mixed model on a large data set with 9.3 (taking many hours), but in 9.4 I just get a message that it would take too long to run.
11-10-2015 09:49 AM
Out of curiosity, I just simulated 50 million observations and determined the median and quartiles with PROC HPSUMMARY. No problem. I did this with "only" 8 GB of memory and a slow processor; it took less than half a second of real time. It is important to use the P2 method of quantile estimation (an approximation). The default (QMETHOD=OS) requires internal ordering of the observations, which is a challenge with so many observations.
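The simulation step isn't shown here, but a data step along these lines would generate matching test data (the data set name A and variable Y line up with the summary step below; the normal distribution is just my assumption):

/* hypothetical reconstruction: 50 million standard-normal values */
data a;
   call streaminit(27);
   do i = 1 to 50000000;
      y = rand('normal');
      output;
   end;
   drop i;
run;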
proc hpsummary data=a qmethod=p2;   /* P2: approximate quantiles, no full sort needed */
   var y;
   output out=out q1=q1 q3=q3 median=median mean=mean;
run;

proc print data=out;
run;
11-10-2015 10:14 AM
Good point. It is interesting that it takes about the same amount of time with MEANS as with HPSUMMARY to get quartiles (P2 option) on 50 M observations (on my desktop).
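For reference, the PROC MEANS counterpart with the same approximation would look like this (same placeholder names as above):

/* same quartiles via PROC MEANS with the P2 approximation */
proc means data=a qmethod=p2 mean median q1 q3;
   var y;
run;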
11-10-2015 12:18 PM
kbaughma's initial post reminded me of the performance-related system options. I discovered that the MEMSIZE option on my machine was still set to its default of 2147483648 bytes (= 2 GB), which meant that only a small portion of my 14 GB (or 64 GB after deactivating the RAM disk) had been available to SAS.
By simply setting MEMSIZE to MAX (during startup) my previously failed PROC MEANS step (see earlier post in this thread) ran without problems.
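For reference: MEMSIZE can only be set when SAS starts (e.g. -MEMSIZE MAX on the command line or in the sasv9.cfg configuration file); from within a running session you can only check the effective value:

/* show the memory ceiling currently in effect */
proc options option=memsize value;
run;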
And much more: an ordinary PROC SUMMARY was now able to cope with 640 million randomly generated observations (a 4.84 GB dataset) and calculated mean, min and, above all, the median in less than 10 minutes -- without forcing me to resort to QMETHOD=P2 and its fluctuating results.
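The step in question was just an ordinary summary call of roughly this shape (data set and variable names are placeholders), with QMETHOD left at its default:

/* default QMETHOD=OS: exact order-statistic quantiles */
proc summary data=big;
   var x;
   output out=stats mean=mean min=min median=median;
run;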
After this breakthrough I tried to push the limit even further and found that PROC HPSUMMARY achieved the same with 700 million observations (5.29 GB, 11 minutes, peak physical memory usage at about 53 GB), whereas PROC SUMMARY failed.
However, with 720 million observations the old warning reappeared with either procedure.
So the improvement of PROC HPSUMMARY over PROC SUMMARY -- in single-machine mode! -- in terms of the number of processable observations was somewhere between 0 and 12.5 percent. There seemed to be no significant difference in run time. Of course, a completely different picture is to be expected in distributed mode.