06-11-2015 09:06 PM
I am working with a 7.5 GB SAS data set with over 43 million observations and 15 variables. I can't sort it or run any other simple operation on it. I looked at a related thread, but its fix of adding an OPTIONS USER= statement didn't help. I'd really appreciate any help!
The code is as follows:
proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS out=WORK.Sort;
  by CUSIP MGRNAME;
run;
This is the log output:
06-11-2015 11:54 PM
Sorting is limited only by physical resources (disk space and memory) and by how long you are willing to wait.
The message you are running into comes from the SAS WORK library running out of space. In a local installation, WORK can be given much more space. Because you are using SAS University Edition (UE), which is intended for education rather than production, SAS has kept that limit low.
06-12-2015 01:27 AM
SAS UE is a learning tool, created to handle practice data. What you are trying to do clearly falls into production, not learning or practice.
Check how much disk space your VM has, then keep in mind that a sort needs about 3 times the size of the original file while it runs: the original file, the utility file, and the new output file. For a 7.5 GB file, that is roughly 22.5 GB.
If your data file is stored with the COMPRESS=YES option, the utility file will be even larger than the file itself, because it is not compressed but still holds all the data.
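To apply the 3x rule of thumb before sorting, you can look up the data set's size on disk. This is a minimal sketch that assumes your SAS release exposes the FILESIZE column (in bytes) in DICTIONARY.TABLES and that the engine behind the _TEMP0 library reports it; the macro variable name `gb` is only illustrative. If the file is compressed, treat the estimate as a lower bound, since the utility file is not compressed.

```sas
/* Look up the input data set's size in GB.
   FILESIZE is in bytes; 1073741824 = 1024**3. */
proc sql noprint;
  select filesize / 1073741824 format=8.2
    into :gb trimmed
    from dictionary.tables
    where libname = '_TEMP0'
      and memname = 'QUARTERLY_INSTITUTIONAL_FILINGS';
quit;

/* Rule of thumb: original + utility file + output file,
   so roughly 3 times the input size while the sort runs. */
%put NOTE: Input data set size: &gb GB;
%put NOTE: Rough disk space needed to sort it: %sysevalf(3 * &gb) GB;

/* Where WORK lives, so you can compare against free space there. */
%put NOTE: WORK library location: %sysfunc(pathname(work));
```

Compare the estimate with the free space on the disk that holds the WORK location shown in the log.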
You could try the TAGSORT option, but for data this large you will probably need a full (licensed) SAS installation.
06-12-2015 11:52 AM
You can try the TAGSORT option on the PROC SORT statement to reduce the temporary disk space the sort needs. The comments about production versus learning may still apply, though.
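Adding TAGSORT to the sort from the question would look like the sketch below. TAGSORT writes only the BY variables (the "tags") and observation numbers to the temporary utility file, then uses them to read the full rows back in sorted order. With 2 BY variables out of 15 columns the utility file is likely to be much smaller than a full copy (depending on how long CUSIP and MGRNAME are), but the sort can take considerably longer, and the OUT= data set still needs room for the complete sorted result.

```sas
/* Same sort as in the question, with TAGSORT added so the
   temporary utility file holds only the BY variables and
   observation numbers instead of whole rows. */
proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS
          out=WORK.Sort
          tagsort;
  by CUSIP MGRNAME;
run;
```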
06-12-2015 12:57 PM
You can try breaking the original file into, say, 50 bins that are ordered by the sort key. For example, Bin1 could cover aa* to aq*, Bin2 ar* to az*, and so on.
PROC SORT each bin.
Then stack the sorted bins together in bin order.
I believe there is at least one SUG paper on this method.
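A minimal sketch of the bin-and-sort approach, assuming CUSIP is a character variable and the default collating sequence is in effect. It uses 4 bins on the leading characters of CUSIP instead of 50, and the boundaries ('3', '6', 'A') and the names WORK.bin1 to WORK.bin4 are hypothetical: choose ranges that split your data into pieces small enough to sort in the space you have. Each PROC SORT rereads the whole input but keeps only its bin through the WHERE= data set option, so each run's utility file only has to hold that bin. This trades run time for peak temporary space.

```sas
/* Step 1: sort each bin separately. The ranges below are
   hypothetical, but together they cover every CUSIP value
   exactly once (blank/missing values fall in bin 1). */
proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS
               (where=(CUSIP < '3'))
          out=WORK.bin1;
  by CUSIP MGRNAME;
run;

proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS
               (where=(CUSIP >= '3' and CUSIP < '6'))
          out=WORK.bin2;
  by CUSIP MGRNAME;
run;

proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS
               (where=(CUSIP >= '6' and CUSIP < 'A'))
          out=WORK.bin3;
  by CUSIP MGRNAME;
run;

proc sort data=_TEMP0.QUARTERLY_INSTITUTIONAL_FILINGS
               (where=(CUSIP >= 'A'))
          out=WORK.bin4;
  by CUSIP MGRNAME;
run;

/* Step 2: the bins hold non-overlapping CUSIP ranges in sort
   order, so reading them in bin order gives the fully sorted
   result. The BY statement makes SAS stop with an error if
   the rows are ever out of order. */
data WORK.Sort;
  set WORK.bin1 WORK.bin2 WORK.bin3 WORK.bin4;
  by CUSIP MGRNAME;
run;

/* Step 3: free the space used by the intermediate bins. */
proc datasets library=WORK nolist;
  delete bin1 bin2 bin3 bin4;
quit;
```

Note that step 2 still needs room for the full result alongside the bins, so the peak WORK usage is roughly twice the data size plus one bin's utility file, rather than a full-size utility file.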