Hi @SASKiwi
What a small world!! A greeting to Massey alumni! Thank you for your help and suggestion so far!
Warmest regards!
Phil.
Massey Business School
Hi @Phil_NZ,
I'm confident that using the TAGSORT option, as suggested by ballardw, will help because the error message suggests that a utility file got too big.
proc sort data=bigds tagsort; by byvar1 ... byvarn; run;
If even this reduction of the space requirements was not sufficient, I would check if more disk space could be made available by changing the value of the UTILLOC= system option (i.e., specify a different or an additional location).
If all else fails, you can still consider changes to the sort process (e.g., split the large dataset, ...).
> the error message suggests that a utility file got too big.
Another option is to store the utility files in a compressed folder.
You are using Compustat data, and you apparently created an identifier variable GVKEYIID from the company id (GVKEY) and the stock issue (IID).
If your data originally came from WRDS it was originally sorted by GVKEY/DATADATE (for company data) or by GVKEY/IID/DATADATE (for stock issue data).
Of course, you may have re-ordered it in your intermediate work. But I wouldn't be surprised if your data currently is sorted by GVKEY, if not already by GVKEY/IID/DATADATE (or maybe GVKEY/DATADATE/IID).
If so you can use space-saving and time-saving techniques to convert a dataset already sorted by group to a dataset sorted by detail within group.
In particular, if is is sorted by GVKEY, you don't need a proc sort. Instead you can (a) read in all details records for one GVKEY, (b) sort the detail records in memory by GVKEYIID/DATADATE and (c) output, (d) clear the in-memory data container. Then go to the next GVKEY and repeat.
This approach would take almost NO temporary disk space (and would be much faster). Here's how the code looks.
data want (drop=_:);
set have;
by gvkey;
if _n_=1 then do;
declare hash h (dataset:'have (obs=0)',ordered:'A',multidata:'Y');
h.definekey('gvkeyiid','datadate');
h.definedata(all:'Y');
declare hiter hi ('h');
end;
h.add();
if last.gvkey;
do _rc=hi.first() by 0 until(_rc^=0);
output;
_rc=hi.next();
end;
_rc=h.clear();
run;
To check in advance whether the data are already ordered by GVKEY, just run:
data _null_;
set have;
by gvkey;
run;
If there are no error messages, then the order by gvkey is true. Obviously you could even do this by your specific variables to see if the desired final order already is in place.
Hi @mkeintz
Thank you for your specific suggestion. It seems that it is time for me to learn about the HASH objects because I also saw @Kurt_Bremser give me a lot of suggestions relating HASH OBJECT recently.
Warmest regards.
Phil
It seems that you really should start considering storing your data in binary-compressed SPDE files. Not only will they occupy (much) less space on disk, but SPDE can also avoid using proc sort at all, as it can sort its data on the fly faster than proc sort can. So you can use a BY statement in a step without having to sort first. Of course if you do that several times, sorting becomes the best option.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.