BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi @SASKiwi 

 

What a small world!! A greeting to Massey alumni! Thank you for your help and suggestion so far!

 

Warmest regards!

Phil.

Massey Business School

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
FreelanceReinh
Jade | Level 19

Hi @Phil_NZ,

 

I'm confident that using the TAGSORT option, as suggested by ballardw, will help because the error message suggests that a utility file got too big.

proc sort data=bigds tagsort;
by byvar1 ... byvarn;
run;

If even this reduction of the space requirements was not sufficient, I would check if more disk space could be made available by changing the value of the UTILLOC= system option (i.e., specify a different or an additional location).

 

If all else fails, you can still consider changes to the sort process (e.g., split the large dataset, ...).

ChrisNZ
Tourmaline | Level 20

> the error message suggests that a utility file got too big.

Another option is to store the utility files in a compressed folder.

mkeintz
PROC Star

You are using Compustat data, and you apparently created an identifier variable GVKEYIID from the company id (GVKEY) and the stock issue (IID).  

 

If your data originally came from WRDS it was originally sorted by GVKEY/DATADATE (for company data) or by GVKEY/IID/DATADATE (for stock issue data).

 

Of course, you may have re-ordered it in your intermediate work.  But I wouldn't be surprised if your data currently is sorted by GVKEY, if not already by GVKEY/IID/DATADATE (or maybe GVKEY/DATADATE/IID).

 

If so you can use space-saving and time-saving techniques to convert a dataset already sorted by group to a dataset sorted by detail within group.

 

In particular, if is is sorted by GVKEY, you don't need a proc sort.  Instead you can (a) read in all details records for one GVKEY, (b) sort the detail records in memory by GVKEYIID/DATADATE and (c) output, (d) clear the in-memory data container.  Then go to the next GVKEY and repeat.  

 

This approach would take almost NO temporary disk space (and would be much faster).  Here's how the code looks.

 

data want (drop=_:);
  set have;
  by gvkey;
  if _n_=1 then do;
    declare hash h (dataset:'have (obs=0)',ordered:'A',multidata:'Y');
     h.definekey('gvkeyiid','datadate');
     h.definedata(all:'Y');
   declare hiter hi ('h');
  end;
  h.add();
  if last.gvkey;
  do _rc=hi.first() by 0 until(_rc^=0);
    output;
    _rc=hi.next();
  end;
  _rc=h.clear();
run;

To check in advance whether the data are already ordered by GVKEY, just run:

data _null_;
  set have;
  by gvkey;
run;

If there are no error messages, then the order by gvkey is true.  Obviously you could even do this by your specific variables to see if the desired final order already is in place.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

Hi @mkeintz 

Thank you for your specific suggestion. It seems that it is time for me to learn about the HASH objects because I also saw @Kurt_Bremser  give me a lot of suggestions relating HASH OBJECT recently.

 

Warmest regards.

Phil

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
ChrisNZ
Tourmaline | Level 20

It seems that you really should start considering storing your data in binary-compressed SPDE files. Not only will they occupy (much) less space on disk, but SPDE can also avoid using proc sort at all, as it can sort its data on the fly faster than proc sort can. So you can use a BY statement in a step without having to sort first. Of course if you do that several times, sorting becomes the best option.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 3984 views
  • 21 likes
  • 8 in conversation