I am working through an exercise to compress all of our datasets (around 600). I am doing this with a simple DATA step:
data libname.dsname_tmp;
  set libname.dsname;
run;
I then compare the two datasets with PROC COMPARE and manually check the observation counts. If everything is OK:
proc datasets library=libname;
  delete dsname;
  change dsname_tmp = dsname;
quit;
I have also put in a number of checks, such as whether the temp file already exists and whether the file is already compressed.
I was trying to retain the sort-order information, which PROC CONTENTS returns as SORTED and SORTEDBY, by adding a BY statement after the SET statement. However, these values are not set on the newly created dataset. Obviously the order of the data does not change with a simple SET statement, but what impact does this have when SAS determines whether a sort is needed? Does SAS look at these values (a), or does it actually read through the data (b) to determine whether a sort is needed?
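For reference, here is a minimal sketch of how those two values can be read back for one dataset. It assumes the OUT= dataset of PROC CONTENTS carries the SORTED and SORTEDBY variables as I understand them (SORTED missing when there is no sort indicator, SORTEDBY holding a variable's position in the sort key); work.meta is just a placeholder name.

```sas
/* Sketch: read the sort indicator that PROC CONTENTS reports.      */
/* work.meta is a hypothetical scratch dataset used for this check. */
proc contents data=libname.dsname
              out=work.meta(keep=name sorted sortedby)
              noprint;
run;

/* Keep only the variables that are part of the recorded sort key.  */
proc print data=work.meta;
  where sortedby > 0;
run;
```

Running the same step against libname.dsname_tmp is how I see that the values are missing on the new copy.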
If it's (b) then no problem, but if it's (a), sorting this number of files, some with many millions of observations, would be a huge overhead. Or will the sort be quick, given that the data is already sorted?
There is a dataset option called "sortedby" but this can be manually set, and thus will not be a solution if (a) is applicable.
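To illustrate what I mean, this is roughly how the option would be applied in the copy step. It is only a sketch: key1 and key2 are hypothetical stand-ins for the real sort variables, and compress=yes is shown as a dataset option on the assumption that compression is not already set globally.

```sas
/* Sketch: assert the sort order by hand while writing the copy.    */
/* key1 and key2 are placeholders for the actual BY variables.      */
data libname.dsname_tmp(compress=yes sortedby=key1 key2);
  set libname.dsname;
run;
```

As far as I can tell, an indicator set this way is recorded as asserted by the user rather than validated by SAS, which is exactly why I am unsure how much it can be relied on.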
So which is it, (a) or (b)?