03-17-2017 09:38 AM
Does anyone have COMPRESS=YES as the default for their SAS users? We still have CPU cycles available, but we run very low on our very pricey disk space, and it's not possible to get all users to put OPTIONS COMPRESS=YES in their code. Are there any concerns with doing this?
03-17-2017 09:42 AM
I have it set in my grid environment and have no complaints. Though most of my users came from stand-alone Windows 2008 servers, so the hardware alone is much more powerful.
The power users typically process datasets in the 500GB range (after compression) and seem fine with it.
I don't have any real tests or data to back this up, so this may not be too helpful.
03-17-2017 10:45 PM
Setting COMPRESS=YES, or even COMPRESS=BINARY for better compression, by default is common practice on both GRID and non-GRID SAS servers in my experience. As long as you are not CPU-bound, jobs should perform better, since IO is the most common bottleneck.
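For reference, the default can be pushed out centrally rather than relying on user code. A sketch of the two usual places (paths and file names here are the typical defaults, so adjust for your install):

```sas
/* In the server's sasv9.cfg / sasv9_usermods.cfg, config-file syntax: */
/*   -compress binary                                                  */

/* Or in a site autoexec (e.g. autoexec_usermods.sas), so every
   session picks it up as an OPTIONS statement: */
options compress=binary;

/* Users can still override per-dataset where compression hurts
   (mylib is a placeholder libref): */
data work.raw (compress=no);
  set mylib.already_dense;
run;
```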
03-18-2017 07:59 AM
If the concern is disk space and there is core power to spare, you should be fine with the COMPRESS option enabled, as @SASKiwi and @Timmy2383 suggest. The one thing I would watch is your SASWork and UTILLOC usage; I would expect UTILLOC to see more use after you enable the option.
Personal suggestion: set it up on a separate SAS Application Server first, for some key users to test. If you all agree, you can then roll the change out to the whole Grid environment.
A question: what do you mean by "it's not possible to get all users to put the OPTIONS COMPRESS=YES in their code"?
03-20-2017 08:24 AM
Regarding the question about getting users to add OPTIONS COMPRESS=YES: what I mean is that I can send an email to all users, but only a portion of them will take action, which is why I was interested in making COMPRESS=YES a default.
I was wondering why the SASWork and UTILLOC usage would go up? I was hoping the COMPRESS option might free up room there too, by compressing WORK datasets as well. Our WORK volume gets heavily used for large amounts of data: WORK is a 10TB flash volume, much faster than DASD, and it usually has more free space available than some teams have on their DASD volumes. Several times a week multiple users are using terabytes of WORK storage, and I have to kill the jobs of the largest consumer to avoid hitting 100%.
03-20-2017 08:47 AM
UTILLOC (or WORK) usage would go "up" because utility files are not compressed. So if users are able to store more data because of the COMPRESS option, those intermediate utility files would grow, unexpectedly from the user's point of view.
With large datasets stored with COMPRESS=YES, I often have to use the TAGSORT option in PROC SORT, or the utility file would fill my UTILLOC.
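For example (dataset and BY variables below are placeholders), TAGSORT keeps only the BY-variable values plus observation numbers in the utility file, which can dramatically shrink UTILLOC usage for wide datasets, at the cost of slower random reads when writing the sorted output:

```sas
/* Utility file holds only the sort keys + observation numbers,
   not full (uncompressed) copies of every observation */
proc sort data=mylib.big_compressed out=mylib.big_sorted tagsort;
  by account_id txn_date;  /* placeholder BY variables */
run;
```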
03-20-2017 09:12 AM
Second what @JuanS_OCS said.
Quotas on collectively used resources are a must. Modern operating systems provide quota classes, so you only have to enable quotas and set the default class to a reasonable value (and define a separate unlimited class for users that need it, e.g. the account used to run batch jobs).
Depending on your needs, define additional classes and assign groups or individual users to them. After that, you'll rarely experience a show-stopper because of WORK overflow anymore.
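How this looks depends on the OS; as a rough, illustrative sketch with the standard Linux quota tools (the filesystem path, user names, and limits are all assumptions, and block limits are in 1K blocks):

```shell
# Assumes /saswork is mounted with the usrquota option and quota files exist.
# Soft limit ~500 GB, hard limit ~600 GB for a typical user:
setquota -u sasuser1 524288000 629145600 0 0 /saswork

# "Unlimited class" for the batch account (0 = no limit):
setquota -u sasbatch 0 0 0 0 /saswork

# Report usage against the limits across all users:
repquota -s /saswork
```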
Never trust users' discipline; it's a non-existent entity.
03-20-2017 03:11 PM
Are you running the SAS CLEANWORK utility daily to remove orphaned WORK folders? If not, then I would highly recommend you do this.
03-20-2017 03:50 PM
Thanks for mentioning that, but yes, the CLEANWORK utility runs hourly on each server via crontab. We also kill any SAS bjobs older than 5 days on a daily basis, which has helped a lot.
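For anyone setting this up, an hourly cleanwork crontab entry might look like this (the install path and WORK path are assumptions; adjust to your SASFoundation location):

```shell
# Run cleanwork at the top of every hour against the WORK volume
0 * * * * /opt/sas/SASHome/SASFoundation/9.4/utilities/bin/cleanwork /saswork
```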
03-20-2017 08:54 AM - edited 03-20-2017 08:56 AM
I see your problem. Bad queries are a daily challenge for both SAS users and SAS admins. The general solution to that problem is two-pronged:
Whilst it is a challenge (and easy to debate) to create quotas for each individual user, you can at least protect the resources for the non-interactive SAS processes (SAS Batch Server, SAS Stored Process Server, SAS Pooled Workspace Server, etc.) by creating two quotas: one for the non-interactive SAS sessions (the ones you are in control of) and one for the interactive sessions (generally, the SAS Workspace Server only).
The next level would be to create quotas/SAS Application Servers for different teams, and so on; you can drill down a long way on the customization and control of your IT resources. But even just the first step, securing the non-interactive SAS sessions from the interactive ones, should give you a reasonable amount of internal peace.
03-20-2017 03:55 PM - edited 03-21-2017 07:53 AM
Earlier, the datasets I ran my COMPRESS tests on had a high level of compression, but then I ran COMPRESS on a dataset with a lot of binary data: the compressed output came out at 90GB instead of 72GB, and it took quite a while longer to write than the uncompressed dataset. I had seen in a paper that SAS would not compress a dataset that would end up bigger than the uncompressed version, but when I went back and looked, the paper was quite old (SAS 8). I'm using SAS 9.4 M3.
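A minimal way to reproduce this kind of test (librefs and dataset names are placeholders): write a compressed copy and compare the log note against the dataset's Compressed attribute.

```sas
/* Copy with binary (RDC) compression; on already-dense binary data the
   log may show a NOTE that compression *increased* the size -- and, as
   observed above, 9.4 appears to store the dataset compressed anyway */
data mylib.copy_bin (compress=binary);
  set mylib.binary_heavy;
run;

/* CONTENTS reports the Compressed attribute of the output dataset */
proc contents data=mylib.copy_bin;
run;
```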