We're using EG4 against SAS v9.1.3 on UNIX, with no local SAS, and only a couple of us in the admin function have ever actually heard of UNIX.
Recently I've discovered that a number of users have .gz SAS datasets in their user areas on UNIX. None of them know how to gzip a file, the X command is disabled, and none of them know how it happened.
Is there any chance that EG has created/gzipped these files?
I suspect that the UNIX team has tried to be "helpful" and tidy up, but someone has already said they think SAS did it! If EG is zipping these files, I'd like to know how and why.
1) I believe EG is actually running on Windows and is simply used to access SAS on the Unix box. This is the arrangement I had a few years back.
2) SAS has its own compression scheme, but it doesn't change the file names. To use compression, set the system option with "options COMPRESS=yes;", or put the option in the sasv9.cfg file.
3) Most probably a shell script has gzipped the files in the interest of conserving disk space. This, of course, kills SAS's access to the data until the files are unzipped again.
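To see how widespread the problem is, something along these lines could be run from a Unix login. It is only a sketch: the function name is made up for illustration, and it assumes the datasets carry the standard .sas7bdat extension, so a gzipped one ends in .sas7bdat.gz.

```shell
# List gzipped SAS datasets under a directory (default: current directory).
# Hypothetical helper; assumes datasets use the .sas7bdat extension.
list_gzipped_sas() {
    find "${1:-.}" -type f -name '*.sas7bdat.gz' | sort
}
```

Running `ls -lc` on the files it reports shows their inode change times. Since gzip carries the original modification time over to the .gz file, the change time is the better hint as to when the compression actually ran.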
SAS is highly disk intensive in its functioning. It consumes space and generates a lot of IO. You can skimp on memory for SAS, but you cannot skimp on disk and IO. This needs to be understood by the Unix team.
SAS is a JIT compiled environment, and the compiled code is very efficient. On a properly tuned system, a running SAS job will keep at least one whole processor busy, split as, say, 70% CPU + 30% IO wait, or 30% CPU + 70% IO wait.
SAS does a form of disk buffer caching (called block buffers), but minimizes the amount of memory it needs. All data is read in, processed, and written out, anywhere from one record (observation) at a time to blocks of observations at a time. I have run through a multi-gigabyte file with > 600M records on a p630 w/1.45 GHz processors in about 30 minutes, and consumed < 200 MB of memory in the process. But I needed 100 GB of temporary work space, just for myself. Before I left, we had just upgraded to providing 100 GB of disk space for WORK, 100 GB of space for SORTDEV, and 100 to 200 GB of space for permanent SAS datasets for each SAS user group (about 7 groups). This totaled > 2.1 TB of SAN storage space (Hitachi 99?0). The next move was going to be providing each group with its own pair of HBAs, because the box, and SAS, was more IO bound than CPU constrained.
So, if the box is not SAN connected to external storage, I strongly recommend that that be changed, and that the Unix team not try to conserve disk space through zipping.
Chuck's right; it's a Unix shell script. I'm not sure which flavor of Unix you have, but I do know that Solaris 9 does not support data compression through the server (Solaris 10 does; it may be worth the migration for that one feature). When a SAS dataset is gzip'd, there is typically an 80% savings in disk space, so there is significant value to the Unix admin in having the files as .gz (in particular, for backup times).
It's a bit of a pain for EG users storing the data on Unix. A kluge work-around would be to have a personal Unix scheduled task (cron) sweep through the data directories and do a daily gunzip of the files. gunzip writes a new uncompressed file and deletes the .gz, so unless it runs as root the file ends up owned by whoever ran it, and that user needs write and delete access to the directory.
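A rough sketch of that sweep, for what it's worth. The function name is invented, the .sas7bdat extension is assumed, and the data directory is whatever the user passes in.

```shell
# Decompress every gzipped SAS dataset under the given directory.
# Hypothetical helper; the caller supplies the directory.
# gunzip replaces each foo.sas7bdat.gz with foo.sas7bdat.
gunzip_sas_datasets() {
    find "$1" -type f -name '*.sas7bdat.gz' -exec gunzip {} \;
}
```

Saved in a script that calls the function on the user's data directory, it could be scheduled from the user's own crontab (edited with `crontab -e`) to run shortly after the admins' compression job. It only treats the symptom, though; the real fix is getting the gzip job stopped for the SAS directories.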
To find the script, the Unix admins are probably going to have to go through the root user's crontab.
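As a starting point for that search, the admins could filter the crontab listing for anything that mentions compression. This is just a sketch with a made-up function name, and it will miss a job whose script name gives no hint of what it does.

```shell
# Read crontab text on stdin and print the active (non-comment) entries
# that mention gzip or compress. Hypothetical helper for illustration.
find_gzip_jobs() {
    grep -v '^[[:space:]]*#' | grep -E 'gzip|compress'
}
```

As root, `crontab -l | find_gzip_jobs` would check root's own schedule. If nothing turns up, the scripts that the crontab entries call would need to be read one by one.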