01-27-2012 05:10 PM
01-30-2012 07:12 PM
01-27-2012 05:20 PM
You may be able to uncompress on the fly by piping the file in your FILENAME statement. For an example, take a look at: http://www.ats.ucla.edu/stat/sas/faq/readgz.htm
01-27-2012 06:09 PM
We used the same piping syntax in the FILENAME statement for five years, through thousands of programs and two changes of operating system. It stopped working after we changed to Windows 7.
01-30-2012 07:12 PM
01-31-2012 11:26 AM
Hm, the data comes as GZIP. It would be very cumbersome to convert all the files with a different utility, or to ask for the input in a different format.
01-31-2012 10:13 AM
You haven't told us your complete environment, so these comments may not be applicable.
On Win XP (at least), piping the GZIP command produces a temporary (hidden) copy of the uncompressed data, which takes up a lot of space and disk I/O (this does not happen on *nix systems).
Windows compression gives nearly as good a compression on raw data files as GZIP.
Together, these two points mean that using Windows compression (and skipping the pipe) is more efficient on Windows.
Caveat: I discovered the first one the hard way several years ago and can't currently remember which compression utility I was using. However, my gut feeling is that it is a Windows "feature" rather than part of gzip.
Message was edited by: Lawrence Muhlbaier
01-31-2012 10:25 PM
Thanks, Doc@Duke: I may try Windows compression directly as a separate test project, but many of my input files reside on a non-Windows server. I have to copy them to my local drive.
01-31-2012 11:19 AM
SAS Tech Support claims it is not SAS's fault. It must be the environment or a GZIP issue.
Here is the basic code.
%let dir=C:\"Program Files"\GnuWin32\bin\gzip;
data Product (keep=var1 var2 var3 var4);
filename file2 pipe %unquote(%str(%'&dir -cd &file1%'));
infile file2 DSD missover;
length var1 $12 var2 $14 var3 $3 var4 $20;
input var1 var2 var3 var4;
run;
01-31-2012 03:50 PM
Try setting up your command prompt environment first by doing the following:
x set path=%nrstr(C:\Program Files\GnuWin32\bin;%PATH%);
*you may want to set the following to an alternate path as they are where the temporary files are written;
*by default this location is in your home directory, I believe;
x set tmp=%TEMP%;
x set tmpdir=%TEMP%;
filename file2 pipe "gzip -cd &file1";
If you still experience issues in SAS try performing the same commands in windows command prompt and see if you experience similar issues or get an error message of some kind.
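One quick way to confirm from inside SAS that gzip can be found is to pipe its version banner into the log. This is only a sketch: the fileref name chk is made up for illustration, and it assumes gzip is on the PATH of the shell that SAS starts for the pipe.

```sas
/* Hypothetical fileref "chk"; 2>&1 folds any error text into the pipe. */
/* Single quotes keep the SAS macro processor away from the & character. */
filename chk pipe 'gzip --version 2>&1';

data _null_;
  infile chk;
  input;
  put _infile_;   /* echo each line returned by the command to the SAS log */
run;

filename chk clear;
```

If the log shows an "is not recognized as an internal or external command" message instead of a version banner, the shell that SAS starts cannot find gzip.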
01-31-2012 04:17 PM
I doubt it is the issue, but you should move your FILENAME statement to BEFORE the data step that is trying to use it. It is a global statement and not one that executes inside the data step.
You can simplify your FILENAME statement by using the QUOTE function rather than the macro quoting functions.
QUOTE will surround the string in dquote (") characters and double up any dquote (") characters in your generated command so that they are passed properly to the operating system.
filename file2 pipe %sysfunc(quote(&dir -cd &file1));
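Putting the two suggestions above together, the program might look like the sketch below. The value of file1 is a hypothetical path used only for illustration; the dir value and the variable list are taken from the code posted earlier in the thread.

```sas
/* Path style from the original post: quotes wrap only the part with a space */
%let dir=C:\"Program Files"\GnuWin32\bin\gzip;
/* Hypothetical input file, for illustration only */
%let file1=C:\data\product.csv.gz;

/* FILENAME is a global statement, so it goes before the DATA step. */
/* QUOTE wraps the command in double quotes and doubles any inside it. */
filename file2 pipe %sysfunc(quote(&dir -cd &file1));

data Product (keep=var1 var2 var3 var4);
  infile file2 dsd missover;
  length var1 $12 var2 $14 var3 $3 var4 $20;
  input var1 var2 var3 var4;
run;
```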
Try moving the quotes in your DIR (actually command) macro variable to the front and back instead of just around the Program Files part of the path.
Open a command window on your PC and run the same command to see what it does. Try piping it to more so you can scroll through the file.
> "c:\Program Files\Gnu\Win32\bin\gzip" -cd "e:\John\ProductFiles\Product.out.20120115.gz" | more
01-31-2012 10:18 PM
Thank you for all your input. I really appreciate all of your answers. Since the problem did not occur on one of the SAS PCs at my location, I decided to start from scratch by loading GZIP (thanks art297 and FriedEgg!) on my Win 7 machine, which has the same SAS installation but no prior history of GZIP. Much to my surprise, I succeeded in reading a comma-delimited file compressed by GZIP with no problems. So we will have to check both the SAS and the GZIP install on the other PC.
Comments on some of the hints from Tom:
1) Using double quotes around the whole &dir path and keeping the embedded space in Program Files yielded an error:
NOTE: The infile FILE2 is:
Unnamed Pipe Access Device,
PROCESS="C:\Program Files\GnuWin32\bin\gzip" -cd "c:\curr_avail.txt.gz",
'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
2) Moving the filename statement to above the data step did not make any difference, though I agree it is a global statement and should be moved up.
3) Both %unquote and %sysfunc yielded correct results.
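The error in item 1 is consistent with how cmd.exe treats a command line that starts with a quote and contains more than two quote characters: it strips the first and the last quote, which leaves C:\Program as the command name. A commonly used workaround is an extra outer pair of double quotes around the whole command. The sketch below is untested here and assumes that SAS hands the pipe command to cmd.exe unchanged; the paths are the ones shown in the log above.

```sas
/* Assumption: the command reaches cmd.exe as written.                  */
/* The outer pair of double quotes is the one cmd.exe strips, so the    */
/* inner quotes around each path survive. Single quotes delimit the     */
/* SAS string so the double quotes inside need no doubling.             */
filename file2 pipe '""C:\Program Files\GnuWin32\bin\gzip" -cd "c:\curr_avail.txt.gz""';
```

Quoting only the Program Files part of the path, as in the original code, avoids the problem because the command line then does not begin with a quote.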
Since I cannot replicate the general failure, we'll have to test many scenarios, as FriedEgg, Doc@Duke, and Tom suggested. Unfortunately the PC in question is at a site remote from me, so it may take longer than my own fixes.
02-01-2012 04:18 PM
My apologies! It turned out the PC needed a reboot after the installation of GZIP (duh!). I did reboot mine, but the other user did not, since the GZIP installation did not prompt for it.
Well, at least we now know that it works.
Now I have a follow-up question: is it possible to read, in SAS, a GZIP-compressed FOLDER containing several comma-delimited text files? If so, what would be the syntax?
Actually, when I decompress one of my standard GZ files I get two folders: one with just one file, and another one with multiple files.
02-01-2012 04:46 PM
The file name has the form filename.tar.gz, so I assume it is GZIP. I gather the file inside is the result of applying tar to bundle several files into one archive while preserving the directory structure. When I decompress it, I get two folders.