Dorota_Jarosz
Obsidian | Level 7

Does anybody know how to read compressed files (e.g. .gz) using PC SAS 9.2 TS level 2M3 on Windows 7, without uncompressing them first? It used to work on Win XP. Let's start with raw data files (text).

Accepted Solution
FriedEgg
SAS Employee

It seems like an installation issue with your gzip under Windows 7, then...

http://gnuwin32.sourceforge.net/packages/gzip.htm

You may also try using a different utility such as 7z, which can also work with gzip files through command line and with pipes.

http://www.7-zip.org/

17 Replies
art297
Opal | Level 21

You may be able to uncompress, on the fly, by piping the file in your filename statement.  e.g., take a look at: http://www.ats.ucla.edu/stat/sas/faq/readgz.htm

Dorota_Jarosz
Obsidian | Level 7

We used the same piping syntax in the FILENAME statement for five years, through thousands of programs and through two changes in operating systems. It stopped working after the change to Windows 7.

FriedEgg
SAS Employee

It seems like an installation issue with your gzip under Windows 7, then...

http://gnuwin32.sourceforge.net/packages/gzip.htm

You may also try using a different utility such as 7z, which can also work with gzip files through command line and with pipes.

http://www.7-zip.org/

Dorota_Jarosz
Obsidian | Level 7

Hm, the data comes as GZIP; it would be very cumbersome to switch all the files to a different utility and ask for the input in a different format.

Tom
Super User

Not a different format. Just a different utility program on your PC to read the same format.

Doc_Duke
Rhodochrosite | Level 12

You haven't told us your complete environment, so these comments may not be applicable.

On Win XP (at least), piping the GZIP command produces a temporary (hidden) copy of the uncompressed data, which takes up a lot of space and disk I/O (this does not happen on *nix systems).

Windows compression gives nearly as good a compression on raw data files as GZIP.

Together, these two points mean that using Windows compression (and skipping the pipe) is more efficient on Windows.

------------

Caveat: I discovered the first one the hard way several years ago and can't currently remember which compression utility I was using. However, my gut feeling is that it is a Windows "feature" rather than part of gzip.

Message was edited by: Lawrence Muhlbaier

Dorota_Jarosz
Obsidian | Level 7

Thanks, Doc@Duke. I may try Windows compression directly as a separate test project, but many of my input files reside on a non-Windows server, so I have to copy them to my local drive.

Dorota_Jarosz
Obsidian | Level 7

SAS Tech support claims it is not SAS's fault. It must be the environment or the GZIP issue.

Here is the basic code.

%let File1="e:\John\ProductFiles\Product.out.20120115.gz";

%let dir=C:\"Program Files"\GnuWin32\bin\gzip;

data Product (keep=var1 var2 var3 var4);

filename file2 pipe %unquote(%str(%'&dir -cd &file1%'));

infile file2 DSD missover;

length var1 $12 var2 $14 var3 $3 var4 $20;

input var1 var2 var3 var4;

run;

FriedEgg
SAS Employee

Try setting up your command prompt environment first by doing the following:

x set path=%nrstr(C:\Program Files\GnuWin32\bin;%PATH%);

*you may want to set the following to an alternate path, as they are where the temporary files are written;

*by default this location is in your home directory, I believe;

x set tmp=%TEMP%;

x set tmpdir=%TEMP%;

%let file1=E:\John\ProductFiles\Product.out.20120115.gz;

filename file2 pipe "gzip -cd &file1";

If you still experience issues in SAS, try running the same commands in a Windows command prompt and see whether you hit similar issues or get an error message of some kind.

Tom
Super User

I doubt it is the issue, but you should move your FILENAME statement to BEFORE the data step that is trying to use it. It is a global statement and not one that executes inside the data step.

You can simplify your FILENAME statement by using the QUOTE function rather than the macro quoting functions. 

QUOTE will surround the string in dquote (") characters and double up any dquote (") characters in your generated command so that they are passed properly to the operating system.

filename file2 pipe %sysfunc(quote(&dir -cd &file1));

Try moving the quotes in your DIR (actually command) macro variable to the front and back instead of just around the Program Files part of the path.

Open a command window on your PC and run the same command to see what it does. Try piping it to more so you can scroll through the file.

> "c:\Program Files\GnuWin32\bin\gzip" -cd "e:\John\ProductFiles\Product.out.20120115.gz" | more
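
Putting the suggestions above together, a minimal sketch might look like the following. It assumes gzip is installed under C:\Program Files\GnuWin32\bin as in the original post; the paths and variable names are copied from that post and have not been verified on another machine.

```sas
/* Sketch only: the paths come from the original post and may differ on your PC. */
%let file1 = e:\John\ProductFiles\Product.out.20120115.gz;
%let dir   = C:\"Program Files"\GnuWin32\bin\gzip;

/* FILENAME is a global statement, so it is defined before the DATA step.  */
/* QUOTE wraps the command in double quotes and doubles any embedded ones. */
filename file2 pipe %sysfunc(quote(&dir -cd &file1));

data Product (keep=var1 var2 var3 var4);
  infile file2 dsd missover;
  length var1 $12 var2 $14 var3 $3 var4 $20;
  input var1 var2 var3 var4;
run;

/* Release the fileref once the data have been read. */
filename file2 clear;
```

Here the file name macro variable is stored without its own quotes, so QUOTE only has to double the quotes around Program Files.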

Dorota_Jarosz
Obsidian | Level 7

Thank you for all your input. I really appreciate all of your answers. Since the problem did not occur on one of the SAS PCs at my location, I decided to start from scratch by installing GZIP (thanks art297 and FriedEgg!) on my Win 7 machine, which has the same SAS installation but no prior history of GZIP. Much to my surprise, I succeeded in reading a comma-delimited file compressed by GZIP with no problems. So we will have to check the other PC regarding both the SAS and GZIP installs.

Comments on some hints from Tom:

1) Using double quotes around the &dir path and keeping the embedded space in Program Files yielded an error.

NOTE: The infile FILE2 is:

Unnamed Pipe Access Device,

PROCESS="C:\Program Files\GnuWin32\bin\gzip" -cd "c:\curr_avail.txt.gz",

RECFM=V,LRECL=256

Stderr output:

'C:\Program' is not recognized as an internal or external command,

operable program or batch file.

2) Moving the filename statement to above the data step did not make any difference, though I agree it is a global statement and should be moved up.

3) Both %unquote and %sysfunc yielded correct results.

Since I cannot replicate the general failure, we'll have to test many scenarios, as FriedEgg, Doc@Duke, and Tom suggested. Unfortunately, the PC in question is at a remote site for me, so it may take longer than my own fixes.
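
One possible explanation for the error in 1), offered as an assumption rather than a confirmed diagnosis: the "'C:\Program' is not recognized" message comes from cmd.exe, and when a command line handed to cmd.exe starts with a quote and contains more than two quote characters, cmd.exe strips the first and last quote. That would leave the program path unquoted up to the first space. A commonly used workaround is to wrap the whole command in one extra pair of double quotes. A sketch using the paths from the log above:

```sas
/* Sketch, assuming SAS runs the pipe command through cmd.exe.            */
/* The outer single quotes are SAS string quoting. The extra leading and  */
/* trailing double quotes are the pair that cmd.exe is expected to strip, */
/* leaving the quoted program path and quoted file name intact.           */
filename file2 pipe '""C:\Program Files\GnuWin32\bin\gzip" -cd "c:\curr_avail.txt.gz""';
```

If this behaves differently on your system, running the same command line in a command window with `cmd /c` in front of it should show what cmd.exe actually receives.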

Dorota_Jarosz
Obsidian | Level 7

My apologies! It turned out the PC needed a reboot after the installation of GZIP (duh!). I did reboot mine, but the other user did not, since the GZIP installation did not prompt for it.

Well, at least we now know that it works.

Now I have a follow-up question: is it possible to read, in SAS, a GZIP-compressed FOLDER containing several comma-delimited text files? If so, what would the syntax be?

Actually, when I decompress one of my standard GZ files I get two folders: one with just one file, and another one with multiple files.

FriedEgg
SAS Employee

GZIP files cannot contain folder structures.  This must be a different type of archived file.

Dorota_Jarosz
Obsidian | Level 7

The file name has the form filename.tar.gz, so I assume it is GZIP. I gather the file inside is a tar archive that bundles several files while preserving the directory structure. When I decompress it, I get two folders.
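
A .tar.gz file is a tar archive that has been compressed with gzip, so one option is to decompress it to standard output and let tar write a single member to standard output. The sketch below is untested here and rests on several assumptions: GNU tar is installed alongside gzip and both are on the PATH (tar is not part of Windows 7), the pipe command is run through a shell that supports `|`, and the archive path, member name, and variable layout are hypothetical placeholders.

```sas
/* Hypothetical names: replace with your own archive and member path. */
%let archive = e:\data\bundle.tar.gz;
%let member  = folder1/file1.csv;

/* gzip -cd writes the tar stream to stdout; tar -xOf - reads that stream  */
/* from stdin and writes only the named member to stdout, which SAS reads. */
filename tgz pipe "gzip -cd &archive | tar -xOf - &member";

data member1;
  infile tgz dsd missover;
  length var1 $12 var2 $14;  /* placeholder layout, adjust to your file */
  input var1 var2;
run;

filename tgz clear;
```

To see which member names the archive contains, the same pipe with `tar -tf -` in place of `tar -xOf - &member` lists them instead of extracting. Each member would need its own FILENAME and data step, or a macro loop over the listed names.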
