Aman4SAS
Obsidian | Level 7

Hi,

I need to read .gz files.

This code runs fine:

FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz";

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in ;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

The problem starts when there is a space in my path, like:

FILENAME in Pipe "I:\anuj\temp folder\gzip -dc I:\Anuj\temp folder\rawdata.txt.gz";

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in ;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

In this case I am getting an error.

Please help.

10 REPLIES
RW9
Diamond | Level 26

Yes, for DOS commands you need to have double quotes around a path which contains spaces, special chars etc.  So try:

FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

Note that the whole DOS command is in single quotes, with double quotes around the path to the .gz file.

Aman4SAS
Obsidian | Level 7

Still getting an error. Below are the code and log for both scenarios:

code1:

FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

LOG1

81   FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;

82   libname anuj "I:\anuj\";

NOTE: Libname ANUJ refers to the same physical library as AJ.

NOTE: Libref ANUJ was successfully assigned as follows:

      Engine:        V9

      Physical Name: I:\anuj

83   DATA anuj.test;

84     INFILE in missover;

85     INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

86   RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS=I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz,

      RECFM=V,LRECL=80

NOTE: 5 records were read from the infile IN.

      The minimum record length was 27.

      The maximum record length was 27.

      One or more lines were truncated.

NOTE: The data set ANUJ.TEST has 5 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.09 seconds

      cpu time            0.00 seconds

CODE 2

FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

LOG2:

87   FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

88   DATA anuj.test;

89     INFILE in missover;

90     INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

91   RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS=I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz",

      RECFM=V,LRECL=256

Stderr output:

'I:\anuj\temp' is not recognized as an internal or external command,

operable program or batch file.

NOTE: 0 records were read from the infile IN.

NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.07 seconds

      cpu time            0.03 seconds

RW9
Diamond | Level 26

You would need something like:

FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

Note the single quotes around the whole text, then double quotes around each path/file.

Aman4SAS
Obsidian | Level 7

Still not working; same error:

FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

Log:

102  FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

103  DATA anuj.test;

104    INFILE in missover;

105    INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

106  RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS="I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz",

      RECFM=V,LRECL=256

Stderr output:

'I:\anuj\temp' is not recognized as an internal or external command,

operable program or batch file.

NOTE: 0 records were read from the infile IN.

NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.00 seconds

Kurt_Bremser
Super User

You do not need to store gzip.exe everywhere you have data; one instance is sufficient. So you can always use I:\anuj\gzip

And avoid blanks in file/directory names; they are just a completely unnecessary nuisance. Use underscores instead.

Aman4SAS
Obsidian | Level 7

Thanks for your input.

My concern is that the raw file can be stored anywhere. Storing gzip in one location is fine, but I still get an error if the raw file's path contains a space.

Is there any solution for this?

Kurt_Bremser
Super User

In your example, the complaint is about the name of the executable, so I advise fixing that by calling it from a "non-spaced" location.

I just tested gzip -dc on AIX and was able to use the syntax

filename in pipe 'gzip -dc "$HOME/test folder/testfile.gz"';

without error. Maybe Windows is throwing a spanner in the works.
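One possible explanation for the Windows error in the logs above, offered as an assumption rather than something confirmed in this thread: the "is not recognized as an internal or external command" text comes from cmd.exe, and cmd.exe (see `cmd /?`) strips the first and last quote character when a command line starts with a quote and contains more than two quotes. That would turn the fully quoted command into one that begins with the unquoted `I:\anuj\temp folder\...`, which matches the error shown. A commonly used workaround is an extra outer pair of double quotes around the whole command. The sketch below is untested here and reuses the paths from the thread:

```sas
/* Untested sketch: the extra outer pair of double quotes is meant to be   */
/* the pair that cmd.exe strips, leaving both inner quoted paths intact.   */
/* Assumes SAS on Windows runs the pipe command through cmd.exe.           */
filename in pipe '""I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz""';

data work.test;
  infile in missover;
  input make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
run;

/* Release the fileref once the data step has finished */
filename in clear;
```

If this still fails, running the same command text directly in Command Prompt should show whether the problem lies in the quoting or in gzip itself.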

RW9
Diamond | Level 26

So does it work when you open Command Prompt (which is an interface to DOS) and put in: "I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"

I don't use gzip myself, but:

dir "c:\program files"

works fine, whereas leaving out the double quotes won't.

Get the DOS command working under DOS first, then copy it into your SAS program and put single quotes around it.

One question, in your example, where is the output directory?  I can only see a specification of where the Executable is, and then where the .gz is.  Maybe you need an output path as well.

Also, one question, why do this manually?  Personally if I am receiving data I would have a process for getting the file, unpacking it, validating it etc. before even going near SAS?

Aman4SAS
Obsidian | Level 7

Hi RW9,

Thanks for your kind suggestions.

Apart from all that, I resolved my issue by storing gzip in a location with no spaces in its path. Now it runs fine even if the path to the raw files contains spaces.

Answer to your question 1: I didn't find any need to define an output path, since I need the data on the SAS platform [WORK library, for other tasks].

Question 2: I am getting 300+ .gz files every month, so manual work would be an additional pain 🙂

Thanks to all for the great inputs.
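For readers who land here later, the working setup described above might look like the sketch below. The macro name `read_gz` and its parameters are hypothetical and not from the thread. The paths and the INPUT layout are taken from the earlier posts, and the sketch assumes gzip sits in `I:\anuj`, a location without spaces.

```sas
/* Hypothetical helper macro (not from the thread): reads one fixed-width  */
/* .gz file through a pipe. gzip is called from a path without spaces, so  */
/* only the data file path needs double quotes.                            */
%macro read_gz(gzfile=, out=);
  /* Inside a double-quoted SAS string, "" produces one literal " */
  filename in pipe "I:\anuj\gzip -dc ""&gzfile""";

  data &out;
    infile in missover;
    input make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
  run;

  /* Release the fileref so the next call can reuse it */
  filename in clear;
%mend read_gz;

/* Example call with a data path that contains a space */
%read_gz(gzfile=I:\Anuj\temp folder\rawdata.txt.gz, out=work.test);
```

With several hundred files a month, a macro like this could be called once per file, for example from a list of file names held in a data set.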

Kurt_Bremser
Super User

Just an aside:

gzip -c streams the output to stdout, so "gzip -dc filename" in a filename pipe is the perfect way to stream the decompressed output into the data step. It avoids using any additional disk space.

gzip is one of those "don't leave home without it" open source utilities.
