Optimize IT resource capacity and performance with SAS

Issue in reading .gz files

Super Contributor
Posts: 258

Hi,

I need to read .gz files.

My code is below. This version runs fine:

FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz";

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in ;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

The problem starts when there is a space in my path, like this:

FILENAME in Pipe "I:\anuj\temp folder\gzip -dc I:\Anuj\temp folder\rawdata.txt.gz";

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in ;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

In this case I am getting an error.

Please help.

Esteemed Advisor
Posts: 6,678

Re: Issue in reading .gz files

Yes, for DOS commands you need to have double quotes around a path which contains spaces, special chars etc.  So try:

FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

Note the whole DOS part is in single quotes, with double quotes around the path to the .gz file.

Super Contributor
Posts: 258

Re: Issue in reading .gz files

Still getting an error. Below are the code and log for both scenarios:

code1:

FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;

libname anuj "I:\anuj\";

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

LOG1

81   FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;

82   libname anuj "I:\anuj\";

NOTE: Libname ANUJ refers to the same physical library as AJ.

NOTE: Libref ANUJ was successfully assigned as follows:

      Engine:        V9

      Physical Name: I:\anuj

83   DATA anuj.test;

84     INFILE in missover;

85     INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

86   RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS=I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz,

      RECFM=V,LRECL=80

NOTE: 5 records were read from the infile IN.

      The minimum record length was 27.

      The maximum record length was 27.

      One or more lines were truncated.

NOTE: The data set ANUJ.TEST has 5 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.09 seconds

      cpu time            0.00 seconds

CODE 2

FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

LOG2:

87   FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

88   DATA anuj.test;

89     INFILE in missover;

90     INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

91   RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS=I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz",

      RECFM=V,LRECL=256

Stderr output:

'I:\anuj\temp' is not recognized as an internal or external command,

operable program or batch file.

NOTE: 0 records were read from the infile IN.

NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.07 seconds

      cpu time            0.03 seconds

Esteemed Advisor
Posts: 6,678

Re: Issue in reading .gz files

You would need something like:

FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

Note the single quotes around the whole text, then double quotes around each path/file.

Super Contributor
Posts: 258

Re: Issue in reading .gz files

Still not working; same error.

FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

DATA anuj.test;

  INFILE in missover;

  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

RUN;

Log:

102  FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';

103  DATA anuj.test;

104    INFILE in missover;

105    INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;

106  RUN;

NOTE: The infile IN is:

      Unnamed Pipe Access Device,

      PROCESS="I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz",

      RECFM=V,LRECL=256

Stderr output:

'I:\anuj\temp' is not recognized as an internal or external command,

operable program or batch file.

NOTE: 0 records were read from the infile IN.

NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.00 seconds

Esteemed Advisor
Posts: 5,928

Re: Issue in reading .gz files

You do not need to store gzip.exe everywhere you have data; one instance is sufficient. So you can always use I:\anuj\gzip

And avoid blanks in file/directory names; they are just a completely unnecessary nuisance. Use underscores instead.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 258

Re: Issue in reading .gz files

Thanks for your input.

My concern is that the raw file can be stored anywhere. If I store gzip in one location that is OK, but I still get an error if the raw file's path has a space in it.

Is there any solution for this?

Esteemed Advisor
Posts: 5,928

Re: Issue in reading .gz files

In your example, the complaint is about the name of the executable, so I advise to fix that by calling it from a "non-spaced" location.

I just tested gzip -dc on AIX and was able to use the syntax

filename in pipe 'gzip -dc "$HOME/test folder/testfile.gz"';

without error. Maybe Windows is putting a spoke in your wheel.
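
One thing that may be worth trying on Windows, offered as an untested assumption rather than a confirmed fix: the "is not recognized as an internal or external command" message comes from cmd.exe. When a command line handed to cmd.exe /c begins with a double quote and contains more than two quotes, cmd.exe strips the first and the last quote, which would leave the gzip.exe path unquoted and explain the 'I:\anuj\temp' error. The usual workaround is an extra outer pair of double quotes around the whole command. A sketch, assuming SAS passes the pipe string to cmd.exe unchanged:

```sas
/* Assumption: the PIPE string is run through cmd.exe /c.            */
/* cmd.exe strips the extra outer pair of double quotes, so both     */
/* inner paths stay quoted.                                          */
FILENAME in PIPE '""I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz""';

DATA anuj.test;
  INFILE in missover;
  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
RUN;
```

If that behaves differently on a given SAS release, calling gzip from a path without spaces remains the simpler route.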

Esteemed Advisor
Posts: 6,678

Re: Issue in reading .gz files

So does it work when you open Command Prompt (that is an interface to DOS) and put in:

"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"

I don't use gzip myself, but:

dir "c:\program files"

works fine, whereas leaving out the double quotes won't.

Get the DOS command working under DOS first, then copy it into your SAS program and put single quotes around it.

One question, in your example, where is the output directory?  I can only see a specification of where the Executable is, and then where the .gz is.  Maybe you need an output path as well.

Also, one question: why do this manually?  Personally, if I am receiving data I would have a process for getting the file, unpacking it, validating it etc. before even going near SAS.
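
The advice to get the command working first can also be checked from inside SAS. A minimal sketch, using the dir example from this post, that copies whatever the piped command writes to stdout into the SAS log (the fileref chk is just an illustrative name):

```sas
/* chk is an illustrative fileref name, not from the thread */
FILENAME chk PIPE 'dir "c:\program files"';

DATA _NULL_;
  INFILE chk;
  INPUT;          /* read one line of command output into the input buffer */
  PUT _INFILE_;   /* write that line to the SAS log */
RUN;

FILENAME chk CLEAR;
```

If the command itself fails, the log shows a "Stderr output:" section instead, as in the logs posted earlier in the thread.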

Super Contributor
Posts: 258

Re: Issue in reading .gz files

Hi RW9,

Thanks for your kind suggestions.

Apart from that, I resolved my issue by storing gzip in a location with no spaces in its path. Now it runs fine even when the path of the raw files contains spaces.

Answer to your question 1: I did not find any need to define an output path, since I need the data on the SAS platform (in the WORK library, for other tasks).

Question 2: I am getting about 300+ .gz files every month, so manual work would be an additional pain :-)

Thanks to all for the great inputs.
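
For anyone with a similar volume of files, the working setup (gzip called from a path without spaces, data path in double quotes) could be wrapped in a macro. This is a sketch only: the macro name read_gz, its parameters path= and out=, and the fileref gzin are hypothetical, and it assumes every file has the same fixed-column layout as rawdata.txt.gz and that the libref anuj is already assigned.

```sas
/* Hypothetical helper: read_gz, path=, out= and gzin are illustrative names. */
%macro read_gz(path=, out=);
  /* The outer double quotes let the macro variable resolve; a doubled     */
  /* double quote inside the string yields one literal quote, so a data    */
  /* path containing spaces reaches gzip as a single argument.             */
  FILENAME gzin PIPE "I:\anuj\gzip -dc ""&path""";

  DATA &out;
    INFILE gzin missover;
    INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
  RUN;

  FILENAME gzin CLEAR;
%mend read_gz;

%read_gz(path=I:\Anuj\temp folder\rawdata.txt.gz, out=anuj.test)
```

The macro can then be called once per file, with a different out= data set each time.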

Esteemed Advisor
Posts: 5,928

Re: Issue in reading .gz files

Just an aside:

gzip -c streams the output to stdout, so "gzip -dc filename" in a filename pipe is the perfect way to stream the decompressed output into the data step. It saves any additional disk space.

gzip is one of those "don't leave home without it" open source utilities.
