Hi,
I need to read .gz files. My code is below.
This code runs fine:
FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz";
libname anuj "I:\anuj\";
DATA anuj.test;
INFILE in ;
INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
RUN;
The problem starts when there is a space in my path, like:
FILENAME in Pipe "I:\anuj\temp folder\gzip -dc I:\Anuj\temp folder\rawdata.txt.gz";
libname anuj "I:\anuj\";
DATA anuj.test;
INFILE in ;
INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
RUN;
In this case I am getting an error.
Please help.
Yes, for DOS commands you need to have double quotes around a path which contains spaces, special chars etc. So try:
FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
Note that the whole DOS part is in single quotes, with double quotes around the path to the .gz file.
Still getting an error. Below are the code and log for both scenarios:
Code 1:
FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;
libname anuj "I:\anuj\";
DATA anuj.test;
INFILE in missover;
INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
RUN;
Log 1:
81 FILENAME in Pipe "I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz" LRECL=80;
82 libname anuj "I:\anuj\";
NOTE: Libname ANUJ refers to the same physical library as AJ.
NOTE: Libref ANUJ was successfully assigned as follows:
Engine: V9
Physical Name: I:\anuj
83 DATA anuj.test;
84 INFILE in missover;
85 INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
86 RUN;
NOTE: The infile IN is:
Unnamed Pipe Access Device,
PROCESS=I:\anuj\gzip -dc I:\Anuj\rawdata.txt.gz,
RECFM=V,LRECL=80
NOTE: 5 records were read from the infile IN.
The minimum record length was 27.
The maximum record length was 27.
One or more lines were truncated.
NOTE: The data set ANUJ.TEST has 5 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.09 seconds
cpu time 0.00 seconds
Code 2:
FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
DATA anuj.test;
INFILE in missover;
INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
RUN;
Log 2:
87 FILENAME in Pipe 'I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
88 DATA anuj.test;
89 INFILE in missover;
90 INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
91 RUN;
NOTE: The infile IN is:
Unnamed Pipe Access Device,
PROCESS=I:\anuj\temp folder\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz",
RECFM=V,LRECL=256
Stderr output:
'I:\anuj\temp' is not recognized as an internal or external command,
operable program or batch file.
NOTE: 0 records were read from the infile IN.
NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.03 seconds
You would need something like:
FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
Note the single quotes around the whole text, then double quotes around each path/file.
Still not working, same error:
FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
DATA anuj.test;
INFILE in missover;
INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
RUN;
Log:
102 FILENAME in Pipe '"I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
103 DATA anuj.test;
104 INFILE in missover;
105 INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27 ;
106 RUN;
NOTE: The infile IN is:
Unnamed Pipe Access Device,
PROCESS="I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz",
RECFM=V,LRECL=256
Stderr output:
'I:\anuj\temp' is not recognized as an internal or external command,
operable program or batch file.
NOTE: 0 records were read from the infile IN.
NOTE: The data set ANUJ.TEST has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.00 seconds
You do not need to store gzip.exe everywhere you have data; one instance is sufficient. So you can always use I:\anuj\gzip
And avoid blanks in file/directory names; they are just a completely unnecessary nuisance. Use underscores instead.
Thanks for your input.
My concern is that the raw files can be stored anywhere. Storing gzip in one location is OK, but I still get an error if the raw file's path contains a space.
Is there any solution for that?
In your example, the complaint is about the name of the executable, so I advise to fix that by calling it from a "non-spaced" location.
I just tested gzip -dc on AIX and was able to use the syntax
filename in pipe 'gzip -dc "$HOME/test folder/testfile.gz"';
without error. Maybe Windows is what trips you up here.
So does it work when you open Command Prompt (that is an interface to DOS) and put in: "I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz"
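For what it's worth, the stderr line in the last log fits how cmd.exe handles a command line that starts with a double quote: when the line contains more than two quotes, cmd.exe strips the first and the last one, which leaves the path to gzip.exe unquoted. A commonly used workaround is to wrap the whole command in one extra pair of double quotes. The sketch below is untested here and assumes SAS hands the pipe command to cmd.exe in that way:

```sas
/* Untested sketch: one extra pair of double quotes around the whole command, */
/* so that after cmd.exe strips the outermost pair the inner quotes remain.   */
FILENAME in PIPE '""I:\anuj\temp folder\gzip.exe" -dc "I:\Anuj\temp folder\rawdata.txt.gz""';
```

If the assumption holds, the PROCESS= line in the log will show the doubled quotes and the stderr message should go away.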
I don't use gzip myself, but:
dir "c:\program files"
works fine, whereas leaving out the double quotes won't.
Get the DOS command working under DOS first, then copy it into your SAS program and put single quotes around it.
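Once the command works at the prompt, a quick way to check it from inside SAS is to echo whatever the pipe returns to the log. A minimal sketch, using the dir example from above; the fileref chk is just a name picked for illustration:

```sas
/* Sketch: run a DOS command through a pipe and echo its output to the SAS log */
FILENAME chk PIPE 'dir "c:\program files"';
DATA _NULL_;
  INFILE chk;
  INPUT;          /* read one line into the _INFILE_ buffer */
  PUT _INFILE_;   /* write that line to the log */
RUN;
FILENAME chk CLEAR;
```

If the quoting is wrong, the log shows the "is not recognized" message under Stderr output instead of the directory listing.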
One question, in your example, where is the output directory? I can only see a specification of where the Executable is, and then where the .gz is. Maybe you need an output path as well.
Also, one question, why do this manually? Personally if I am receiving data I would have a process for getting the file, unpacking it, validating it etc. before even going near SAS?
Hi RW9,
Thanks for your kind suggestions.
I resolved my issue by storing gzip in a location with no spaces in its path. Now it does not matter whether the path of the raw files contains spaces; it runs fine.
Answer to your question one: I did not see any need to define an output path, since I need the data on the SAS platform (in the WORK library, for other tasks).
Question 2: I get about 300+ .gz files every month, so manual work would be an additional pain 🙂
Thanks to all for the great inputs.
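For anyone landing here later, the setup described in this post would look roughly like the sketch below. It combines the paths and the INPUT statement from earlier in the thread; only the data path is wrapped in double quotes, because the gzip path no longer contains a space:

```sas
/* gzip sits in a folder without spaces; only the data path is quoted */
FILENAME in PIPE 'I:\anuj\gzip -dc "I:\Anuj\temp folder\rawdata.txt.gz"';
LIBNAME anuj "I:\anuj\";
DATA anuj.test;
  INFILE in MISSOVER;
  INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
RUN;
```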
Just an aside:
gzip -c streams the output to stdout, so "gzip -dc filename" in a filename pipe is the perfect way to stream the decompressed output into the data step. That saves any additional disk space.
gzip is one of those "don't leave home without it" open source utilities.
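Because the pipe streams straight into the data step, the same idea can be extended to the 300+ monthly files mentioned earlier without unpacking any of them to disk. The sketch below is untested and rests on a few assumptions that are not from the thread: all files sit in one folder (the macro variable gzdir is hypothetical), they share the same record layout, and the output data set name anuj.all_files is made up. It lists the files with dir /b and then opens one gzip pipe per file through the FILEVAR= option:

```sas
/* Assumption: all monthly files sit in one folder and share the same layout */
%LET gzdir = I:\Anuj\temp folder;

DATA anuj.all_files;
  LENGTH gzname $ 256 cmd $ 600;
  /* Step 1: read the next bare file name from "dir /b" */
  INFILE "dir /b ""&gzdir\*.gz""" PIPE TRUNCOVER;
  INPUT gzname $256.;
  /* Step 2: build the gzip command; gzip itself is in a path without spaces */
  cmd = CATS('I:\anuj\gzip -dc "', "&gzdir\", gzname, '"');
  /* Step 3: open that command as a pipe and read it to the end */
  INFILE gz PIPE FILEVAR=cmd END=done MISSOVER;
  DO WHILE (NOT done);
    INPUT make $ 1-14 mpg 15-18 weight 19-23 price 24-27;
    OUTPUT;
  END;
RUN;
```

The FILEVAR= variable cmd is not written to the output data set, while gzname is kept, so each observation records which file it came from. An empty .gz file would end the step early with this loop, so that case may need extra handling.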