Hi All,
I managed to import zipped CSV files using the following code:
FILENAME ZIPFILE SASZIPAM "Z:\Data\fname001.zip";
data work.imported. (COMPRESS = YES);
infile ZIPFILE(fname001.csv)
delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
informat a b c ;
format a b c;
input a b c;
run;
However, sometimes when I am importing large CSV files (~4.5m rows, >8GB in without zipping), I encountered I/O error.
After trial and error, I noticed that I can successfully import using the above code by reducing the observation read in by 1 row, using the OBS= option after FIRSTOBS=2;
This brought me two questions:
Thanks.
Have you tried this method from @ChrisHemedinger of extracting the zipped file first https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-the-data-from-zipped-csv-file/td-p/284... and then running Proc Import?
You can't know the number of rows (=the number of line delimiters) of a text file before you've fully scanned it. The number of rows is not a metadata attribute of such a file.
Before the introduction of the ZIP engine I've always used 3rd party tools (preferrably 7-zip; calling it via a SAS pipe) when dealing with zip files in SAS. The SASZIPAM engine has some limitations and I've always managed to avoid dealing with it.
The SASZIPAM engine is not something that's fully documented/supported for customer use. The FILENAME ZIP method is pretty good, supports ZIP files (and GZIP very soon now - in SAS 9.4 Maint 5) and member-level access. But behind the scenes, the file will need to be extracted from the ZIP in order for SAS to know anything about it, including record count -- and that's going to require whatever amount of space is needed to store the file.
FILENAME ZIP is available only in SAS 9.4 and later though -- not in SAS 9.2. You can play with it in the SAS University Edition to learn more, though that won't help in your production environment.
Ok your post says this happens sometimes - could it be that you're running out of space - I suspect that reading zip files must involve the creation of some utility files behind the scenes and if you try with a fresh SAS session it might work?
yes, sometimes only. And also those larger CSV files
I suspect that was caused by some dodgy footer or something of that sort. If I can get the number of row in advance, I can program it to avoid the last row.
But seems like this is not possible without an attempt to import it (and fail) and then find out in the log.
I would assume if you can run something like below to get the count of lines then the file is structurally o.k.
data _null_;
infile .... end=last;
input;
count+1;
if last then
call symput('count',count);
run;
%put &=count;
If above fails then yes, something more serious is not right.
So @Patrick's way works with a smaller file (17k rows) but getting an I/O error when working with a 4.5m row file.
ERROR: Invalid compressed data, length does not match.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.
Aborted during the EXECUTION phase.
Hmmm... I guess that's something to do with the buffer?
Not sure.
I could replicate the error in my environment (local SAS9.4 Win7 instance) using below code.
filename testzip zip 'c:\test\testfile.zip';
data _null_;
file testzip(testfile.txt);
array vars {5} $100 (5*' adslkfasdlfkas;dfkjd daflk;jadsfkdjdk jdlkaj;dfadk;;ldsaf |');
do i=1 to 45*10**6;
put i '|' vars[*];
end;
run;
filename testzip clear;
filename zipfile saszipam "c:\test\testfile.zip";
data _null_;
infile zipfile(testfile.txt) end=last;
input;
count+1;
if last then
call symput('count',put(count,best32.));
run;
filename zipfile clear;
%put &=count;
NOTE: The infile ZIPFILE(testfile.txt) is: File Name=testfile.txt, Compressed Size=148520385, Uncompressed Size=4294967295, Compression Level=-1,Clear Text=Yes ERROR: Invalid compressed data, length does not match. FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase. NOTE: A total of 45000000 records were read from the infile library ZIPFILE. NOTE: 45000000 records were read from the infile ZIPFILE(testfile.txt). One or more lines were truncated. NOTE: The SAS System stopped processing this step because of errors.
As you I'm only running into this issue once the zipped file exceeds a certain size.
I had no problems reading the same file using the SAS9.4 ZIP engine with code as below.
filename testzip zip 'c:\test\testfile.zip';
data _null_;
infile testzip(testfile.txt) end=last;
input;
count+1;
if last then
call symput('count',put(count,best32.));
run;
filename testzip clear;
%put &=count;
Given that the SASZIPAM engine is not meant for public use I guess you will either have to use a 3rd party tool like 7-zip or you need to upgrade to a more recent version of SAS.
7-zip comes also in a portable version (also the command line bit) so I guess you could make this work in your environment even if you don't have the permissions to install software.
Sounds like you have hit the limit of the ability of that filename egine and your work around is just going to ignore the end of the file.
You either need to update your SAS version (why are you still using a program that is over 7 years old when newer versions are available for no extra money?) or switch to using ZIP ro 7z or other utility that can handle those files.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.