BookmarkSubscribeRSS Feed
J_CKY
Obsidian | Level 7

Hi All,

 

I managed to import zipped CSV files using the following code:

 

FILENAME ZIPFILE SASZIPAM "Z:\Data\fname001.zip";
data work.imported. (COMPRESS = YES);
infile ZIPFILE(fname001.csv)
       delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
informat a b c ;
format a b c;
input a b c;
run;
 

However, sometimes when I am importing large CSV files (~4.5m rows, >8GB in without zipping), I encountered I/O error.

 

 

After trial and error, I noticed that I can successfully import using the above code by reducing the observation read in by 1 row, using the OBS= option after FIRSTOBS=2;

 

This brought me two questions:

  1. Why was there an I/O error only when the last row included?
  2. How can I know the total number of rows in the CSV file so that I can skip the last row when building an import macro?

Thanks.

11 REPLIES 11
J_CKY
Obsidian | Level 7
Won't work. v9.2 do not have FILENAME ZIP method.
Patrick
Opal | Level 21

@J_CKY

You can't know the number of rows (=the number of line delimiters) of a text file before you've fully scanned it. The number of rows is not a metadata attribute of such a file.

 

Before the introduction of the ZIP engine I've always used 3rd party tools (preferrably 7-zip; calling it via a SAS pipe) when dealing with zip files in SAS. The SASZIPAM engine has some limitations and I've always managed to avoid dealing with it.

J_CKY
Obsidian | Level 7
Thanks for your input.
Unfortunately, I do not have access to 7-zip at work and red-tapes here prevents me from getting it.

I would have used 7zip method if I have the choice. 😞
ChrisHemedinger
Community Manager

The SASZIPAM engine is not something that's fully documented/supported for customer use.  The FILENAME ZIP method is pretty good, supports ZIP files (and GZIP very soon now - in SAS 9.4 Maint 5) and member-level access.  But behind the scenes, the file will need to be extracted from the ZIP in order for SAS to know anything about it, including record count -- and that's going to require whatever amount of space is needed to store the file.

 

FILENAME ZIP is available only in SAS 9.4 and later though -- not in SAS 9.2.  You can play with it in the SAS University Edition to learn more, though that won't help in your production environment.

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
ChrisBrooks
Ammonite | Level 13

Ok your post says this happens sometimes - could it be that you're running out of space - I suspect that reading zip files must involve the creation of some utility files behind the scenes and if you try with a fresh SAS session it might work?

J_CKY
Obsidian | Level 7

yes, sometimes only.  And also those larger CSV files

 

I suspect that was caused by some dodgy footer or something of that sort.  If I can get the number of row in advance, I can program it to avoid the last row.

 

But seems like this is not possible without an attempt to import it (and fail) and then find out in the log.

Patrick
Opal | Level 21

@J_CKY

I would assume if you can run something like below to get the count of lines then the file is structurally o.k.

data _null_;
  infile .... end=last;
  input;
  count+1;
  if last then
    call symput('count',count);
run;
%put &=count;

If above fails then yes, something more serious is not right.

J_CKY
Obsidian | Level 7

So @Patrick's way works with a smaller file (17k rows) but getting an I/O error when working with a 4.5m row file.

 

ERROR: Invalid compressed data, length does not match.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.
       Aborted during the EXECUTION phase.

Hmmm... I guess that's something to do with the buffer?

 

 

Not sure.

Patrick
Opal | Level 21

@J_CKY

I could replicate the error in my environment (local SAS9.4 Win7 instance) using below code.

filename testzip zip 'c:\test\testfile.zip';
data _null_;
  file testzip(testfile.txt);
  array vars {5} $100 (5*'  adslkfasdlfkas;dfkjd daflk;jadsfkdjdk jdlkaj;dfadk;;ldsaf |');
  do i=1 to 45*10**6;
    put i '|' vars[*];
  end;
run;
filename testzip clear;


filename zipfile saszipam "c:\test\testfile.zip";
data _null_;
  infile zipfile(testfile.txt) end=last;
  input;
  count+1;
  if last then
    call symput('count',put(count,best32.));
run;
filename zipfile clear;
%put &=count;
NOTE: The infile ZIPFILE(testfile.txt) is:
      File Name=testfile.txt,
      Compressed Size=148520385,
      Uncompressed Size=4294967295,
      Compression Level=-1,Clear Text=Yes

ERROR: Invalid compressed data, length does not match.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.  Aborted during the EXECUTION phase.
NOTE: A total of 45000000 records were read from the infile library ZIPFILE.
NOTE: 45000000 records were read from the infile ZIPFILE(testfile.txt).
      One or more lines were truncated.
NOTE: The SAS System stopped processing this step because of errors.

As you I'm only running into this issue once the zipped file exceeds a certain size.

 

 

I had no problems reading the same file using the SAS9.4 ZIP engine with code as below.

filename testzip zip 'c:\test\testfile.zip';
data _null_;
  infile testzip(testfile.txt) end=last;
  input;
  count+1;
  if last then
    call symput('count',put(count,best32.));
run;
filename testzip clear;
%put &=count;

Given that the SASZIPAM engine is not meant for public use I guess you will either have to use a 3rd party tool like 7-zip or you need to upgrade to a more recent version of SAS.

 

7-zip comes also in a portable version (also the command line bit) so I guess you could make this work in your environment even if you don't have the permissions to install software.

Tom
Super User Tom
Super User

Sounds like you have hit the limit of the ability of that filename egine and your work around is just going to ignore the end of the file.

You either need to update your SAS version (why are you still using a program that is over 7 years old when newer versions are available for no extra money?) or switch to using ZIP ro 7z or other utility that can handle those files.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2212 views
  • 3 likes
  • 5 in conversation