DATA Step, Macro, Functions and more

How to get number of rows in zipped CSV before importing in 9.2

Reply
Contributor
Posts: 22

How to get number of rows in zipped CSV before importing in 9.2

[ Edited ]

Hi All,

 

I managed to import zipped CSV files using the following code:

 

FILENAME ZIPFILE SASZIPAM "Z:\Data\fname001.zip";
data work.imported. (COMPRESS = YES);
infile ZIPFILE(fname001.csv)
       delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
informat a b c ;
format a b c;
input a b c;
run;
 

However, sometimes when I am importing large CSV files (~4.5m rows, >8GB in without zipping), I encountered I/O error.

 

 

After trial and error, I noticed that I can successfully import using the above code by reducing the observation read in by 1 row, using the OBS= option after FIRSTOBS=2;

 

This brought me two questions:

  1. Why was there an I/O error only when the last row included?
  2. How can I know the total number of rows in the CSV file so that I can skip the last row when building an import macro?

Thanks.

Super Contributor
Posts: 440

Re: How to get number of rows in zipped CSV before importing in 9.2

Have you tried this method from @ChrisHemedinger of extracting the zipped file first https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-the-data-from-zipped-csv-file/td-p/284... and then running Proc Import?

Contributor
Posts: 22

Re: How to get number of rows in zipped CSV before importing in 9.2

Posted in reply to ChrisBrooks
Won't work. v9.2 do not have FILENAME ZIP method.
Respected Advisor
Posts: 4,173

Re: How to get number of rows in zipped CSV before importing in 9.2

@J_CKY

You can't know the number of rows (=the number of line delimiters) of a text file before you've fully scanned it. The number of rows is not a metadata attribute of such a file.

 

Before the introduction of the ZIP engine I've always used 3rd party tools (preferrably 7-zip; calling it via a SAS pipe) when dealing with zip files in SAS. The SASZIPAM engine has some limitations and I've always managed to avoid dealing with it.

Contributor
Posts: 22

Re: How to get number of rows in zipped CSV before importing in 9.2

Thanks for your input.
Unfortunately, I do not have access to 7-zip at work and red-tapes here prevents me from getting it.

I would have used 7zip method if I have the choice. :-(
Community Manager
Posts: 2,954

Re: How to get number of rows in zipped CSV before importing in 9.2

The SASZIPAM engine is not something that's fully documented/supported for customer use.  The FILENAME ZIP method is pretty good, supports ZIP files (and GZIP very soon now - in SAS 9.4 Maint 5) and member-level access.  But behind the scenes, the file will need to be extracted from the ZIP in order for SAS to know anything about it, including record count -- and that's going to require whatever amount of space is needed to store the file.

 

FILENAME ZIP is available only in SAS 9.4 and later though -- not in SAS 9.2.  You can play with it in the SAS University Edition to learn more, though that won't help in your production environment.

Super Contributor
Posts: 440

Re: How to get number of rows in zipped CSV before importing in 9.2

Ok your post says this happens sometimes - could it be that you're running out of space - I suspect that reading zip files must involve the creation of some utility files behind the scenes and if you try with a fresh SAS session it might work?

Contributor
Posts: 22

Re: How to get number of rows in zipped CSV before importing in 9.2

Posted in reply to ChrisBrooks

yes, sometimes only.  And also those larger CSV files

 

I suspect that was caused by some dodgy footer or something of that sort.  If I can get the number of row in advance, I can program it to avoid the last row.

 

But seems like this is not possible without an attempt to import it (and fail) and then find out in the log.

Respected Advisor
Posts: 4,173

Re: How to get number of rows in zipped CSV before importing in 9.2

@J_CKY

I would assume if you can run something like below to get the count of lines then the file is structurally o.k.

data _null_;
  infile .... end=last;
  input;
  count+1;
  if last then
    call symput('count',count);
run;
%put &=count;

If above fails then yes, something more serious is not right.

Contributor
Posts: 22

Re: How to get number of rows in zipped CSV before importing in 9.2

So @Patrick's way works with a smaller file (17k rows) but getting an I/O error when working with a 4.5m row file.

 

ERROR: Invalid compressed data, length does not match.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.
       Aborted during the EXECUTION phase.

Hmmm... I guess that's something to do with the buffer?

 

 

Not sure.

Respected Advisor
Posts: 4,173

Re: How to get number of rows in zipped CSV before importing in 9.2

[ Edited ]

@J_CKY

I could replicate the error in my environment (local SAS9.4 Win7 instance) using below code.

filename testzip zip 'c:\test\testfile.zip';
data _null_;
  file testzip(testfile.txt);
  array vars {5} $100 (5*'  adslkfasdlfkas;dfkjd daflk;jadsfkdjdk jdlkaj;dfadk;;ldsaf |');
  do i=1 to 45*10**6;
    put i '|' vars[*];
  end;
run;
filename testzip clear;


filename zipfile saszipam "c:\test\testfile.zip";
data _null_;
  infile zipfile(testfile.txt) end=last;
  input;
  count+1;
  if last then
    call symput('count',put(count,best32.));
run;
filename zipfile clear;
%put &=count;
NOTE: The infile ZIPFILE(testfile.txt) is:
      File Name=testfile.txt,
      Compressed Size=148520385,
      Uncompressed Size=4294967295,
      Compression Level=-1,Clear Text=Yes

ERROR: Invalid compressed data, length does not match.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.  Aborted during the EXECUTION phase.
NOTE: A total of 45000000 records were read from the infile library ZIPFILE.
NOTE: 45000000 records were read from the infile ZIPFILE(testfile.txt).
      One or more lines were truncated.
NOTE: The SAS System stopped processing this step because of errors.

As you I'm only running into this issue once the zipped file exceeds a certain size.

 

 

I had no problems reading the same file using the SAS9.4 ZIP engine with code as below.

filename testzip zip 'c:\test\testfile.zip';
data _null_;
  infile testzip(testfile.txt) end=last;
  input;
  count+1;
  if last then
    call symput('count',put(count,best32.));
run;
filename testzip clear;
%put &=count;

Given that the SASZIPAM engine is not meant for public use I guess you will either have to use a 3rd party tool like 7-zip or you need to upgrade to a more recent version of SAS.

 

7-zip comes also in a portable version (also the command line bit) so I guess you could make this work in your environment even if you don't have the permissions to install software.

Super User
Super User
Posts: 7,050

Re: How to get number of rows in zipped CSV before importing in 9.2

Sounds like you have hit the limit of the ability of that filename egine and your work around is just going to ignore the end of the file.

You either need to update your SAS version (why are you still using a program that is over 7 years old when newer versions are available for no extra money?) or switch to using ZIP ro 7z or other utility that can handle those files.

Ask a Question
Discussion stats
  • 11 replies
  • 177 views
  • 3 likes
  • 5 in conversation