Reading a subset of a data file into SAS from a zipped folder without unzipping

New Contributor

When I run the following code to set the zip folder location and identify members...

 

/* Setting Zip folder Location */
filename inzip ZIP "**location hidden for security**/datasetzippedfolder.zip";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
    length memname $200 isFolder 8;
    fid=dopen("inzip");
    if fid=0 then
        stop;
    memcount=dnum(fid);
    do i=1 to memcount;
        memname=dread(fid,i);
        /* check for trailing / in folder name */
        isFolder = (first(reverse(trim(memname)))='/');
        output;
    end;
    rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

 

...I get the following:

 

memname                                      isFolder
var/lib/jenkins/oracle_export/dataset.csv    0
N = 1

 

 

The dataset.csv file is 777 PB in size. I do not want to extract it because I don't have the disk space. I know which columns the dataset has and which value the subsetting variable needs to have.

How can I obtain the subset without extracting the file?

Grand Advisor

Re: Reading a subset of a data file into SAS from a zipped folder without unzipping

Use the CSV file name, exactly as it appears in the member listing (including any folder path), as the MEMBER= argument in a FILENAME statement:

 

filename foo ZIP 'U:\directory1\testzip.zip' member="test1.csv" ;

 

Then use the fileref foo in your INFILE statement:

 

data want;
   infile foo <any options such as delimiter, lrecl and such>;
   <informat and input statement for the columns in the csv go here>
run;
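To make the skeleton above concrete, here is a minimal sketch of how it might look with a subsetting IF, so only the matching rows are kept in the output data set. The column names (id, site, value), their informats, and the filter value 'A' are hypothetical placeholders, and the path and member name are the same example values used above. It also assumes a comma-delimited file with a header row.

```sas
/* Example path and member name; replace with your own.              */
/* If the member listing shows a folder path, use the full name,     */
/* for example member="var/lib/jenkins/oracle_export/dataset.csv".   */
filename foo ZIP 'U:\directory1\testzip.zip' member="test1.csv";

data want;
    /* dsd + dlm=',' : comma-delimited, quoted values allowed        */
    /* firstobs=2    : skip the header row (assumed to exist)        */
    /* truncover     : do not read past the end of short lines       */
    infile foo dsd dlm=',' firstobs=2 truncover lrecl=32767;

    /* Hypothetical columns; list yours in file order */
    input id :8. site :$20. value :8.;

    /* Subsetting IF: only rows that pass are written to WANT */
    if site = 'A';
run;

filename foo clear;
```

SAS still has to read through the whole member to evaluate the condition, but the data is decompressed as a stream and nothing is written to disk except the rows that pass the filter.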

 

Note that if the Zip wasn't created by WinZip this may not work.

New Contributor

Re: Reading a subset of a data file into SAS from a zipped folder without unzipping

Thank you for your quick feedback @ballardw.

 

Unfortunately, I get this error:

 

NOTE: The zip file dmf_table_33.csv doesn't exist.
ERROR: Physical file does not exist, dmf_table_33.csv.

 

It seems to not recognize the zipped folder. The files are password protected. Would that make a difference?

Grand Advisor

Re: Reading a subset of a data file into SAS from a zipped folder without unzipping

From http://blogs.sas.com/content/sasdummy/2014/01/29/using-filename-zip/  :

For password-protected ZIP files, you'll still need to use external tools like WinZip, 7-Zip, or gzip. This SAS Global Forum paper shows how that can work.
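One way the external-tool route might look, without extracting to disk, is to have the tool write the member to standard output and read that through a FILENAME PIPE. This is only a sketch under several assumptions: the 7-Zip command-line program 7z is installed and on the PATH, your SAS session allows operating system commands (XCMD), and the archive path, member name, password (MyPassword), and column names are all placeholders. It is not necessarily the approach the paper describes.

```sas
/* Assumptions: 7z is on the PATH and XCMD is enabled.               */
/* "e" extracts, -so sends the extracted data to standard output,    */
/* -p supplies the password (MyPassword is a placeholder).           */
/* Note that the password will be visible in the program and log.    */
filename zpipe pipe
    '7z e -so -pMyPassword "U:\directory1\testzip.zip" "test1.csv"';

data want;
    /* Same hypothetical layout as a comma-delimited file with a header */
    infile zpipe dsd dlm=',' firstobs=2 truncover lrecl=32767;
    input id :8. site :$20. value :8.;

    /* Keep only the rows needed for the subset */
    if site = 'A';
run;

filename zpipe clear;
```

If the member sits inside a folder in the archive, the name passed to 7z would likely need to include that path, just as in the member listing.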

 

 
