How to work with ZIP files in SAS programs
You can use the FILENAME ZIP access method to read and write ZIP archive files within your SAS code. You can also read and create .gz (gzip) files by using FILENAME ZIP with the GZIP option. This article shows how each of these works through a series of examples.
Reading a text file from within a ZIP archive
A ZIP archive can contain one or multiple files, optionally organized in a folder structure. To address and read a single member from the ZIP file, you can use folder-member syntax like this:
filename inzip ZIP "./projects/freddiemac.zip";
data fm;
  /* Read text file directly from ZIP archive */
  infile inzip(ri130701_13dn01.txt);
  input @1 record_type $2. @;
  /* continue processing */
run;
Alternatively, you can use the MEMBER= option on the FILENAME ZIP statement:
filename inzip ZIP "./projects/freddiemac.zip"
  member="ri130701_13dn01.txt";
data fm;
  /* Read text file directly from ZIP archive */
  infile inzip;
  input @1 record_type $2. @;
  /* continue processing */
run;
Discover the contents of a ZIP file with DOPEN and DREAD
You can think of a ZIP archive as a folder that contains other files and folders in a hierarchy. In this way it makes sense that you can navigate the ZIP contents by using the directory-related functions DOPEN and DREAD.
filename zipdemo ZIP "&ziproot./zipdemo.zip";
/* List the files in the ZIP */
/* Output to log */
data _null_;
  fid=dopen("zipdemo");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    put memname=;
  end;
  rc=dclose(fid);
run;
Sample output:
memname=class.csv
memname=SciFi-AI.csv
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
cpu time 0.07 seconds
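If you want the member list in a data set rather than in the log, the same DOPEN/DREAD loop can write one row per member. The following is a minimal sketch, not a definitive implementation. It assumes that the macro variable ziproot is already defined and points to the folder that holds zipdemo.zip. The output data set name contents and the variable isFolder are illustrative names that do not come from the original example.

```sas
/* Assumption: &ziproot is defined and contains zipdemo.zip */
filename zipdemo ZIP "&ziproot./zipdemo.zip";

/* One row per ZIP member; data set and variable names are illustrative */
data contents(keep=memname isFolder);
  length memname $ 200 isFolder 8;
  fid = dopen("zipdemo");
  if fid = 0 then
    stop;
  memcount = dnum(fid);
  do i = 1 to memcount;
    memname = dread(fid, i);
    /* Folder entries, when present in the archive, end with a slash */
    isFolder = (substr(memname, length(memname), 1) = '/');
    output;
  end;
  rc = dclose(fid);
run;

proc print data=contents noobs;
run;
```

With the list in a data set, you can filter or sort the member names before you decide which members to read or copy.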
Variation using an XLSX file
Modern Excel files (XLSX) use the ZIP format under the covers, and you can explore the structure with FILENAME ZIP:
filename titanic ZIP "&ziproot./titanic-full.xlsx";
/* List the files in the ZIP */
/* Output to log */
data _null_;
  fid=dopen("titanic");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    put memname=;
  end;
  rc=dclose(fid);
run;
Result, including subfolder names within the XLSX zip structure:
memname=[Content_Types].xml
memname=_rels/.rels
memname=xl/workbook.xml
memname=xl/_rels/workbook.xml.rels
memname=xl/worksheets/sheet1.xml
memname=xl/worksheets/sheet2.xml
memname=xl/theme/theme1.xml
memname=xl/styles.xml
memname=xl/sharedStrings.xml
memname=xl/drawings/drawing1.xml
memname=xl/media/image1.png
memname=xl/webextensions/taskpanes.xml
memname=xl/webextensions/webextension1.xml
memname=xl/worksheets/_rels/sheet2.xml.rels
memname=xl/drawings/_rels/drawing1.xml.rels
memname=xl/webextensions/_rels/taskpanes.xml.rels
memname=docProps/core.xml
memname=docProps/app.xml
memname=xl/worksheets/sheet3.xml
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
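Because the member names in the listing include their folder path, you can pass one of them to the MEMBER= option to look inside that member. The sketch below peeks at the start of xl/workbook.xml. It assumes that &ziproot is defined, that the member exists as listed above, and that MEMBER= accepts the path exactly as DREAD reports it. The fileref wb and the variable line are illustrative names.

```sas
/* Assumption: &ziproot is defined; the member path is taken from the listing above */
filename wb ZIP "&ziproot./titanic-full.xlsx" member="xl/workbook.xml";

data _null_;
  /* XML members often have very long lines, so read at most the first
     200 characters of up to five records */
  infile wb lrecl=32767 truncover obs=5;
  input line $char200.;
  put line;
run;

filename wb clear;
```

This is only a way to inspect the raw XML. To read the worksheet data itself, use a tool that understands the XLSX format rather than parsing these members by hand.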
Copy content out of a ZIP file using the FCOPY function
You can use the FCOPY function to copy a member file out of a ZIP to another folder in your SAS session. FCOPY requires two filerefs:
filename zipdemo ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv';
filename scifi "&ziproot./data/SciFi-AI.csv";
data _null_;
  rc=fcopy('zipdemo','scifi');
run;
Remember, though, that you don't need to copy a text file out of the archive in order to read it with the DATA step:
filename zipdemo ZIP "&ziproot./zipdemo.zip";
data scifi;
  /* Read directly just like a normal CSV */
  infile zipdemo(SciFi-AI.csv) dsd firstobs=2;
  length title $ 20 year 8 cost 8 boxoffice 8;
  input title year cost boxoffice;
run;
proc print data=scifi (obs=10);
run;
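The FCOPY step shown earlier in this section ignores the return code, so a failed copy can go unnoticed. FCOPY returns 0 on success, and SYSMSG returns the text of the most recent error. The following sketch, which again assumes that &ziproot is defined and that the data subfolder exists, reports the outcome in the log. The variable name msg and the wording of the log messages are illustrative.

```sas
/* Assumption: &ziproot is defined and &ziproot./data is an existing folder */
filename zipdemo ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv';
filename scifi "&ziproot./data/SciFi-AI.csv";

data _null_;
  length msg $ 384;
  rc = fcopy('zipdemo', 'scifi');
  if rc = 0 then
    put 'NOTE: Copied SciFi-AI.csv out of the ZIP archive.';
  else do;
    /* SYSMSG returns the text of the last error from a file function */
    msg = sysmsg();
    put 'ERROR: FCOPY failed. ' rc= msg=;
  end;
run;

filename zipdemo clear;
filename scifi clear;
```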
Working with gzip files in SAS
To create a .gz file, add the GZIP option to the FILENAME ZIP statement. You can then write to the fileref from a DATA step or copy the source file with FCOPY:
filename source "&ziproot./dailylog.txt";
filename tozip ZIP "&ziproot./data/dailylog.txt.gz" GZIP;
filename tozip2 ZIP "&ziproot./data/dailylog2.txt.gz" GZIP;
/* read and rewrite text */
data _null_;
  infile source;
  file tozip;
  input;
  put _infile_;
run;
/* OR, use FCOPY */
data _null_;
  rc=fcopy('source','tozip2');
run;
For gzipped text files, you can use the DATA step to read directly from the .gz file without decompressing it first:
filename fromzip ZIP "./projects/dailylog_20230821.txt.gz" GZIP;
data logdata;
  /* read directly from compressed file */
  infile fromzip;
  input date : yymmdd10. time : anydttme.;
  format date date9. time timeampm.;
run;
Using SAS functions to get the file details from a ZIP file
Use FOPEN, FOPTNUM, FOPTNAME and FINFO to learn the specific ZIP member properties such as name, original file size, compressed size, and original date/time.
filename f ZIP "C:\Users\sascrh\Downloads\Zillow_Neighborhoods.zip"
  member="Zillow_Neighborhoods.mxd";
data deets;
  fid = fopen("f","S");
  if fid then
    do;
      infonum = foptnum(fid);
      do i = 1 to infonum;
        infoname = foptname(fid,i);
        select (infoname);
          when ('Filename') filename=finfo(fid,infoname);
          when ('Member Name') membername=finfo(fid,infoname);
          when ('Size') filesize=input(finfo(fid,infoname),15.);
          when ('Compressed Size') compressedsize=input(finfo(fid,infoname),15.);
          when ('CRC-32') crc32=finfo(fid,infoname);
          when ('Date/Time') filetime=input(finfo(fid,infoname),anydtdtm.);
          /* Ignore any other item; SELECT without OTHERWISE fails on an unmatched value */
          otherwise;
        end;
      end;
      /* Avoid dividing by zero for an empty member */
      if filesize > 0 then compressedratio = compressedsize / filesize;
      output;
    end;
  fid = fclose(fid);
  format filetime datetime20.;
run;
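To review the result, you can print the data set with formats that make the sizes and ratio easier to read. This short sketch assumes that the DATA step above has already run and created the deets data set in the WORK library. The choice of formats is illustrative.

```sas
/* Assumption: WORK.DEETS exists from the preceding DATA step */
proc print data=deets noobs;
  var membername filesize compressedsize compressedratio filetime;
  format filesize compressedsize comma15.
         compressedratio percent8.1
         filetime datetime20.;
run;
```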
For a complete example and helpful SAS macros, see Using FILENAME ZIP and FINFO to list the details in your ZIP files.
Special notes about working with ZIP files in SAS
- Only the ZIP and gzip formats are supported. Other compressed formats, such as 7z and tar, are not.
- FILENAME ZIP does not support encrypted or password-protected files.
- The FCOPY function, when used to copy a file from a ZIP file to a folder, does not preserve the original file attributes such as datetime stamp.
- Use caution with encoding. FILENAME ZIP supports a NAMEENCODING option that allows you to work with members whose names use an encoding that doesn't match your SAS session encoding.
See also
Series of "ZIP files" articles on blogs.sas.com
How do I read and write ZIP files in SAS? Ask the Expert webinar