SAS Communities Library

How to work with ZIP files in SAS programs

You can use the FILENAME ZIP method to read and write ZIP archive files within your SAS code. You can also read and create .gz (gzip) files by using FILENAME ZIP with the GZIP option. This article shows how both work through a series of examples.

 

Reading a text file from within a ZIP archive

A ZIP archive can contain one or multiple files, optionally organized in a folder structure. To address and read a single member from the ZIP file, you can use folder-member syntax like this:

filename inzip ZIP "./projects/freddiemac.zip";

data fm;
  /* Read text file directly from ZIP archive */
  infile inzip(ri130701_13dn01.txt);
  input @1 record_type $2. @;
  /* continue processing */
run;

Alternatively, you can use the MEMBER= option on the FILENAME ZIP statement:

filename inzip ZIP "./projects/freddiemac.zip"
 member="ri130701_13dn01.txt";

data fm;
  /* Read text file directly from ZIP archive */
  infile inzip;
  input @1 record_type $2. @;
  /* continue processing */
run;

 

Discover the contents of a ZIP file with DOPEN and DREAD

You can think of a ZIP archive as a folder that contains other files and folders in a hierarchy. In this way it makes sense that you can navigate the ZIP contents by using the directory-related functions DOPEN and DREAD.

 

filename zipdemo ZIP "&ziproot./zipdemo.zip";

/* List the files in the ZIP */
/* Output to log             */
data _null_;
  fid=dopen("zipdemo");

  if fid=0 then
    stop;
  memcount=dnum(fid);

  do i=1 to memcount;
    memname=dread(fid,i);
    put memname=;
  end;

  rc=dclose(fid);
run;

 

Sample output:

memname=class.csv
memname=SciFi-AI.csv
NOTE: DATA statement used (Total process time):
      real time           0.12 seconds
      cpu time            0.07 seconds
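
If you want the member list in a data set rather than in the log, the same loop can write one row per member. The following is a minimal sketch, not part of the original example: the data set name zip_members and the 256-character length are assumptions, and it treats any entry whose name ends in a slash as a folder.

```sas
filename zipdemo ZIP "&ziproot./zipdemo.zip";

/* Collect member names in a data set instead of the log */
data zip_members (keep=memname);
  length memname $ 256;   /* assumed maximum name length */
  fid = dopen("zipdemo");

  if fid = 0 then do;
    /* DOPEN failed: report the reason and stop */
    msg = sysmsg();
    put "ERROR: could not open ZIP: " msg;
    stop;
  end;

  do i = 1 to dnum(fid);
    memname = dread(fid, i);
    /* skip entries that end in "/", which are folders */
    if substr(memname, length(memname), 1) ne '/' then output;
  end;

  rc = dclose(fid);
run;
```

A data set like this can then drive later steps, for example a loop that reads or copies each member in turn.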

 

Variation using an XLSX file

Modern Excel files (XLSX) use the ZIP format under the covers, and you can explore the structure with FILENAME ZIP:

filename titanic ZIP "&ziproot./titanic-full.xlsx";

/* List the files in the ZIP */
/* Output to log             */
data _null_;
  fid=dopen("titanic");

  if fid=0 then
    stop;
  memcount=dnum(fid);

  do i=1 to memcount;
    memname=dread(fid,i);
    put memname=;
  end;

  rc=dclose(fid);
run;

 

Result, including subfolder names within the XLSX zip structure:

memname=[Content_Types].xml
memname=_rels/.rels
memname=xl/workbook.xml
memname=xl/_rels/workbook.xml.rels
memname=xl/worksheets/sheet1.xml
memname=xl/worksheets/sheet2.xml
memname=xl/theme/theme1.xml
memname=xl/styles.xml
memname=xl/sharedStrings.xml
memname=xl/drawings/drawing1.xml
memname=xl/media/image1.png
memname=xl/webextensions/taskpanes.xml
memname=xl/webextensions/webextension1.xml
memname=xl/worksheets/_rels/sheet2.xml.rels
memname=xl/drawings/_rels/drawing1.xml.rels
memname=xl/webextensions/_rels/taskpanes.xml.rels
memname=docProps/core.xml
memname=docProps/app.xml
memname=xl/worksheets/sheet3.xml
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

 

Copy content out of a ZIP file using the FCOPY function

You can use the FCOPY function to copy a member file out of a ZIP to another folder in your SAS session. FCOPY requires two filerefs, one for the source member and one for the target file:

filename zipdemo ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv';
filename scifi "&ziproot./data/SciFi-AI.csv";

data _null_;
    rc=fcopy('zipdemo','scifi');
run;
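
FCOPY returns 0 when the copy succeeds, so it is worth checking the return code. The sketch below assumes that RECFM=N on both filerefs suits your file; it copies the member byte for byte, which is the usual choice for binary content. The fileref names src and dest are illustrative.

```sas
/* RECFM=N treats the content as a binary stream (assumption: */
/* a byte-for-byte copy is what you want for this member)     */
filename src  ZIP "&ziproot./zipdemo.zip" member='SciFi-AI.csv' recfm=n;
filename dest "&ziproot./data/SciFi-AI.csv" recfm=n;

data _null_;
  rc = fcopy('src', 'dest');
  if rc = 0 then
    put "NOTE: member copied out of the ZIP file.";
  else do;
    /* SYSMSG returns the text of the most recent error */
    msg = sysmsg();
    put "ERROR: FCOPY failed. " rc= msg=;
  end;
run;

filename src clear;
filename dest clear;
```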

 

Remember that you don't need to copy a text file out of the archive in order to read it with the DATA step:

filename zipdemo ZIP "&ziproot./zipdemo.zip";
data scifi;
 /* Read directly just like a normal CSV */
 infile zipdemo(SciFi-AI.csv) dsd firstobs=2;
 length title $ 20 year 8 cost 8 boxoffice 8;
 input title year cost boxoffice;
run;
proc print data=scifi (obs=10);
run;
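
FILENAME ZIP also works in the other direction: a fileref that names a member can be the target of a FILE statement. The following is a minimal sketch under stated assumptions: it uses the SASHELP.CLASS sample table, and the archive name class_export.zip and the fileref outzip are hypothetical.

```sas
/* Write a CSV member into a ZIP archive */
filename outzip ZIP "&ziproot./data/class_export.zip" member="class.csv";

data _null_;
  set sashelp.class;
  file outzip dsd;    /* DSD writes comma-separated values */
  if _n_ = 1 then put "Name,Sex,Age,Height,Weight";   /* header row */
  put name sex age height weight;
run;

filename outzip clear;
```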

 

Working with gzip files in SAS

A .gz file (gzipped file) is a single file that has been compressed using the gzip algorithm, often on a Linux/Unix platform. Use FILENAME ZIP with the GZIP option to compress a single file to a .gz file:

filename source "&ziproot./dailylog.txt";
filename tozip ZIP "&ziproot./data/dailylog.txt.gz" GZIP;
filename tozip2 ZIP "&ziproot./data/dailylog2.txt.gz" GZIP;
 
 
/* read and rewrite text */
data _null_;   
   infile source;
   file tozip ;
   input;
   put _infile_ ;
run;

/* OR, use FCOPY */
data _null_;   
    rc=fcopy('source','tozip2');
run;

For gzipped text files, you can use the DATA step to read directly from the .gz file:

filename fromzip ZIP "./projects/dailylog_20230821.txt.gz" GZIP;
data logdata;   
   /* read directly from compressed file */
   infile fromzip; 
   input  date : yymmdd10. time : anydttme. ;
   format date date9. time timeampm.;
run;

 

Using SAS functions to get the file details from a ZIP file

Use FOPEN, FOPTNUM, FOPTNAME and FINFO to learn the specific ZIP member properties such as name, original file size, compressed size, and original date/time. 

FILENAME F ZIP "C:\Users\sascrh\Downloads\Zillow_Neighborhoods.zip" 
  member="Zillow_Neighborhoods.mxd";

data deets;
  /* Open the ZIP member; FOPEN returns 0 if it fails */
  fId = fopen("f","S");
  if fID then
    do;
      /* Loop over each information item available for the member */
      infonum=foptnum(fid);
      do i=1 to infonum;
        infoname=foptname(fid,i);
        select (infoname);
          when ('Filename') filename=finfo(fid,infoname);
          when ('Member Name') membername=finfo(fid,infoname);
          when ('Size') filesize=input(finfo(fid,infoname),15.);
          when ('Compressed Size') compressedsize=input(finfo(fid,infoname),15.);
          when ('CRC-32') crc32=finfo(fid,infoname);
          when ('Date/Time') filetime=input(finfo(fid,infoname),anydtdtm.);
          otherwise; /* ignore other items; SELECT fails if nothing matches */
        end;
      end;
      /* Guard against division by zero for empty members */
      if filesize > 0 then compressedratio = compressedsize / filesize;
      output;
      fId = fClose( fId );
    end;
run;

For a complete example and helpful SAS macros, see "Using FILENAME ZIP and FINFO to list the details in your ZIP files."

 

Special notes about working with ZIP files in SAS

  • Only the ZIP and gzip formats are supported; 7z, tar, and other archive or compression formats are not.
  • FILENAME ZIP does not support encrypted or password-protected files.
  • The FCOPY function, when used to copy a file from a ZIP file to a folder, does not preserve the original file attributes such as the datetime stamp.
  • Use caution with encoding. FILENAME ZIP supports a NAMEENCODING option that allows you to work with members whose names use an encoding that doesn't match your SAS session encoding.

See also

Series of "ZIP files" articles on blogs.sas.com

How do I read and write ZIP files in SAS? Ask the Expert webinar

 

 
