Hello.
I am new to SAS programming and I need help unzipping .gz and .zip files using SAS.
I have a .zip file that contains several .csv.gz files.
I found out how to unzip .zip files and how to open .gz files, but doing both at once seems quite complicated.
What I need is to create new, uncompressed .csv files (or just reading them in as datasets would be helpful too).
SAS 9.4 has a native capability to read ZIP files using FILENAME ZIP. And if you have SAS 9.4 Maintenance 5, the method also supports GZIP files.
Because you have a ZIP of GZ files, you're looking at a two-step process. First, expand the members of the ZIP file with FILENAME ZIP. I have some examples in this blog post. Note that the process involves "reading" each ZIP member and writing it as a new file to a temporary space -- effectively copying it out of the ZIP archive. At the end of this step, you would have a collection of *.csv.gz files in your temp space.
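As a rough sketch of that first step (not necessarily the exact code from the blog post), something like the following lists the ZIP members and copies each one out as a binary file. The paths in the two %LET statements, the dataset name members, and the macro name copy_member are made up for illustration. It also assumes the archive is flat (no subfolders) and that member names contain no spaces or special characters.

```sas
/* Hypothetical locations: change these to match your system */
%let zipfile = c:\data\archive.zip;
%let outdir  = c:\temp;

filename inzip ZIP "&zipfile";

/* List the members of the ZIP archive */
data members(keep=memname);
  length memname $200;
  fid = dopen("inzip");
  if fid = 0 then stop;
  memcount = dnum(fid);
  do i = 1 to memcount;
    memname = dread(fid, i);
    /* folder entries end with a slash; keep only real files */
    if first(reverse(trim(memname))) ne '/' then output;
  end;
  rc = dclose(fid);
run;

/* Copy one member out of the archive, byte for byte */
%macro copy_member(memname);
  filename dst "&outdir\&memname";
  data _null_;
    infile inzip(&memname) lrecl=256 recfm=F length=length eof=eof unbuf;
    file dst lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
  eof:
    stop;
  run;
  filename dst clear;
%mend copy_member;

/* Run the macro once per member; %nrstr delays execution
   until after this DATA step has finished */
data _null_;
  set members;
  call execute(cats('%nrstr(%copy_member)(', memname, ')'));
run;
```

After this runs, the folder named in outdir should hold one .csv.gz file per ZIP member, still gzip-compressed.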
You would then use FILENAME ZIP with the GZIP option to reference each of those csv.gz files in turn. You don't need to explicitly decompress those -- once you assign that fileref, you should be able to process the file with a DATA step. See this blog post about reading GZIP files.
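For the second step, a minimal sketch might look like this. It needs SAS 9.4 Maintenance 5 or later for the GZIP option. The file path, the output dataset name, and the three columns (id, name, amount) are assumptions for illustration only; replace them with the real layout of your CSV.

```sas
/* Hypothetical file copied out of the ZIP archive in the first step */
filename gzfile ZIP "c:\temp\file1.csv.gz" GZIP;

data work.file1;
  /* firstobs=2 skips a header row; drop it if the file has none */
  infile gzfile dsd dlm=',' firstobs=2 truncover;
  length id 8 name $40 amount 8;   /* assumed column layout */
  input id name amount;
run;

filename gzfile clear;
```

The DATA step reads the decompressed text directly, so no intermediate .csv file is written. If you do want the plain .csv on disk, you could write each record out with a FILE statement instead of creating a dataset.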
Why do you have two levels of compression in the first place? Sounds like files gzipped on Linux have been moved over to Windows, where someone else has then zipped them, bit daft.
Anyway, most zip programs (WinZip, 7-Zip) should be able to deal with both file types. So you would need to:
1) Get the list of .zip filenames
2) Unzip each one using a command line extract
3) Get the list of extracted .gz files
4) Unzip each of those using a command line extract
A shell of a program might look like:
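/* Steps 1 and 2: list the .zip files (dir /b returns bare names) and extract each */
filename tmp pipe 'dir "c:\data\*.zip" /b';
data _null_;
  infile tmp truncover;
  length nm $200;
  input nm $200.;  /* read the whole line, including embedded spaces */
  /* placeholder path: point it at your 7-Zip command line program (7z.exe) */
  call execute(cat('x ''c:\programfiles\7zip\7z.exe e "c:\data\', strip(nm), '" -o"c:\data"'';'));
run;

/* Steps 3 and 4: list the extracted .gz files and extract each */
filename tmp pipe 'dir "c:\data\*.gz" /b';
data _null_;
  infile tmp truncover;
  length nm $200;
  input nm $200.;
  call execute(cat('x ''c:\programfiles\7zip\7z.exe e "c:\data\', strip(nm), '" -o"c:\data"'';'));
run;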
Question is, is it worth coding this? You can select all the files, right click and select "Extract to" with 7-Zip for instance; it's not really a huge effort.
Then your options are to a) use X commands to shell the commands out to the operating system as I presented above, or b) do it outside of SAS. There is no reason why you cannot do it with a normal batch file:
https://stackoverflow.com/questions/17077964/windows-batch-script-to-unzip-files-in-a-directory
Not all processing needs to be done in SAS.