Hello.
I am new to SAS programming and need help unzipping .gz and .zip files with SAS.
I have a .zip file that contains several .csv.gz files.
I found out how to unzip .zip files and how to open .gz files, but doing both at once seems quite complicated.
What I need is to create unzipped .csv files (or just reading them in as datasets would be helpful too).
Accepted Solutions
SAS 9.4 has a native capability to read ZIP files using FILENAME ZIP. And if you have SAS 9.4 Maintenance 5 (9.4M5), the method also supports GZIP files.
Because you have a ZIP of GZ files, you're looking at a two-step process. First, expand the members of the ZIP file with FILENAME ZIP. I have some examples in this blog post. Note that the process involves "reading" each ZIP member and writing it as a new file to a temporary space -- effectively copying it out of the ZIP archive. At the end of this step, you would have a collection of *.csv.gz files in your temp space.
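As a rough illustration of that first step, here is a minimal sketch that lists the ZIP members and copies each one out byte by byte. The paths in `zipfile` and `outdir`, the fileref names, and the `copy_member` macro are all hypothetical placeholders, and the sketch assumes the ZIP holds a flat list of files with simple names (no subfolders, commas, or parentheses).

```sas
/* Hypothetical locations: adjust to your environment */
%let zipfile = c:\data\archive.zip;
%let outdir  = c:\temp\unzipped;

filename inzip zip "&zipfile";

/* List the members of the ZIP file, skipping folder entries */
data contents(keep=memname);
  length memname $200;
  fid = dopen("inzip");
  if fid = 0 then stop;
  do i = 1 to dnum(fid);
    memname = dread(fid, i);
    if substr(memname, length(memname), 1) ne '/' then output;
  end;
  rc = dclose(fid);
run;

/* Copy one member out of the ZIP as a stream of bytes */
%macro copy_member(memname);
  filename src zip "&zipfile" member="&memname";
  filename dst "&outdir\&memname";
  data _null_;
    infile src lrecl=256 recfm=F length=length eof=eof unbuf;
    file dst lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
  eof:
    stop;
  run;
  filename src clear;
  filename dst clear;
%mend copy_member;

/* Generate one macro call per member; %nrstr delays execution
   until after this DATA step has finished */
data _null_;
  set contents;
  call execute(cats('%nrstr(%copy_member(', memname, '))'));
run;
```

After this runs, the folder named in `outdir` should contain the .csv.gz files, still gzip-compressed.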
You would then use FILENAME ZIP GZIP to reference each of those csv.gz files in turn. You don't need to explicitly decompress those -- once you assign that fileref, you should be able to process the file with DATA step. See this blog post about reading GZIP files.
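A minimal sketch of that second step, assuming SAS 9.4M5 or later. The file name `sales.csv.gz`, the output dataset, and the three columns are hypothetical; substitute the real layout of your CSV.

```sas
/* GZIP option requires SAS 9.4M5 or later */
filename gzcsv zip "c:\temp\unzipped\sales.csv.gz" gzip;  /* hypothetical file */

data work.sales;
  /* firstobs=2 skips a header row, assuming the file has one */
  infile gzcsv dsd dlm=',' firstobs=2 truncover lrecl=32767;
  /* hypothetical columns: replace with the real ones */
  length id 8 region $20 amount 8;
  input id region amount;
run;

filename gzcsv clear;
```

To process every extracted file, the same FILENAME and DATA step could be wrapped in a macro and called once per file name.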
Why do you have two levels of compression in the first place? It sounds like files gzipped on Linux were moved over to Windows, where someone else then zipped them again, which is a bit daft.
Anyway, most zip programs (WinZip, 7-Zip) should be able to deal with both file types. So you would need to:
1) Get the list of .zip files
2) Extract each .zip file using a command-line extract
3) Get the list of extracted .gz files
4) Extract each .gz file using a command-line extract
A shell of a program might look like:
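/* Placeholder paths: point these at your data folder and your 7-Zip install. */
/* 7-Zip's command-line program is 7z.exe; "e" extracts and -o sets the output folder. */
options noxwait xsync;

/* Steps 1 and 2: list the .zip files and extract each one */
filename tmp pipe 'dir "c:\data\*.zip" /b';
data _null_;
  infile tmp truncover;
  length nm $200;
  input nm $200.;  /* read the whole line, including embedded spaces */
  call execute(cats('x ''c:\programfiles\7zip\7z.exe e "c:\data\', nm, '" -o"c:\data"'';'));
run;

/* Steps 3 and 4: list the extracted .gz files and extract each one */
filename tmp pipe 'dir "c:\data\*.gz" /b';
data _null_;
  infile tmp truncover;
  length nm $200;
  input nm $200.;
  call execute(cats('x ''c:\programfiles\7zip\7z.exe e "c:\data\', nm, '" -o"c:\data"'';'));
run;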
The question is, is it worth coding this? With 7-Zip, for instance, you can select all the files, right-click, and choose Extract to "*\". It's not really a huge effort.
I can see that I first need to go through the unzipping process with FILENAME ZIP, and once I end up with a bunch of temporary csv.gz files, move on to dealing with the gzip files.
Unfortunately, the only SAS available to me is SAS 9.4 Maintenance 3 (9.4M3), and the GZIP option does not seem to work on this version.
This has already been a great help, but could I ask for more advice on handling csv.gz files when I'm not using SAS 9.4 Maintenance 5?
Then your options are to a) use X commands to shell out to the operating system, as I showed above, or b) do it outside of SAS. There is no reason why you cannot do it with an ordinary batch file:
https://stackoverflow.com/questions/17077964/windows-batch-script-to-unzip-files-in-a-directory
Not all processing needs to be done in SAS.
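If staying inside SAS on 9.4M3 is preferred, one variation on the shell-out idea is to read the decompressed stream through a PIPE fileref instead of extracting to disk first. This is only a sketch under several assumptions: the XCMD system option is enabled, 7-Zip is installed at the hypothetical path shown, and the file name and columns are placeholders. It relies on 7-Zip's `-so` switch, which writes the extracted data to standard output.

```sas
/* Assumes XCMD is enabled and 7-Zip lives at this hypothetical path */
filename gzcsv pipe
  'c:\programfiles\7zip\7z.exe e -so "c:\temp\unzipped\sales.csv.gz"';

data work.sales;
  /* firstobs=2 skips a header row, assuming the file has one */
  infile gzcsv dsd dlm=',' firstobs=2 truncover lrecl=32767;
  /* hypothetical columns: replace with the real ones */
  length id 8 region $20 amount 8;
  input id region amount;
run;

filename gzcsv clear;
```

On a Unix or Linux SAS host, `gzip -dc` followed by the file path could play the same role in the PIPE command.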