I have a .tar.gz file whose members I want to extract. The following code downloads the file and strips the gzip layer.
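/* Download the archive into the WORK directory */
filename dmrs "%sysfunc(getoption(work))\dmrs.tar.gz";
proc http url="http://www.kentdaniel.net/data/dmrs/dmrs.tar.gz" out=dmrs;
run;

/* Re-point DMRS at the same file through the GZIP option of FILENAME ZIP */
filename dmrs zip "%sysfunc(getoption(work))\dmrs.tar.gz" gzip;
filename dmrs0 "%sysfunc(getoption(work))\dmrs.tar";

/* Inflate dmrs.tar.gz into dmrs.tar */
data _null_;
  infile dmrs recfm=n;
  file dmrs0 recfm=n;
  input;
  put _infile_;
run;

/* Try to open the .tar as if it were a ZIP archive and count its members */
filename dmrs0 zip "%sysfunc(getoption(work))\dmrs.tar";
%let i=%sysfunc(dopen(dmrs0));
%put %sysfunc(dnum(&i.));
%let j=%sysfunc(dclose(&i.));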
This code successfully decompresses the .tar.gz but fails to separate the members inside the .tar. Can the ZIP access method in FILENAME read a .tar as well? I am reading this .gz-related blog post, but I am a bit unsure. Thanks.
A tar file is a bundling of files, but not compressed. The gzip action compresses this single bundle. So in concept, tar plus gzip is like creating a zip file, which is an archive of compressed files.
But zip is not the same as tar or tar.gz, so the FILENAME ZIP method can't uncompress these. I think you'll have to use the operating system tar -xf command to do that. I'm not aware of any SAS function or method to perform that step.
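To make the two layers concrete, here is a minimal shell sketch of the operating-system route. It assumes a Unix-like system with tar and gzip on the PATH; the archive sample.tar.gz and the files one.csv and two.csv are hypothetical stand-ins, not the dmrs data from the question.

```shell
set -e

# Hypothetical scratch directory; remove it when you are done.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small sample archive to work with.
mkdir src
printf 'id,value\n1,10\n' > src/one.csv
printf 'id,value\n2,20\n' > src/two.csv
tar -cf sample.tar -C src one.csv two.csv   # bundle only, no compression
gzip sample.tar                             # compress the single bundle -> sample.tar.gz

# Step 1: strip the gzip layer (-d decompress, -c write to stdout so the .gz is kept).
gzip -dc sample.tar.gz > sample.tar

# Step 2: the result is a plain tar bundle; list its members, then unpack them.
members=$(tar -tf sample.tar)
echo "$members"
mkdir out
tar -xf sample.tar -C out
```

Step 1 corresponds to what the DATA step in the question already achieves; step 2 is the part that FILENAME ZIP does not cover.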
Good tip from @Kurt_Bremser :
The proper method to deal with a .tar.gz file is this:
gzip -dc file.tar.gz|tar -xf -
The first part decompresses the file to the initial tar stream and sends it to stdout; the second extracts all files from the stdin stream and writes them to the current working directory. By using other tar options, you can list the names of the files in the tar, or extract specific files from the stream.
"tar" is short for "tape archive"; the utility was originally designed for sequential tape storage.
This code (1) downloads the dmrs.tar.gz file (which contains just a dmrs.tar file) and (2) removes the .gz layer by inflating the downloaded file. It seems SAS does not yet support the last piece of the puzzle: unbinding the members inside the .tar. Thank you for the quick response.
I realize this thread is a bit old but I just noticed it and wanted to weigh in with two points.
A compressed tar file (typically with a ".tar.gz" or ".tgz" extension) is the result of two operations: creating a single-file archive from a group of files (the "tar" part) and compressing that archive (the "gzip" part). The tar utility provided in most variants of *nix includes the ability to compress and decompress as part of the process. When creating a tar file, this is done by including the "z" switch, e.g., tar -cvzf <tar-filename> <files-to-include>. The same "z" switch works when extracting one or more files. This means there is no need to uncompress the file before using the tar command: you can combine those steps and keep only the smaller compressed tar file on disk.
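As a rough illustration of the combined create/extract steps, the following shell sketch assumes a tar build that supports the "z" switch (GNU tar and bsdtar both do). The names bundle.tar.gz, a.txt, and b.txt are hypothetical.

```shell
set -e

# Hypothetical scratch directory; remove it when you are done.
workdir=$(mktemp -d)
cd "$workdir"

mkdir src
printf 'alpha\n' > src/a.txt
printf 'beta\n'  > src/b.txt

# Create and compress in one step: c = create, z = gzip, f = archive file name.
tar -czf bundle.tar.gz -C src a.txt b.txt

# List the members straight from the compressed archive.
listing=$(tar -tzf bundle.tar.gz)
echo "$listing"

# Extract and decompress in one step; no intermediate .tar is written to disk.
mkdir out
tar -xzf bundle.tar.gz -C out
```

Only bundle.tar.gz and the extracted files end up on disk; an uncompressed bundle.tar never appears.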
Secondly, using PIPE on a FILENAME statement, you can read an individual CSV or other file from a compressed tar file without ever extracting anything to disk as an intermediate step. Here's an example that reads a single CSV file from a tarball into a SAS dataset, using the "O" switch (that's a capital letter "O") to write the extracted member to STDOUT (aka "the console") instead of to a file:
FILENAME mycsv PIPE "tar -xzOf /mypath/mytarball.tar.gz ./one.csv";
DATA WORK.ONE;
INFILE mycsv DELIMITER=',' MISSOVER DSD FIRSTOBS=2;
INPUT...
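The tar command inside that FILENAME PIPE can be tried on its own first. The shell sketch below is only an illustration under assumptions: the tarball and CSV contents are made up, and the members are deliberately stored with a leading "./" so that the name passed to tar matches the one in the example above. The member name has to match what the archive listing shows.

```shell
set -e

# Hypothetical scratch directory; remove it when you are done.
workdir=$(mktemp -d)
cd "$workdir"

mkdir src
printf 'id,value\n1,10\n2,20\n' > src/one.csv
printf 'id,value\n3,30\n'       > src/two.csv
tar -czf mytarball.tar.gz -C src ./one.csv ./two.csv

# Check how the members are named inside the archive before asking for one.
tar -tzf mytarball.tar.gz

# -O sends the extracted member to stdout instead of writing a file.
csv=$(tar -xzOf mytarball.tar.gz ./one.csv)
echo "$csv"

# Dropping the header row here plays the role of FIRSTOBS=2 on the INFILE statement.
data_rows=$(tar -xzOf mytarball.tar.gz ./one.csv | tail -n +2)
echo "$data_rows"
```

Because the member goes to stdout, no one.csv is created in the working directory, which is what lets SAS read it through the pipe.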