Junyong
Pyrite | Level 9

I have a .tar.gz file whose members I want to extract. The following code downloads the file and strips the gzip layer.

filename dmrs "%sysfunc(getoption(work))\dmrs.tar.gz";

proc http url="http://www.kentdaniel.net/data/dmrs/dmrs.tar.gz" out=dmrs;
run;

filename dmrs zip "%sysfunc(getoption(work))\dmrs.tar.gz" gzip;
filename dmrs0 "%sysfunc(getoption(work))\dmrs.tar";

data _null_;
	infile dmrs recfm=n;
	file dmrs0 recfm=n;
	input;
	put _infile_;
run;

filename dmrs0 zip "%sysfunc(getoption(work))\dmrs.tar";
%let i=%sysfunc(dopen(dmrs0));
%put %sysfunc(dnum(&i.));
%let j=%sysfunc(dclose(&i.));

This code successfully decompresses the .tar.gz but fails to separate the members inside the .tar. Can the ZIP access method in FILENAME read a .tar as well? I have been reading this .gz-related blog post but am still a bit unsure. Thanks.

1 ACCEPTED SOLUTION

ChrisHemedinger
Community Manager

A tar file bundles several files into one, but it does not compress them. The gzip step then compresses that single bundle. So in concept, tar plus gzip is like creating a zip file, which is an archive of compressed files.

But zip is not the same format as tar or tar.gz, so the FILENAME ZIP method can't read the members of these archives. Its GZIP option handles only the gzip layer, as your code already shows. I think you'll have to use the operating system's tar -xf command to unpack the members. I'm not aware of any SAS function or method that performs that step.
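
To make the bundling-versus-compression distinction concrete, here is a small shell sketch. It assumes a Unix-like system with tar, gzip, and awk available; the directory and file names (src, big.txt, demo.tar) are invented for the illustration. It bundles one very repetitive text file with tar, compresses the bundle with gzip, and prints the three sizes.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; nothing here comes from the original post.
workdir=$(mktemp -d)
mkdir "$workdir/src"

# Write 10,000 identical lines so that compression has an obvious effect.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "the same line, over and over" }' \
    > "$workdir/src/big.txt"

# Step 1: tar only bundles (adds headers and padding, no compression).
tar -cf "$workdir/demo.tar" -C "$workdir/src" big.txt

# Step 2: gzip compresses the single bundle; -c keeps demo.tar in place.
gzip -c "$workdir/demo.tar" > "$workdir/demo.tar.gz"

# wc -c may pad its output with spaces on some systems, so strip them.
raw_size=$(wc -c < "$workdir/src/big.txt" | tr -d ' ')
tar_size=$(wc -c < "$workdir/demo.tar" | tr -d ' ')
gz_size=$(wc -c < "$workdir/demo.tar.gz" | tr -d ' ')

printf 'original: %s bytes\n' "$raw_size"
printf 'tar:      %s bytes\n' "$tar_size"
printf 'tar.gz:   %s bytes\n' "$gz_size"
```

On input like this, the .tar should come out slightly larger than the original file, because tar only adds headers and padding, while the .tar.gz should be far smaller.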

 

Good tip from @Kurt_Bremser:

The proper method to deal with a .tar.gz file is this:

gzip -dc file.tar.gz | tar -xf -

The first part decompresses the file to the underlying tar stream and sends it to stdout; the second reads that stream from stdin, extracts all files, and writes them to the current working directory. By using other tar options, you can list the names of the files in the tar, or extract specific files from the stream.

"tar" is short for "tape archive"; the utility was originally designed for sequential tape storage.


4 REPLIES
Junyong
Pyrite | Level 9

This code (1) downloads the dmrs.tar.gz file (which contains just a dmrs.tar file) and (2) removes the .gz layer by inflating the downloaded file. It seems SAS does not yet support the last piece of the puzzle, unpacking the members inside the .tar. Thank you for the quick response.

Kurt_Bremser
Super User

The proper method to deal with a .tar.gz file is this:

gzip -dc file.tar.gz | tar -xf -

The first part decompresses the file to the underlying tar stream and sends it to stdout; the second reads that stream from stdin, extracts all files, and writes them to the current working directory. By using other tar options, you can list the names of the files in the tar, or extract specific files from the stream.
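
As a rough illustration of those variations, the sketch below builds a throwaway archive and then runs the same pipeline three ways: listing the member names, extracting everything, and extracting a single member. The file and directory names (one.csv, two.csv, all, only) are made up for the example, and a Unix-like shell with tar and gzip is assumed.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/all" "$workdir/only"
printf 'id,value\n1,10\n' > "$workdir/src/one.csv"
printf 'id,value\n2,20\n' > "$workdir/src/two.csv"

# Build the .tar.gz as two stages: tar writes to stdout, gzip compresses it.
tar -cf - -C "$workdir/src" one.csv two.csv | gzip > "$workdir/demo.tar.gz"

# -t lists the member names instead of extracting them.
members=$(gzip -dc "$workdir/demo.tar.gz" | tar -tf -)
printf '%s\n' "$members"

# -x with no member names extracts everything into the current directory,
# so change directory inside a subshell to control where the files land.
gzip -dc "$workdir/demo.tar.gz" | (cd "$workdir/all" && tar -xf -)

# Naming a member after the options extracts only that member.
gzip -dc "$workdir/demo.tar.gz" | (cd "$workdir/only" && tar -xf - two.csv)

ls "$workdir/all" "$workdir/only"
```

The subshell with cd is one way to avoid cluttering the directory the script was started from; the tar -C option is an alternative where it is available.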

 

"tar" is short for "tape archive"; the utility was originally designed for sequential tape storage.

srobc
SAS Employee

I realize this thread is a bit old, but I just noticed it and wanted to weigh in with two points.

First, a compressed tar file (typically with a ".tar.gz" or ".tgz" extension) is the result of two operations: creating a single-file archive from a group of files (the "tar" part) and compressing that archive (the "gzip" part). The tar utility provided in most variants of *nix includes the ability to compress and decompress as part of the process. When creating a tar file, you do this by including the "z" switch, e.g., tar -cvzf <tar-filename> <files-to-include>. The same "z" switch works when extracting one or more files. This means there is no need to decompress the file before using the tar command: you can combine the two steps and keep the benefit of the smaller disk footprint of the compressed tar file.
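
A minimal sketch of the "z" switch in each direction follows. It assumes a tar build with gzip support (GNU tar and bsdtar both have it); the sample files and the demo.tar.gz name are invented for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/out"
printf 'id,value\n1,10\n' > "$workdir/src/one.csv"
printf 'id,value\n2,20\n' > "$workdir/src/two.csv"

# c = create, z = filter through gzip, f = archive file name.
# No intermediate .tar file is ever written to disk.
tar -czf "$workdir/demo.tar.gz" -C "$workdir/src" one.csv two.csv

# t = list members; z decompresses on the fly.
listing=$(tar -tzf "$workdir/demo.tar.gz")
printf '%s\n' "$listing"

# x = extract; -C picks the destination directory.
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/out"

# The result is an ordinary gzip stream, so gzip can verify its integrity.
gzip -t "$workdir/demo.tar.gz"
```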

 

Secondly, using PIPE on a FILENAME statement, you can read an individual CSV or other file from a compressed tar file without ever extracting anything to disk as an intermediate step. Here's an example that reads a single CSV file from a tarball into a SAS data set, using the "O" switch (that's a capital letter "O") to write the extracted contents to STDOUT (that is, the console) instead of to a file:

FILENAME mycsv PIPE "tar -xzOf /mypath/mytarball.tar.gz ./one.csv";

DATA WORK.ONE;
  INFILE mycsv DELIMITER=',' MISSOVER DSD FIRSTOBS=2;
  INPUT...
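
To see what the PIPE fileref would hand to the DATA step, the tar command can be tried on its own in a shell first. The sketch below assumes a tar that supports -O (GNU tar does) and uses an invented tarball with two small CSV files. The members are stored with a leading "./" so that the member name matches the "./one.csv" form used above; a member name that does not match the stored name exactly is not found.

```shell
#!/bin/sh
set -eu

# Hypothetical sample tarball in a scratch directory.
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'id,value\n1,10\n2,20\n' > "$workdir/src/one.csv"
printf 'id,value\n3,30\n' > "$workdir/src/two.csv"
tar -czf "$workdir/mytarball.tar.gz" -C "$workdir/src" ./one.csv ./two.csv

# -O writes the member's contents to stdout; no file is created on disk.
csv_text=$(tar -xzOf "$workdir/mytarball.tar.gz" ./one.csv)
printf '%s\n' "$csv_text"

# Rough shell equivalent of FIRSTOBS=2: drop the header row.
data_rows=$(tar -xzOf "$workdir/mytarball.tar.gz" ./one.csv | tail -n +2)
printf '%s\n' "$data_rows"
```

Running tar -tzf on the tarball beforehand shows the exact member names to use.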

 
