jimbarbour
Meteorite | Level 14

@ChrisNZ ,

 

Just to make things more interesting, I modified your command slightly as follows:

cls | dir "..\..\*.*" /s | find " "

This may not have been the world's smartest thing to do, since my drive is a 12 TB drive.  The output scrolled by for what seemed like an eternity but, in the end, it completed without issue.  At least with this particular means of testing, there doesn't seem to be an issue at the OS level.

 

I won't have time today, but perhaps tomorrow I can write up the issue and send it to SAS.

 

Jim

 

 

 

@Patrick ,

 

Thank you for that interesting bit of ODS code.  I was aware of the ZIP access method (FILENAME ZIP) in SAS, but I was not aware of the "package()" functionality in ODS.  I may have to play with that.  If it turns out to be faster than wzzip.exe, it could be a good alternative.

 

Now, if I really wanted to make zipping an entire directory fast, I would switch from the X command to SYSTASK with the NOWAIT option.  Each file in the directory would be assigned its own task name and its own return code macro variable, and then, after all the SYSTASK statements had launched their instances of wzzip.exe, I would code a WAITFOR _ALL_.  Once the wait is over, all return codes would be interrogated (and handled appropriately).  Since all files would be being zipped concurrently, this would be a very fast way to zip an entire directory.  It would take a bit of work to do it, but boy, it would be fast.

 

I've got most of the code written in other programs; I'd just have to assemble the thing.  Of course, parallel processing can be very tricky to debug, but once debugged, the speed advantage over serial processing would be truly outstanding.
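
A minimal sketch of the SYSTASK / WAITFOR pattern described above, for two files only. The data and archive paths, the task names (zip1, zip2), and the status macro variable names are hypothetical; only the wzzip.exe location and the -a option are taken from this thread. How wzzip.exe reports success in its return code is not shown here and would need to be checked against its documentation.

```sas
/* Sketch only: paths, task names, and status variable names are hypothetical. */

/* Launch one asynchronous wzzip.exe per file. Each task gets its own */
/* name and its own macro variable to receive the return code.        */
systask command
    '"E:\Program Files\WinZip\wzzip.exe" -a "D:\archive\a.zip" "D:\data\a.sas7bdat"'
    nowait taskname=zip1 status=zip1_rc;

systask command
    '"E:\Program Files\WinZip\wzzip.exe" -a "D:\archive\b.zip" "D:\data\b.sas7bdat"'
    nowait taskname=zip2 status=zip2_rc;

/* Block until every listed task has finished. */
waitfor _all_ zip1 zip2;

/* Interrogate each return code once the wait is over. */
%put NOTE: zip1 return code = &zip1_rc;
%put NOTE: zip2 return code = &zip2_rc;
```

SYSTASK COMMAND needs the XCMD system option to be in effect, just as the X command does.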

 

Jim

ChrisNZ
Tourmaline | Level 20

>At least with this particular means of testing, there doesn't seem to be an issue at the OS level.

I'd replace the dir command with the zip command to compare apples to apples.

 

>Since all files would be being zipped concurrently, this would be a very fast way to zip an entire directory

Random I/Os scattered over many files may actually slow down the zipping, depending on whether the bottleneck is disk or CPU.

Or there might be a golden number of concurrent zips to get the balance right. Just saying.

 

 

jimbarbour
Meteorite | Level 14

@ChrisNZ ,

 

I ran the following command on a dataset that failed in the first version of my FILENAME PIPE code (before I redirected STDERR to a file).

"E:\Program Files\WinZip\wzzip.exe" -a "I:\Commercial\monthly_data\Cornerstone\archive\Stg\Rx_Archive_2020-03-05_0107.zip" "I:\Commercial\monthly_data\Cornerstone\rx.sas7bdat" 2>&1 | find " "

and it worked just fine.  The pipe caused no problems.  Note that I redirected STDERR to STDOUT, so both STDERR and STDOUT should be piped to the find command.

 

Good point about not submitting too many wzzip.exe jobs at a time.  I tend to stick with three at a time, although I'm not sure there's any real magic in three.
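
One simple way to cap the number of concurrent jobs is to launch them in batches and wait for each batch to drain before starting the next. The sketch below assumes that approach; the macro name, its parameters, the directory paths, and the file list are all hypothetical, and the batch size of three is just the figure mentioned above.

```sas
/* Sketch only: macro name, parameters, and paths are hypothetical. */
%macro zip_in_batches(dir=, zipdir=, files=, batch=3);
  %local i n file tasks;
  %let n = %sysfunc(countw(&files, %str( )));
  %let tasks = ;

  %do i = 1 %to &n;
    %let file = %scan(&files, &i, %str( ));
    %global zip_rc&i;

    /* Doubled quotes inside the string yield literal quotes around each path. */
    systask command
      """E:\Program Files\WinZip\wzzip.exe"" -a ""&zipdir\&file..zip"" ""&dir\&file"""
      nowait taskname=zip&i status=zip_rc&i;
    %let tasks = &tasks zip&i;

    /* When the batch is full, or the list is exhausted, wait for it to finish. */
    %if %sysfunc(mod(&i, &batch)) = 0 or &i = &n %then %do;
      waitfor _all_ &tasks;
      %let tasks = ;
    %end;
  %end;

  /* Report every return code after all batches have completed. */
  %do i = 1 %to &n;
    %put NOTE: %scan(&files, &i, %str( )) return code = &&zip_rc&i;
  %end;
%mend zip_in_batches;

%zip_in_batches(dir=D:\data, zipdir=D:\archive,
                files=a.sas7bdat b.sas7bdat c.sas7bdat d.sas7bdat);
```

A batch is only as fast as its slowest file, so this trades some throughput for simplicity.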

 

Jim

ChrisNZ
Tourmaline | Level 20

Well, it looks like you have all the data SAS needs to replicate this strange limitation and create a defect, or at least a usage note.

Patrick
Opal | Level 21

Unless there's a good reason to do otherwise, I'd go for ODS PACKAGE or the ZIP engine, mainly because it removes the dependency on third-party tools. It also makes the code more OS independent (for example, for a migration from a Windows to a Linux server), and if there are any issues I can contact a single vendor for support.
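
For the ZIP engine route, a minimal sketch using the FILENAME ZIP access method together with FCOPY might look like the following. The paths, fileref names, and member name are hypothetical. RECFM=N on both filerefs makes FCOPY do a byte-for-byte copy, which is what a binary file such as a .sas7bdat needs.

```sas
/* Sketch only: paths, filerefs, and member name are hypothetical. */
filename src "/data/rx.sas7bdat" recfm=n;
filename dst zip "/archive/rx.zip" member="rx.sas7bdat" recfm=n;

data _null_;
  length msg $300;
  /* Copy the source file into the ZIP member as a raw byte stream. */
  rc = fcopy('src', 'dst');
  if rc ne 0 then do;
    msg = sysmsg();
    putlog 'ERROR: FCOPY failed. ' rc= msg=;
  end;
run;

filename src clear;
filename dst clear;
```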

 

I guess your SYSTASK / RSUBMIT approach would also work for SAS code (with a bit more resource consumption, though). In any case, I would base the number of child tasks you run in parallel on the available CPU, and then loop using WAITFOR _ANY_ to start new tasks as existing ones finish.

 

If you've got SAS DI Studio, then the SAS code generated by the Loop Transformation (if execute in parallel is ticked) does this using RSUBMIT.

 

BTW: Just found here a ready made SAS macro for zipping all SAS datasets in a folder.

 

jimbarbour
Meteorite | Level 14

@Patrick ,

 

Hmmm.  That's a very good point about eliminating third party software.  I had not considered that.  I'll have to mull that one over.

 

That's an interesting zip-the-directory macro.  I have something similar, although I just pipe the results of a dir command into my program.  I think piping in the results of a dir command is a bit simpler than working with a fileref, DOPEN, etc.
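
For comparison, the fileref / DOPEN route could be sketched roughly as below. The fileref name dirref and the output data set file_list are hypothetical; &Data_Dir is assumed to hold the directory path. Unlike dir with /os, DREAD does not return the names sorted by size, and the name filter here only approximates the *.sas* wildcard.

```sas
/* Sketch only: the fileref name and output data set are hypothetical. */
data file_list(keep=File_Name);
  length File_Name $256;

  /* Assign a fileref to the directory and open it. */
  rc  = filename('dirref', "&Data_Dir");
  did = dopen('dirref');

  if did > 0 then do;
    /* Walk the directory members, keeping names that contain .sas */
    do i = 1 to dnum(did);
      File_Name = dread(did, i);
      if find(File_Name, '.sas', 'i') > 0 then output;
    end;
    rc = dclose(did);
  end;
  else putlog "ERROR: could not open directory &Data_Dir";

  /* Release the fileref. */
  rc = filename('dirref');
run;
```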

 

Here's the core of my macro (below).  This macro then feeds each file name from the dir command to the macro that does the actual zipping.

 

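	%cd(&Data_Dir);

	/* List matching files: bare names (/b), sorted by size, smallest first (/os) */
	FILENAME	Dir_List	PIPE	"dir *.sas* /b /os";

	DATA	_NULL_;
		LENGTH	File_Name	$256;
		/* Long enough to hold the full macro call without truncation */
		LENGTH	Command		$1024;

		/* TRUNCOVER plus a formatted read keeps file names containing blanks intact */
		INFILE	Dir_List	TRUNCOVER;
		INPUT	File_Name	$CHAR256.;

&Cmnt	PUTLOG	"&Nte2  "	_N_ =	' '	File_Name;

		/* Build one %Archive_Data call per file returned by dir */
		Command					=	CATS('%Archive_Data(', "&Data_Dir.,", File_Name, 
										",&Arch_Dir.&Zip_SubFolder, &Zip_File_Name, ",
										"Debug=&Debug, Width=&Width, Xmin=&Xmin)");

&Cmnt	PUTLOG	"&Nte2  "	Command=;

		/* Only submit the zip macro when running in production mode */
		IF	UPCASE("&Run_Mode")	=	'PROD'	THEN
			DO;
				CALL	EXECUTE(Command);
			END;
	RUN;
	%Check_RC(MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);
	%Reset_RC;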

Jim

 

Tom
Super User

Why not just tell WinZip not to write that gibberish?  

What options does it have?  Most of those tools have a quiet option.  If you must see what is in the file, use a second command to ask for the file contents.

 

SAS does not have a problem reading long lines, just use RECFM=N and treat it as a stream of bytes.  But perhaps there is some interaction with PIPE and long lines?
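
As a sketch of the byte-stream idea, assuming the tool's output has first been captured in a file (the path, fileref, and data set name below are hypothetical), RECFM=N lets a DATA step read one byte per iteration regardless of how long the "lines" are:

```sas
/* Sketch only: the path, fileref, and data set name are hypothetical. */
filename zipout "C:\temp\wzzip_output.txt" recfm=n;

data bytes;
  infile zipout recfm=n;
  /* With RECFM=N there are no record boundaries, so each INPUT reads the next byte. */
  input byte $char1.;
  hex = put(byte, $hex2.);
run;

/* Tabulate which byte values occur, e.g. to spot control characters. */
proc freq data=bytes;
  tables hex;
run;
```

One observation per byte is only practical for small files; for anything large, the per-byte logic would go directly in the DATA step instead of building a data set.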

jimbarbour
Meteorite | Level 14

@Tom ,

 

I originally looked for a "quiet" option but was unable to find one for Wzzip.exe.  In my last shop, the zip software that we used there had a -qq option that suppressed just about all messages which was really useful for batch mode.  Wzzip.exe does have a -ybc option which basically specifies that any prompts should be handled as an "OK" or "Yes", but there's no "quiet" option that I've been able to find.

 

That's an interesting idea to try to handle the STDERR as a stream of bytes.  Unfortunately, SAS redirects the output from STDERR directly to the log when a PIPE is used, and one cannot specify anything like a RECFM.

 

Having the gibberish written to a text file isn't a big deal and seems to be the best way to deal with the situation.  I think I'll stick with that unless I can find a "quiet" mode.

 

Jim

Patrick
Opal | Level 21

Below is some sample code that uses ODS PACKAGE for zipping.

The source file, test, is around 12GB.

ODS PACKAGE needs less than 2 minutes on my not very powerful server (9.04.01M5P091317 under RHEL).

 

data test(drop=_: compress=no);
  length var $ 1024;
  var=repeat(' A BB C ',100);
  _n_row=1024 * 1024 * 12;
  do _i=1 to _n_row;
    output;
  end;
  stop;
run;

%put Start: %sysfunc(datetime(), datetime21.);
/* Creating a ZIP file with ODS PACKAGE */
ods package(newzip) open nopf;
ods package(newzip) add
      file="%sysfunc(pathname(work))/test.sas7bdat" 
      path="data/";
ods package(newzip) publish archive 
  properties(
   archive_name="test2.zip" 
   archive_path="~"
  );
ods package(newzip) close;
%put Stop: %sysfunc(datetime(), datetime21.);

proc datasets lib=work nolist nowarn;
  delete test;
  run;
quit;

 

Update: I've now also run the above for a 15GB file. In my environment, zipping took 2 minutes 8 seconds.

